The relabel pipeline works by building a 1:1 map between the previously used gene panel and the new gene panel. This map is used to correct the gene label assigned to each detected transcript. After the labels are corrected, all results are recomputed using the same methods employed by the Xenium Onboard Analysis pipeline.
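As a rough illustration of the relabeling idea, the sketch below builds a one-to-one map from old gene names to new gene names and applies it to per-transcript gene labels. It assumes, hypothetically, that each panel is given as a dict from codeword index to gene name and that both panels use the same codewords; the function names are illustrative and not part of Xenium Ranger.

```python
from typing import Dict, List


def build_relabel_map(old_panel: Dict[int, str], new_panel: Dict[int, str]) -> Dict[str, str]:
    """Map each old gene name to the new gene name that shares its codeword.

    Hypothetical input format: {codeword_index: gene_name} for each panel.
    """
    if set(old_panel) != set(new_panel):
        raise ValueError("both panels must define the same codewords")
    relabel = {old_panel[cw]: new_panel[cw] for cw in old_panel}
    # A 1:1 map requires gene names to be unique on both sides.
    if len(relabel) != len(old_panel) or len(set(relabel.values())) != len(relabel):
        raise ValueError("gene names must be unique within each panel")
    return relabel


def relabel_transcripts(genes: List[str], relabel: Dict[str, str]) -> List[str]:
    """Replace each transcript's old gene label with its new label."""
    return [relabel[g] for g in genes]


old = {1: "GeneA", 2: "GeneB"}
new = {1: "GeneX", 2: "GeneB"}
print(relabel_transcripts(["GeneA", "GeneB", "GeneA"], build_relabel_map(old, new)))
# ['GeneX', 'GeneB', 'GeneX']
```

Validating uniqueness up front means a mislabeled panel fails loudly instead of silently merging two genes.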
The resegment pipeline uses the same nucleus segmentation model as the corresponding Xenium Onboard Analysis (XOA) version. For example, Xenium Ranger v3.0 uses the nucleus segmentation model included in the XOA v3.0 release. XOA segmentation algorithm changes by software version are summarized in this table.
Nucleus segmentation is always performed on the 3D DAPI Z-stack image. Nuclei are filtered by their 95th percentile pixel intensity (the intensity threshold can be adjusted with the --dapi-filter parameter).
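A minimal sketch of such an intensity filter follows. It assumes that nuclei whose 95th percentile DAPI intensity falls below the threshold are the ones discarded; the direction of the comparison and the default value of --dapi-filter are not stated here, so treat both as assumptions. labels is a hypothetical labeled nucleus mask (0 = background) aligned with the dapi image, and the code works for 2D or 3D arrays.

```python
import numpy as np


def filter_nuclei_by_dapi(labels: np.ndarray, dapi: np.ndarray, threshold: float) -> np.ndarray:
    """Drop nuclei whose 95th-percentile DAPI intensity is below `threshold`.

    Assumption: 0 marks background in `labels`; `dapi` has the same shape.
    """
    if labels.shape != dapi.shape:
        raise ValueError("labels and dapi must have the same shape")
    filtered = labels.copy()
    for label_id in np.unique(labels):
        if label_id == 0:
            continue  # skip background
        in_nucleus = labels == label_id
        if np.percentile(dapi[in_nucleus], 95) < threshold:
            filtered[in_nucleus] = 0  # remove this nucleus from the mask
    return filtered
```

Using a high percentile rather than the mean makes the filter tolerant of nuclei that are only partly bright.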
See Xenium Onboard Analysis algorithm overview for details about the nuclear expansion-only and multimodal cell segmentation algorithms.
Resegment with multimodal cell segmentation
In the Xenium Onboard Analysis pipeline v2.0 and later, the multimodal cell segmentation algorithm results are prioritized in this order for each cell:
- Segment cells based on their cell boundary stain: The inferred segmentation from this method should be closest to the true cell membrane boundary. It uses cell-surface marker antibodies to target epithelial markers (ATP1A1, E-Cadherin) and immune markers (pan-lymphocyte: CD45). This method can split nuclei, define cells missing a nucleus, and identify multinucleate cells. Nuclei that overlap with anucleate cells are assigned to the cell.
- Segment cells based on expansion from the nucleus to the cell interior stain edge: This method includes both a deep learning model and a nuclear expansion method using the interior stain to infer cell boundaries. It uses the interior stain (18S rRNA marker) and the DAPI stain for nuclei.
- Nuclear expansion: For cells that do not have boundary or interior stains, segment cells by expanding the nucleus (DAPI) by 5 µm or until another cell boundary is encountered (described in more detail on the XOA segmentation algorithms page).
In Xenium Ranger:
- If --boundary-stain is enabled (default), the algorithm performs cell segmentation using the selected boundary stain, falling back to DAPI nuclear expansion for any cells that do not have a boundary stain. If disabled, Xenium Ranger does not use the boundary stain segmentation method.
- If --interior-stain is enabled (default), the algorithm performs interior segmentation and expansion with the selected stain, falling back to DAPI nuclear expansion for any cells that do not have an interior stain. If disabled, Xenium Ranger does not use the interior stain segmentation method.
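The interaction between the two options and the priority order above can be summarized as a small function. The boolean parameters are a hypothetical stand-in for the --boundary-stain and --interior-stain options (this sketch does not parse the real CLI), and nuclear expansion is always kept as the final fallback.

```python
from typing import List


def segmentation_priority(boundary_stain: bool = True, interior_stain: bool = True) -> List[str]:
    """Methods tried for each cell, highest priority first (illustrative names)."""
    methods: List[str] = []
    if boundary_stain:
        methods.append("boundary")
    if interior_stain:
        methods.append("interior")
    methods.append("nuclear_expansion")  # always available as the last resort
    return methods


print(segmentation_priority(boundary_stain=False))
# ['interior', 'nuclear_expansion']
```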
Next, if boundary cell segmentation results are available, Xenium Ranger assigns nuclei that overlap significantly with a boundary-segmented cell to that cell. A boundary-segmented cell can have multiple overlapping nuclei. For each nucleus, if 50% or more of the nucleus overlaps the cell boundary, the overlapping portion of the nucleus is assigned to that cell. If 50% or more of the nucleus lies outside the boundary-segmented cell, the algorithm designates it as a new nucleus outside of that cell and passes it to the next prioritized segmentation method (interior segmentation or free expansion); the portion that overlaps the cell (less than 50% of the nucleus) is removed from the outputs. This guarantees that a nucleus never partially overlaps a boundary-segmented cell in the final result.
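The 50% rule for a single nucleus and a single boundary-segmented cell can be sketched with boolean masks. This is an illustration, not Xenium Ranger's implementation; in particular, the text describes both branches with "50% or more", so treating an overlap of exactly 50% as "assigned" is an assumption.

```python
import numpy as np
from typing import Tuple


def resolve_nucleus(nucleus: np.ndarray, cell: np.ndarray) -> Tuple[str, np.ndarray]:
    """Apply the 50% rule to one nucleus and one boundary-segmented cell.

    Both inputs are boolean masks of the same shape. Returns ("assigned", mask)
    when the nucleus is kept inside the cell, or ("outside", mask) when it is
    handed to the next segmentation method.
    """
    area = int(nucleus.sum())
    if area == 0:
        raise ValueError("nucleus mask is empty")
    inside = nucleus & cell
    if inside.sum() / area >= 0.5:  # assumption: exactly 50% counts as assigned
        return "assigned", inside  # keep only the part inside the cell
    # Mostly outside: drop the overlapping pixels so the nucleus never
    # partially overlaps the cell, and pass the remainder on.
    return "outside", nucleus & ~cell
```

Either way, the returned mask never straddles the cell boundary, which is the guarantee described above.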
For the remaining nuclei, if interior segmentation is available, the algorithm then finds the nuclei that overlap significantly with the interior stain and expands those nuclei with the interior stain mask. Finally, the remaining nuclei that do not overlap significantly with either boundary-segmented or interior-segmented cells are expanded isotropically by the --expansion-distance parameter (5 µm default in v2.0).
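Isotropic expansion can be approximated with a Euclidean distance transform: each background pixel within the expansion distance takes the label of its nearest nucleus, so neighboring expansions stop where they meet. This mirrors the approach of scikit-image's expand_labels, not necessarily Xenium Ranger's code, and it ignores pixels already claimed by boundary or interior cells. The 0.2125 µm pixel size is an assumed default; read the actual value from your dataset's metadata.

```python
import numpy as np
from scipy import ndimage


def expand_nuclei(labels: np.ndarray, distance_um: float = 5.0, pixel_size_um: float = 0.2125) -> np.ndarray:
    """Grow every labeled nucleus by up to `distance_um`, without overlaps.

    Assumptions: 0 is background; `pixel_size_um` is isotropic.
    """
    distance_px = distance_um / pixel_size_um
    background = labels == 0
    # For every background pixel: distance to, and index of, the nearest labeled pixel.
    distances, nearest = ndimage.distance_transform_edt(background, return_indices=True)
    grow = background & (distances <= distance_px)
    expanded = labels.copy()
    expanded[grow] = labels[tuple(axis_idx[grow] for axis_idx in nearest)]
    return expanded
```

Because each pixel goes to its single nearest nucleus, two expanding nuclei split the space between them roughly at the midpoint, which approximates "until another cell boundary is encountered".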
Xenium Ranger can import a variety of community-developed and XOA segmentation formats. XOA segmentation algorithm changes by software version are summarized in this table.
The methods used to incorporate new segmentations fall under three scenarios (read more below):
- Import nucleus and cell labeled segmentation masks (TIFF or NPY), where each pixel is an integer corresponding to the cell ID
- Import nucleus and cell segmentation polygons (GeoJSON)
- Import transcript-based segmentations
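In a labeled mask, every pixel stores the integer ID of the object it belongs to. A quick way to sanity-check such a file before importing it is to count pixels per ID. This sketch assumes 0 marks background (the usual convention for labeled masks) and uses NumPy only; a .npy file could be loaded with np.load, while reading a TIFF would need an additional library such as tifffile.

```python
import numpy as np
from typing import Dict


def label_areas(mask: np.ndarray) -> Dict[int, int]:
    """Return {cell_id: pixel_count} for a labeled mask, skipping background (0)."""
    if not np.issubdtype(mask.dtype, np.integer):
        raise ValueError("labeled masks must have an integer dtype")
    ids, counts = np.unique(mask, return_counts=True)
    return {int(i): int(c) for i, c in zip(ids, counts) if i != 0}


print(label_areas(np.array([[0, 1, 1], [2, 2, 2]])))
# {1: 2, 2: 3}
```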
For every scenario, a unique random ID is assigned to each cell in the same string format used by the XOA pipeline.
In scenario 1, if the user imports only a nuclear segmentation mask, a new cell segmentation is generated by nuclear expansion. If both nuclei and cells are imported, Xenium Ranger inspects the masks for consistency.
- If nuclei and cells are imported, imported nuclei are treated as nuclei and imported cells are treated in the same way as boundary-segmented cells (described above for the resegment pipeline). An imported cell can have multiple overlapping nuclei. For each nucleus, if 50% or more of the nucleus overlaps the cell boundary, the overlapping portion of the nucleus is assigned to that cell. If 50% or more of the nucleus lies outside the imported cell, the algorithm designates it as a new nucleus outside of that cell and passes it to the next prioritized segmentation method (nuclear expansion); the portion that overlaps the cell is removed from the outputs. This guarantees that a nucleus never partially overlaps an imported cell in the final result.
- If only cells are imported, Xenium Ranger produces two polygon sets and masks in the cells.zarr.zip file, but the polygon set and mask normally reserved for nuclei are empty.
- If only nuclei are imported, Xenium Ranger expands them isotropically by the --expansion-distance parameter (5 µm default in v2.0 and later).
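The scenario 1 rules can be sketched on whole labeled masks: each imported nucleus is compared with the imported cell it overlaps most, clipped to that cell when at least half of it lies inside, and otherwise stripped of any overlap with imported cells and left for nuclear expansion (not shown). The function and its return values are hypothetical, 0 is assumed to be background, and an overlap of exactly 50% is treated as "assigned" by assumption.

```python
import numpy as np
from typing import Dict, List, Optional, Tuple


def reconcile_masks(
    cells: np.ndarray, nuclei: Optional[np.ndarray]
) -> Tuple[np.ndarray, Dict[int, int], List[int]]:
    """Return (clipped nucleus mask, {nucleus_id: cell_id}, nuclei left for expansion)."""
    if nuclei is None:
        # Only cells imported: the nucleus layer stays empty.
        return np.zeros_like(cells), {}, []
    out = nuclei.copy()
    assigned: Dict[int, int] = {}
    leftover: List[int] = []
    for nid in np.unique(nuclei):
        if nid == 0:
            continue  # background
        nmask = nuclei == nid
        under = cells[nmask]  # cell IDs beneath this nucleus's pixels
        ids, counts = np.unique(under[under != 0], return_counts=True)
        if ids.size and counts.max() / nmask.sum() >= 0.5:
            cid = int(ids[counts.argmax()])
            out[nmask & (cells != cid)] = 0  # clip the nucleus to its host cell
            assigned[int(nid)] = cid
        else:
            out[nmask & (cells != 0)] = 0  # drop any overlap with imported cells
            leftover.append(int(nid))
    return out, assigned, leftover
```

Several nuclei may map to the same cell, matching the statement that an imported cell can have multiple overlapping nuclei.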
In scenario 2, Xenium Ranger first converts the input GeoJSON polygons into labeled masks. Because the GeoJSON format is flexible, the input polygons may not map cleanly onto a mask; for example, two polygons could overlap. While converting polygons into masks, Xenium Ranger detects overlapping polygons and marks the overlapping pixels as ambiguous. Each ambiguous pixel is then assigned to the object with the most neighboring pixels, and metrics report how many ambiguous pixels were found. For polygons with holes ("non-simple polygons"), the holes are removed. Cells defined as multipolygons are removed entirely, and metrics also report these removals. After the masks have been generated, the remaining steps follow scenario 1.
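The overlap handling can be illustrated with a small rasterizer: polygons are tested against pixel centers with an even-odd ray-casting rule, pixels covered by more than one polygon are counted as ambiguous, and each ambiguous pixel goes to the candidate with the most unambiguous pixels in its 3x3 neighborhood. This is a simplified stand-in (no holes, no multipolygons, ties broken by input order), not Xenium Ranger's actual converter; all names are hypothetical.

```python
import numpy as np
from typing import Dict, List, Tuple

Polygon = List[Tuple[float, float]]  # (x, y) vertices in pixel units


def inside_polygon(poly: Polygon, xs: np.ndarray, ys: np.ndarray) -> np.ndarray:
    """Even-odd ray-casting test for many points at once."""
    inside = np.zeros(xs.shape, dtype=bool)
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if y1 == y2:
            continue  # horizontal edges never cross a horizontal ray
        crosses = (y1 > ys) != (y2 > ys)
        x_at = x1 + (ys - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (xs < x_at)
    return inside


def rasterize(polygons: Dict[int, Polygon], shape: Tuple[int, int]) -> Tuple[np.ndarray, int]:
    """Return (label mask, number of ambiguous pixels)."""
    rows, cols = np.indices(shape)
    xs, ys = cols + 0.5, rows + 0.5  # test pixel centers
    masks = {cid: inside_polygon(p, xs, ys) for cid, p in polygons.items()}
    hits = np.zeros(shape, dtype=int)
    for m in masks.values():
        hits += m
    labels = np.zeros(shape, dtype=int)
    for cid, m in masks.items():
        labels[m & (hits == 1)] = cid  # unambiguous pixels only
    resolved = labels.copy()
    ambiguous = np.argwhere(hits > 1)
    for r, c in ambiguous:
        window = labels[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
        owners = [cid for cid, m in masks.items() if m[r, c]]
        # Most unambiguous neighbors wins; max() keeps the first owner on ties.
        resolved[r, c] = max(owners, key=lambda cid: int((window == cid).sum()))
    return resolved, len(ambiguous)
```

Counting neighbors only among unambiguous pixels keeps the result independent of the order in which ambiguous pixels are visited.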
In scenario 3, when importing a transcript-based segmentation, Xenium Ranger records the cell assignment for each transcript, and all results are then recomputed using the imported transcript assignments. When constructing the cell-feature matrix, Xenium Ranger uses the transcript quality score from the transcripts output file and includes only transcripts with Q-score ≥ 20. As a result, cells whose transcripts are all low quality appear with zero transcripts in the cell-feature matrix file.
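The Q-score filter explains how a cell can end up with an all-zero row. In this sketch, transcripts are hypothetical (cell_id, gene, qv) tuples rather than the real transcripts file, and every imported cell gets a row even if none of its transcripts pass the filter.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def cell_feature_counts(
    transcripts: Iterable[Tuple[str, str, float]],
    cell_ids: Iterable[str],
    min_qv: float = 20.0,
) -> Dict[str, Counter]:
    """Count transcripts per (cell, gene), keeping only those with Q-score >= min_qv."""
    counts: Dict[str, Counter] = {cid: Counter() for cid in cell_ids}
    for cell_id, gene, qv in transcripts:
        if cell_id in counts and qv >= min_qv:
            counts[cell_id][gene] += 1
    return counts


tx = [("c1", "A", 30.0), ("c1", "A", 25.0), ("c1", "B", 10.0), ("c2", "B", 5.0)]
print(cell_feature_counts(tx, ["c1", "c2"]))
# {'c1': Counter({'A': 2}), 'c2': Counter()}
```

Here c2 still appears, but with an empty Counter, the analogue of a cell with zero transcripts in the matrix.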
As mentioned above, there can potentially be issues in converting imported polygons into a mask. For the case of importing transcript assignments, Xenium Ranger will not try to convert the visualization polygons into a mask. Instead, it will generate an empty mask and leave the polygons untouched.
To combine spatial data in the --viz-polygons GeoJSON with transcript data in the --transcript-assignment segmentation CSV, Xenium Ranger matches the CSV file's integer suffix (e.g., "3" in cell = "CRc17aaabcd-3" for a Baysor ID) to the GeoJSON file's integer value (e.g., "cell":3 for a Baysor ID). Every cell in the visualization polygons must have at least one transcript assigned to it; otherwise, Xenium Ranger will error. See this Knowledge Base article for information about cleaning transcript-based segmentation inputs.
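The ID matching and the "every polygon needs a transcript" check can be sketched as follows. Parsing the integer after the last hyphen follows the example above; the function names are hypothetical, reading the actual CSV and GeoJSON files is left out, and empty cell ID strings are skipped by assumption.

```python
from typing import Iterable, Set


def suffix_int(cell_id: str) -> int:
    """Integer after the last hyphen, e.g. 'CRc17aaabcd-3' -> 3."""
    return int(cell_id.rsplit("-", 1)[1])


def check_polygons_have_transcripts(csv_cell_ids: Iterable[str], geojson_cells: Iterable[int]) -> None:
    """Raise if any GeoJSON "cell" integer has no transcript in the CSV."""
    assigned: Set[int] = {suffix_int(cid) for cid in csv_cell_ids if cid}
    missing = sorted(set(geojson_cells) - assigned)
    if missing:
        raise ValueError(f"polygons without assigned transcripts: {missing}")
```

Reporting every missing cell at once, rather than the first one, makes it easier to clean the inputs in a single pass.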
How do cell IDs from transcript-based methods (e.g., Baysor) map to Xenium Ranger cell IDs?
The Xenium label_id is found in the cell_boundaries file and corresponds to the integer pixel value of the segmentation mask image for non-transcript-assignment segmentations. It is always contiguous from 1 to N, where N is the number of cells.
Xenium Ranger import-segmentation sorts the imported cell ID integer suffixes (e.g., "3" in "CRc17aaabcd-3" for a Baysor ID) and then maps them to label_ids 1 to N.
If the imported cell IDs skip any integers or start from 0 instead of 1, they will be shifted relative to the Xenium label_id values.
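The sort-then-number behavior, and why IDs that start at 0 or skip integers end up shifted, can be seen in a short sketch. The function name is hypothetical; it sorts by the integer suffix numerically (so "-10" comes after "-9") and assumes all IDs share the same prefix.

```python
from typing import Dict, Iterable


def map_to_label_ids(cell_ids: Iterable[str]) -> Dict[str, int]:
    """Assign label_ids 1..N to imported cell IDs in order of their integer suffix."""
    unique_ids = {cid for cid in cell_ids if cid}
    ordered = sorted(unique_ids, key=lambda cid: int(cid.rsplit("-", 1)[1]))
    return {cid: label_id for label_id, cid in enumerate(ordered, start=1)}


ids = ["CRc17aaabcd-0", "CRc17aaabcd-2", "CRc17aaabcd-10", "CRc17aaabcd-2"]
print(map_to_label_ids(ids))
# {'CRc17aaabcd-0': 1, 'CRc17aaabcd-2': 2, 'CRc17aaabcd-10': 3}
```

Suffix 0 becomes label_id 1 and suffix 10 becomes label_id 3, so an imported ID and its Xenium label_id only agree when the suffixes already run 1, 2, ..., N without gaps.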