This page describes raw output (decoded transcript counts and morphology images) and other standard output files derived from them, which are included in the Xenium output directory for each selected region (also see Archiving Xenium data). These data reduce low-level internal image sensor data, preserving details needed to assess decoded transcript quality (learn more at Overview of Xenium Algorithms).
All run data are stored in the `output/` directory on the Xenium Analysis Computer and are accessible from the Desktop. Refer to the Xenium Instrument User Guide (CG000584) for instructions to export run data from the instrument.
Within the `output/` directory, the data from individual runs are stored as subfolders and include the user-defined run name in the folder name. Within the top-level run folder, there are subfolders for each of the user-defined regions on the Xenium slides. The overall organization of subfolders is shown below:
```
output
└── <yyyymmdd>__<hhmmss>__<runName>
    └── output-<instrumentSN>__<slideID>__<regionName>__<yyyymmdd>__<hhmmss>
```
The `runName` and `regionName` strings are user-defined; the other components of the directory names are auto-generated. `<yyyymmdd>` is the start date and `<hhmmss>` is the start time (in UTC). The separators between the strings in the directory name are two underscores. Spaces in `runName` and `regionName` are replaced by an underscore (`_`) in the output directory name.
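The naming rules above can be turned into a small parser. The following is a minimal sketch, not an official tool: the function name `parse_region_dir` and the example folder name (serial number, slide ID, region name, timestamps) are made up for illustration. It assumes that the instrument serial number and slide ID never contain a double underscore.

```python
import re
from datetime import datetime, timezone

# Region folder pattern described above:
# output-<instrumentSN>__<slideID>__<regionName>__<yyyymmdd>__<hhmmss>
REGION_DIR_PATTERN = re.compile(
    r"^output-(?P<instrument_sn>.+?)__(?P<slide_id>.+?)__(?P<region_name>.+)"
    r"__(?P<date>\d{8})__(?P<time>\d{6})$"
)


def parse_region_dir(name):
    """Split a region folder name into its components (hypothetical helper)."""
    match = REGION_DIR_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Not a recognized region folder name: {name!r}")
    fields = match.groupdict()
    # The date and time in the folder name are the run start in UTC.
    start = datetime.strptime(fields.pop("date") + fields.pop("time"), "%Y%m%d%H%M%S")
    fields["start_utc"] = start.replace(tzinfo=timezone.utc)
    return fields


# Made-up folder name used only to exercise the parser.
example = parse_region_dir("output-XETG00000__0012345__Region_1__20240115__093000")
print(example)
```

Because spaces in the region name are replaced by single underscores, the pattern splits only on double underscores and keeps `Region_1` intact.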
The remaining sections describe the Xenium output bundle files for each analysis run.
The `experiment.xenium` file is an experiment manifest in JSON format that includes experiment metadata and relative file paths to other data files in the output folder needed by Xenium Explorer to visualize results.
Field | Description |
---|---|
major_version | Indicates major version of analysis output file formats read by Xenium Explorer |
minor_version | Indicates minor version of analysis output file formats read by Xenium Explorer |
patch_version | Patch version of analysis output file formats read by Xenium Explorer |
run_name | User-specified Run Name entered on instrument |
run_start_time | Instrument run start time |
region_name | User-specified name for region selected on instrument |
preservation_method | User-specified sample preservation method |
num_cells | Cells detected by Xenium Onboard Analysis pipeline |
transcripts_per_cell | Median transcripts per cell calculated by Xenium Onboard Analysis pipeline |
transcripts_per_100um | Transcripts per 100 µm² calculated by Xenium Onboard Analysis pipeline |
cassette_name | User-specified Xenium Cassette Name entered on instrument |
slide_id | User-specified Xenium Slide ID entered on instrument |
panel_design_id | Panel design ID specified by panel selection on instrument (additionally panel_predesigned_id is included for add-on custom panel designs) |
panel_name | Panel name specified by panel selection on instrument |
panel_organism | Sample organism specified by selected gene panel |
panel_tissue_type | User-specified tissue type selected on instrument |
panel_num_targets_predesigned | Number of gene targets from the pre-designed gene panel |
panel_num_targets_custom | Number of gene targets from add-on custom panel if included in panel design |
pixel_size | Pixel size in the morphology.ome.tif image file (in µm) |
instrument_sn | Xenium Analyzer instrument serial number |
instrument_sw_version | Version of the Xenium Analyzer firmware used during analysis run |
analysis_sw_version | Version of Xenium Onboard Analysis pipeline used to analyze data |
analysis_uuid | Instrument metadata |
experiment_uuid | Instrument metadata |
cassette_uuid | Instrument metadata |
roi_uuid | Instrument metadata |
z_step_size | Z-step size (in µm) used for subsampling the morphology.ome.tif image Z-stacks |
well_uuid | Instrument metadata |
calibration_uuid | Instrument metadata |
segmentation_stain | Specifies the stain method selected on instrument for cell segmentation ("Xenium Multi-Tissue Stain" or "Nuclei (DAPI)"). |
images | Specifies the file paths to the morphology image files; used by Xenium Explorer to find input files |
xenium_explorer_files | Specifies the file paths to transcript, cell, secondary analysis, and analysis summary files; used by Xenium Explorer to find input files |
xenium_ranger | If the data was reanalyzed with Xenium Ranger, this section specifies the run_id, Xenium Ranger version, and commands used to analyze the data. |
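Since the manifest is plain JSON, its fields can be read with any JSON parser. The sketch below is illustrative only: the manifest excerpt and its values are made up, and the helper names `summarize_manifest` and `microns_to_pixels` are hypothetical. The conversion assumes that micron coordinates and morphology image pixels share the same origin.

```python
import json

# Made-up manifest excerpt for illustration; real files contain many more fields.
SAMPLE_MANIFEST_TEXT = """
{
  "run_name": "demo_run",
  "region_name": "Region_1",
  "num_cells": 1500,
  "pixel_size": 0.5
}
"""


def summarize_manifest(manifest):
    """Pick a few of the documented top-level fields (hypothetical helper)."""
    keys = ("run_name", "region_name", "num_cells", "pixel_size")
    return {key: manifest.get(key) for key in keys}


def microns_to_pixels(value_um, pixel_size_um):
    """Convert a micron coordinate to morphology image pixels (assumes a shared origin)."""
    return value_um / pixel_size_um


# For a real run, replace json.loads(...) with:
#   with open("experiment.xenium", encoding="utf-8") as handle:
#       manifest = json.load(handle)
manifest = json.loads(SAMPLE_MANIFEST_TEXT)
summary = summarize_manifest(manifest)
x_pixel = microns_to_pixels(100.0, manifest["pixel_size"])
print(summary, x_pixel)
```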
The Xenium onboard analysis pipeline outputs an interactive HTML file named `analysis_summary.html`. Open it on-instrument, in a web browser, or in Xenium Explorer. It contains summary metrics and automated secondary analysis results. Any alerts issued by the pipeline are displayed at the top of the page.
There are five clickable tabs that capture different information:
- The Summary tab contains summary metrics, images, and experiment information for a quick overview of the data.
- The Decoding tab contains more specific transcript decoding metrics.
- The Cell Segmentation tab shows the metrics for cell segmentation and partitioning transcripts into single cells.
- The Analysis tab captures the results from the pipeline's secondary analysis run on single cell data.
- The Image QC tab contains two galleries: one for downsampled morphology stain images and a second for the RNA images for each cycle and channel.
Click the `?` at the top of each dashboard for more information about each metric. For detailed descriptions and guidance on interpretation, see the Overview of the Xenium Analysis Summary documentation.
The pipeline outputs a series of tissue morphology images in OME-TIFF format. These are either nuclei-stained (DAPI) images or nuclei and multi-tissue stained (DAPI, cell boundary, interior stains) images. The files include a pyramid of resolutions and tiled chunks of image data, which allows for efficient interactive image visualization (JPEG-2000 compression, 16-bit grayscale, full and downsampled resolutions down to 256 x 256 pixels, learn more here). All morphology image files can be read by Xenium Explorer.
- The `morphology.ome.tif` file is a 3D Z-stack of the DAPI image that can be useful to resegment cells, assess segmentation quality, and view data. DAPI image processing is described here.
- The `morphology_focus/` directory contains the 2D autofocus projection images for the nuclei DAPI stain image, as well as three additional stain images for Xenium outputs generated with the multimodal cell segmentation assay workflow. These files are in multi-file OME-TIFF format. They each contain a pyramid of images including full resolution and downsampled images. The image order is specified in OME-XML metadata.
  - `morphology_focus_0000.ome.tif`: DAPI image
  - `morphology_focus_0001.ome.tif`: boundary (ATP1A1/E-Cadherin/CD45) image
  - `morphology_focus_0002.ome.tif`: interior - RNA (18S) image
  - `morphology_focus_0003.ome.tif`: interior - protein (alphaSMA/Vimentin) image
The multi-file OME-TIFF images can be viewed in community-developed visualization software such as QuPath, Napari, or Fiji/ImageJ. To view all four focus images in these programs, open one of the files (e.g., open the program, then drag and drop the `morphology_focus_0000.ome.tif` file). Since each focus image file's metadata specifies that all the focus files are stored in the `morphology_focus/` directory, they do not need to be imported separately. If needed, these files can also be converted to a single stack OME-TIFF format, for example by following these QuPath and Tifffile file conversion instructions.
Some programs open the images only if the `morphology_focus` directory contains all four images (e.g., QuPath), while others will open with fewer than four (e.g., Napari; missing images display as blanks).
A commonly used Python package, Tifffile, can also be used to view these files. However, it does not open downsampled multi-file images by default. Downsampled images may be useful for converting to other image file types or for viewing the image if the full resolution is too large. The code snippet below illustrates how to open either full or downsampled resolution images.
To open all of the multi-tissue stain images together, all four `morphology_focus_xxxx.ome.tif` files must be present in the `morphology_focus/` directory.
```python
# Import Python libraries
# This code uses python v3.12.0, tifffile v2023.9.26, matplotlib v3.8.2
import tifffile
import matplotlib.pyplot as plt

# Option 1: Load full resolution image channels
# The following may produce a warning: 'OME series cannot read multi-file pyramids'.
# This is because tifffile does not support loading a pyramidal multi-file OME-TIFF file.
# Only the full resolution (level=0) data will load for all channels in the directory.
fullres_multich_img = tifffile.imread(
    "morphology_focus/morphology_focus_0000.ome.tif", is_ome=True, level=0, aszarr=False)

# Examine shape of array (number of channels, height, width), e.g. (4, 40867, 31318)
fullres_multich_img.shape

# Extract number of channels, e.g. 4
n_ch = fullres_multich_img.shape[0]

# Plot each channel
fig, axes = plt.subplots(ncols=n_ch, nrows=1, squeeze=False)
for i in range(n_ch):
    axes[0, i].imshow(fullres_multich_img[i], cmap="gray")
    axes[0, i].set_title(f"Channel: {i}")
plt.savefig('tifffile_fullres_four_channels.png')

# Option 2: Load a single channel image at any resolution, e.g., level=0 or level=1.
# Note 'is_ome' is set to False.
# Load one of the multi-file OME-TIFF files as a regular TIFF file at full resolution.
fullres_img_tiff = tifffile.imread(
    "morphology_focus/morphology_focus_0000.ome.tif", is_ome=False, level=0)

# Now load the file at downsampled resolution
downsampled_img = tifffile.imread(
    "morphology_focus/morphology_focus_0000.ome.tif", is_ome=False, level=1)

# Plot the full resolution and downsampled images side-by-side
fig, axes = plt.subplots(ncols=2, nrows=1, squeeze=False)
axes[0, 0].imshow(fullres_img_tiff, cmap="gray")
axes[0, 0].set_title(f"Full resolution: {fullres_img_tiff.shape}")
axes[0, 1].imshow(downsampled_img, cmap="gray")
axes[0, 1].set_title(f"Downsampled: {downsampled_img.shape}")
plt.savefig('example_fullres_downsample.png')
```
The cell summary file (`cells.csv.gz`) in gzipped CSV format contains data to help QC the transcript counts for each identified cell. The file contains one row for each cell, with the following columns:
Column Name | Description |
---|---|
cell_id | Unique ID of the cell, consisting of a cell prefix and dataset suffix |
x_centroid | X location of the cell centroid in µm |
y_centroid | Y location of the cell centroid in µm |
transcript_counts | Molecule count of gene features with Q-Score ≥ 20 |
control_probe_counts | Molecule count of negative control probes |
control_codeword_counts | Count of negative control codewords |
unassigned_codeword_counts | Count of unassigned codewords |
deprecated_codeword_counts | Count of deprecated codewords |
total_counts | Sum total of transcript_counts, control_probe_counts, control_codeword_counts, and unassigned_codeword_counts |
cell_area | The two-dimensional area covered by the cell in µm² |
nucleus_area | The two-dimensional area covered by the nucleus in µm² |
The cell summary is also provided in Parquet format (`cells.parquet`) to enable faster loading and reading of data.
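As a quick consistency check on the table above, `total_counts` should equal the sum of the four listed count columns, which does not include `deprecated_codeword_counts`. The sketch below uses only the standard library and made-up rows. The helper name `find_total_count_mismatches` is hypothetical, and the sketch assumes the count columns parse as numbers.

```python
import csv
import io

# Columns that make up total_counts according to the table above.
COUNT_COLUMNS = (
    "transcript_counts",
    "control_probe_counts",
    "control_codeword_counts",
    "unassigned_codeword_counts",
)


def find_total_count_mismatches(rows):
    """Return cell IDs whose total_counts differs from the documented sum."""
    mismatched = []
    for row in rows:
        expected = sum(int(float(row[column])) for column in COUNT_COLUMNS)
        if expected != int(float(row["total_counts"])):
            mismatched.append(row["cell_id"])
    return mismatched


# Made-up rows for illustration. For a real file, use instead:
#   handle = gzip.open("cells.csv.gz", "rt", newline="")
sample = io.StringIO(
    "cell_id,transcript_counts,control_probe_counts,control_codeword_counts,"
    "unassigned_codeword_counts,deprecated_codeword_counts,total_counts\n"
    "cell-a,120,1,0,2,0,123\n"
    "cell-b,80,0,0,0,1,81\n"
)
mismatches = find_total_count_mismatches(csv.DictReader(sample))
print(mismatches)
```

In this made-up data, `cell-b` is flagged because its total was computed with the deprecated codeword count included.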
Nucleus boundaries are determined by a nucleus segmentation algorithm that runs on the nuclei-stained (DAPI) morphology image. Cell boundaries are determined either by 1) expanding the nucleus boundaries by 5 µm or until the expanded boundary hits another cell, or 2) using boundary and interior cell stains.
The `cells.zarr.zip` file in zipped Zarr format contains segmentation masks and boundaries for nuclei and cells. These segmentation masks are used for assigning transcripts to cells. The boundary polygons are approximations of the segmentation masks, and are provided for efficient visualization of cell segmentation in Xenium Explorer and other analysis software. See Overview of Xenium Zarr Output Files for file specifications.
The `nucleus_boundaries.csv.gz` and `cell_boundaries.csv.gz` files are the CSV representation of the nucleus and cell boundaries, respectively. Each row represents a vertex in the boundary polygon of one cell. The boundary points for each cell appear in clockwise order, and the first and the last points are duplicates to indicate a closed polygon. Both files contain the following columns:
Column Name | Description |
---|---|
cell_id | Unique ID of the cell, consisting of a cell prefix and dataset suffix |
vertex_x | X-coordinate of the boundary point in µm |
vertex_y | Y-coordinate of the boundary point in µm |
label_id | A nonzero number corresponding to the segmentation mask pixel value (added in XOA v2.0) |
The same nucleus and cell boundary information is also provided in Parquet format (`nucleus_boundaries.parquet` and `cell_boundaries.parquet`) to enable faster loading and reading of data.
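Because each boundary is stored as a closed polygon with the first vertex repeated at the end, polygon areas can be computed directly from the rows with the shoelace formula. The following is a minimal standard-library sketch on made-up vertices. The helper names are hypothetical, and the sketch assumes the rows for each cell appear in boundary order. Since the polygons are approximations of the segmentation masks, the result may differ slightly from the `cell_area` and `nucleus_area` values in the cell summary.

```python
import csv
import io
from collections import defaultdict


def group_vertices(rows):
    """Collect (x, y) vertices per cell_id, keeping file order."""
    polygons = defaultdict(list)
    for row in rows:
        polygons[row["cell_id"]].append((float(row["vertex_x"]), float(row["vertex_y"])))
    return polygons


def polygon_area(vertices):
    """Shoelace formula for a closed polygon (first vertex repeated at the end)."""
    twice_area = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0  # abs() makes the result independent of orientation


# Made-up 2 µm x 2 µm square. For a real file, use instead:
#   handle = gzip.open("cell_boundaries.csv.gz", "rt", newline="")
sample = io.StringIO(
    "cell_id,vertex_x,vertex_y\n"
    "cell-a,0.0,0.0\n"
    "cell-a,0.0,2.0\n"
    "cell-a,2.0,2.0\n"
    "cell-a,2.0,0.0\n"
    "cell-a,0.0,0.0\n"
)
areas = {cell: polygon_area(pts) for cell, pts in group_vertices(csv.DictReader(sample)).items()}
print(areas)
```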
The transcripts file (`transcripts.csv.gz`) in gzipped CSV format contains data to evaluate transcript quality and localization. The file contains one row for each decoded transcript, with the following columns:
Column Name | Description |
---|---|
transcript_id | Unique ID of the transcript |
cell_id | Unique ID of the cell, consisting of a cell prefix and dataset suffix |
overlaps_nucleus | Binary value to indicate if the transcript falls within the segmented nucleus of the cell (1) or not (0) |
feature_name | Gene or control name |
x_location | X location in µm |
y_location | Y location in µm |
z_location | Z location in µm |
qv | Phred-scaled quality value (Q-Score) estimating the probability of incorrect call |
fov_name | Name of the field of view (FOV) that the transcript falls within |
nucleus_distance | The distance between the transcript and the nearest nucleus boundary in µm based on segmentation mask boundaries. The nearest nucleus may not necessarily belong to the cell that the transcript is assigned to. Transcripts localized within the nucleus have a distance of 0.0 µm. |
codeword_index | An integer index for each codeword used to decode transcripts (same value as codewords in the gene_panel.json file). |
Transcript information is also provided in:
- Parquet format (`transcripts.parquet`) to enable faster loading and reading of data.
- Zipped Zarr format (`transcripts.zarr.zip`). This file can be read by Xenium Explorer. See Overview of Xenium Zarr Output Files for file specifications.
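The `qv` column is Phred-scaled, so a Q-Score of Q corresponds to an estimated error probability of 10^(-Q/10). For example, Q20 corresponds to a 1% chance of an incorrect call. The sketch below applies that conversion and counts transcripts per feature at or above a threshold. The rows and feature names are made up and the helper names are hypothetical.

```python
import csv
import io
from collections import Counter


def qv_to_error_probability(qv):
    """Convert a Phred-scaled quality value to an estimated error probability."""
    return 10 ** (-qv / 10)


def count_high_quality(rows, min_qv=20.0):
    """Count transcripts per feature_name with qv >= min_qv."""
    counts = Counter()
    for row in rows:
        if float(row["qv"]) >= min_qv:
            counts[row["feature_name"]] += 1
    return counts


# Made-up rows for illustration. For a real file, use instead:
#   handle = gzip.open("transcripts.csv.gz", "rt", newline="")
sample = io.StringIO(
    "transcript_id,feature_name,qv\n"
    "1,GENE_A,40.0\n"
    "2,GENE_A,12.5\n"
    "3,GENE_B,20.0\n"
)
high_quality_counts = count_high_quality(csv.DictReader(sample))
print(dict(high_quality_counts), qv_to_error_probability(20.0))
```

For full-size datasets, the Parquet file is likely a better starting point than streaming the CSV row by row.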
The Xenium onboard analysis pipeline outputs a cell-feature matrix (`cell_feature_matrix`) in three file formats: the Market Exchange Format (MEX), the Hierarchical Data Format (HDF5), and the Zarr format. The matrices only include transcripts that pass the default quality value (Q-Score) threshold of Q20.
Each matrix in the `cell_feature_matrix/` folder is stored in the MEX format for sparse matrices. It also contains gzipped TSV files with feature and barcode sequences corresponding to row and column indices respectively. The `cell_feature_matrix/features.tsv.gz` file contains a list of pre-designed panel genes (and any custom add-on genes), negative controls, unassigned codewords, and deprecated codewords (learn more about control and codeword categories on the Algorithms page).
Column Number | Description |
---|---|
1 | Ensembl ID for panel and add-on genes |
2 | Gene name for panel and add-on genes |
3 | Feature type (Gene Expression, Negative Control Codeword, Negative Control Probe, Unassigned Codeword, Deprecated Codeword). |
The cell-feature matrix is also provided in:
- HDF5 format (`cell_feature_matrix.h5`), a binary format that compresses and accesses data more efficiently than text formats such as MEX and is useful when analyzing large datasets. H5 files are supported in both R and Python.
- Zipped Zarr format (`cell_feature_matrix.zarr.zip`). This file can be read by Xenium Explorer. See Overview of Xenium Zarr Output Files for file specifications.
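The three-column layout of `features.tsv.gz` makes it easy to tally how many rows of the matrix belong to each feature type. The sketch below is illustrative only: the identifiers in the sample are placeholders, not real panel content, and the helper name `count_feature_types` is hypothetical. The sample is compressed in memory so the same reading code applies to a real gzipped file.

```python
import csv
import gzip
import io
from collections import Counter


def count_feature_types(handle):
    """Tally the feature type (third column) of a features TSV file."""
    reader = csv.reader(handle, delimiter="\t")
    return Counter(row[2] for row in reader if len(row) >= 3)


# Placeholder rows for illustration only (IDs and names are made up).
SAMPLE_TSV = (
    "ID_PLACEHOLDER_1\tGENE_A\tGene Expression\n"
    "ID_PLACEHOLDER_2\tGENE_B\tGene Expression\n"
    "ID_PLACEHOLDER_3\tCONTROL_PROBE_X\tNegative Control Probe\n"
    "ID_PLACEHOLDER_4\tCODEWORD_Y\tUnassigned Codeword\n"
)
compressed = gzip.compress(SAMPLE_TSV.encode("utf-8"))

# For a real run, pass "cell_feature_matrix/features.tsv.gz" to gzip.open instead.
with gzip.open(io.BytesIO(compressed), "rt", encoding="utf-8", newline="") as handle:
    feature_type_counts = count_feature_types(handle)
print(dict(feature_type_counts))
```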
The Xenium onboard analysis pipeline outputs key metrics in text format as `metrics_summary.csv`. This file contains metrics that are useful for assessing decoding and cell segmentation quality.
The Xenium onboard analysis pipeline outputs an `analysis/` directory with subdirectories containing several CSV files, which store the automated secondary analysis results. A subset of these results is used to render the Analysis tab in the Analysis summary file. The subdirectories correspond to:
- Clustering (`clustering/`) with graph-based and K-means results. Graph-based clustering (under `graphclust`) is run once as it does not require a pre-specified number of clusters. K-means (under `kmeans`) is run for K=2..N, where K corresponds to the number of clusters and N=10 by default. Each value of K has its own results directory.
- Differential Expression (`diffexp/`) with graph-based and K-means results. Under each of the subdirectories are the `differential_expression.csv` files, which contain the list of cluster-specific features that are differentially expressed in each cluster relative to all the other clusters.
- Principal Component Analysis (`pca/`), which contains a total of five files listing the features used in the dimension reduction (i.e., to reduce the feature space). These results are used to perform clustering.
- UMAP (`umap/`), which contains the Uniform Manifold Approximation and Projection results.
The secondary analysis results are also saved as a zipped Zarr file (`analysis.zarr.zip`), which can be read by Xenium Explorer for data visualization. See Overview of Xenium Zarr Output Files for file specifications.
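To get an overview of which secondary analysis CSV files are present in a given output bundle, the `analysis/` directory can be walked and its files grouped by top-level subdirectory. This is a minimal sketch: the helper name `csv_files_by_analysis` is hypothetical, and the nested folder and file names in the demonstration are placeholders, not the pipeline's actual names.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def csv_files_by_analysis(analysis_dir):
    """Group CSV files under analysis_dir by their top-level subdirectory."""
    root = Path(analysis_dir)
    grouped = defaultdict(list)
    for path in sorted(root.rglob("*.csv")):
        relative = path.relative_to(root)
        grouped[relative.parts[0]].append(relative.as_posix())
    return dict(grouped)


# Demonstration on a temporary directory with placeholder file names.
# For a real run, call csv_files_by_analysis("analysis") in the output folder.
with tempfile.TemporaryDirectory() as tmp:
    analysis_dir = Path(tmp) / "analysis"
    for relative in ("clustering/example_k2/clusters.csv", "umap/example/projection.csv"):
        target = analysis_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("placeholder\n")
    grouped = csv_files_by_analysis(analysis_dir)
print(grouped)
```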
The `gene_panel.json` file is a copy of the gene panel file used in the experiment on the Xenium Analyzer instrument.
- The JSON files and additional resources for 10x pre-designed panels are provided on the Pre-designed Xenium Gene Expression Panels page.
- For custom panels, the JSON file can be downloaded from the Xenium Panel Designer tool after the design is finalized. See Getting started with Xenium Panel Design for guidance.
The JSON schema contains `metadata` and `payload` objects. The `payload` object contains the following:
Object | Description |
---|---|
chemistry | Version of Xenium In Situ assay chemistry (e.g., "v1"). |
customer | Customer contact information derived from design or 10x cloud if the Xenium Panel Designer was used. |
designer | When and who created the design. |
panel | Information about the panel design, including name, ID, and total number of targets. |
spec_version | Version of panel JSON file format. |
targets | Information about each target (gene, control) in the panel, including gene identifiers and gene coverage (also referred to as the number of probe sets). The latter may be useful for assessing per-gene sensitivity. |
Python or other tools can be used to parse the JSON file. Here is example Python code to extract gene name, Ensembl ID, and gene coverage information for each gene target in a given panel:
```python
# Import Python libraries
# Example with Python v3.12, pandas v2.1.1
import json
import pandas as pd

# Open JSON file and return the JSON object as a dictionary
# (the file is closed automatically when the block ends)
with open('gene_panel.json') as f:  # Edit file name here
    data = json.load(f)

# Create lists to store extracted information
gene = []
ensembl = []
cov = []

# Iterate through the JSON list to extract information
for i in data['payload']['targets']:
    if i['type']['descriptor'] == "gene":  # Only collect info for genes, not controls
        gene_name = i['type']['data']['name']
        ensembl_id = i['type']['data']['id']
        coverage = str(i['info']['gene_coverage'])
        gene.append(gene_name)
        ensembl.append(ensembl_id)
        cov.append(coverage)

# Create output CSV file
out_df = pd.DataFrame(list(zip(gene, ensembl, cov)),
                      columns=['Gene name', 'Ensembl ID', 'Gene coverage'])
out_df.to_csv('my_panel_gene_info.csv', index=False)
```
The following are provided in `aux_outputs/` (see release notes for updates):
- The `morphology_fov_locations.json` file contains the field of view (FOV) name, height, width, and XY positions in the space of the region of interest's (ROI) morphology image. This is the same space used to compute transcript and cell locations, and the units are microns. The FOVs have 3,520 rows and 2,960 columns with 128 pixels of overlap on each edge (this may change in future versions of the Xenium platform). The position information is useful for determining where FOV boundaries are to assess transcript deduplication and any FOV edge effects.
- The `overview_scan_fov_locations.json` file contains the FOV name, height, width, and approximate XY positions in the space of the overview scan image. This is the space that contains all the ROIs, and the units are pixels. The ROI coordinates have an error of 5-10 µm. This position information is useful for approximating where multiple ROIs are located on an overview scan image.
- The `per_cycle_channel_images/` directory contains downsampled 2D RNA images (maximum intensity projection) from each cycle and channel (not the morphology stain images). These images may be helpful for troubleshooting analysis summary alerts or unexpected metrics and analysis results.
- The `overview_scan.png` file is the full-resolution (1672 x 3498 pixels) image of the entire sample on the slide.
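The FOV dimensions above are given in pixels, while positions in `morphology_fov_locations.json` are in microns. Converting between the two only requires the `pixel_size` value from `experiment.xenium`. The sketch below shows the arithmetic. The pixel size used here is a placeholder, so read the actual value from the manifest of the run. The helper name `fov_size_um` is hypothetical.

```python
def fov_size_um(rows_px, cols_px, pixel_size_um):
    """Return (height, width) of a field of view in microns."""
    return rows_px * pixel_size_um, cols_px * pixel_size_um


# FOV dimensions and overlap as stated above (may change in future platform versions).
FOV_ROWS_PX = 3520
FOV_COLS_PX = 2960
OVERLAP_PX = 128

# Placeholder value for illustration; use pixel_size from experiment.xenium instead.
pixel_size_um = 0.2125

height_um, width_um = fov_size_um(FOV_ROWS_PX, FOV_COLS_PX, pixel_size_um)
overlap_um = OVERLAP_PX * pixel_size_um
print(f"FOV: {height_um:.1f} x {width_um:.1f} um, overlap per edge: {overlap_um:.1f} um")
```

Transcripts that fall within the overlap width of a FOV edge are the ones most relevant when assessing deduplication and edge effects.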