Archiving Xenium Data

The Xenium platform aims to support and embrace principles of data findability, accessibility, interoperability, and reusability (FAIR) so that it is easy to share newly generated Xenium data for collaborative analysis and reproduce findings from published Xenium data.

What Xenium output should I keep for archival storage for reanalysis and grant funding requirements?

We recommend archiving Xenium raw data outputs, which consist of:

  1. Decoded transcripts with assigned Phred-scaled Q-Scores
  2. High-resolution morphology images

Decoded transcripts are provided in .zarr and .parquet formats. Morphology images are provided in OME-TIFF (.ome.tif) format. These data should be archived to fulfill grant funding requirements and to enable reanalysis, and may be submitted to repositories such as GEO. All other Xenium outputs are derived from these raw data by Xenium Onboard Analysis, can be rederived after a Xenium instrument run, and are not strictly necessary for long-term archival and reproducibility.
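As an illustration of the recommendation above, the following minimal sketch walks an output directory, picks out the raw data files, and records a SHA-256 checksum for each so the archive can be verified later. The file-name rules are assumptions made for this example (transcript files start with "transcripts" and are single files ending in .parquet or containing ".zarr"; morphology images contain ".ome.tif" in their names); check them against your own output bundle before relying on them. The function names are hypothetical and are not part of any 10x Genomics tool.

```python
import hashlib
from pathlib import Path


def is_raw_output(path: Path) -> bool:
    """Return True for files treated as raw data in this sketch.

    ASSUMPTION: naming rules below are illustrative, not confirmed by the
    documentation; adjust them to match the files in your output bundle.
    """
    name = path.name.lower()
    is_transcripts = name.startswith("transcripts") and (
        name.endswith(".parquet") or ".zarr" in name
    )
    is_morphology = ".ome.tif" in name
    return is_transcripts or is_morphology


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_archive_manifest(bundle_dir: str) -> dict[str, str]:
    """Map each raw data file (path relative to bundle_dir) to its checksum."""
    root = Path(bundle_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and is_raw_output(path):
            manifest[path.relative_to(root).as_posix()] = sha256_of(path)
    return manifest
```

Calling build_archive_manifest on an output directory returns a dictionary that can be saved next to the archived files and re-checked after transfer to long-term storage.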

Additional detail on Xenium raw data output:

  • A Xenium Q-Score indicates the probability that the detected object exists and was correctly identified by the decoding algorithm. All decoded transcript Q-Scores are output in the transcripts files. The cells and cell-feature matrix output files in the Xenium output bundle are filtered to Q-Score ≥ 20. For more details, see our Overview of Xenium Algorithms support page.
  • Xenium morphology images will always be provided at the same resolution that our onboard segmentation algorithm uses as input. This ensures that you can benefit from improvements to our segmentation model as we add to its training over time, or run your own segmentation if you choose. Our off-instrument reanalysis package, Xenium Ranger, enables you to easily rerun segmentation or import your own segmentation results to generate derived outputs (e.g., cell-feature matrix) and view them in Xenium Explorer.
  • We will stand by these FAIR principles with future capabilities. High-resolution morphology images will continue to be included in the Xenium output bundle for our onboard multimodal segmentation method.
  • Other outputs from Xenium Onboard Analysis (XOA) are derived data from these raw outputs, and the community can recapitulate them from Xenium raw data.
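To make the Q-Score threshold concrete, here is a minimal sketch of the standard Phred relationship (Q = -10 · log10 of the error probability) and of a Q-Score ≥ 20 filter like the one applied to the cells and cell-feature matrix outputs. The record layout, the "qv" field name, and the gene names are hypothetical placeholders for this example; how Xenium calibrates its Q-Scores is described on the Overview of Xenium Algorithms page, not here.

```python
Q_THRESHOLD = 20  # threshold cited above for the filtered outputs


def phred_to_error_probability(q: float) -> float:
    """Standard Phred scaling: Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


def filter_by_q(transcripts, min_q=Q_THRESHOLD):
    """Keep transcripts whose Q-Score is at least min_q.

    ASSUMPTION: each transcript is a dict with a 'qv' key; this field name
    is chosen for illustration only.
    """
    return [t for t in transcripts if t["qv"] >= min_q]


# Hypothetical example records (not real data).
transcripts = [
    {"feature_name": "GeneA", "qv": 40.0},
    {"feature_name": "GeneB", "qv": 12.5},
    {"feature_name": "GeneC", "qv": 20.0},
]

kept = filter_by_q(transcripts)
print([t["feature_name"] for t in kept])
print(phred_to_error_probability(Q_THRESHOLD))
```

Under the Phred convention, Q20 corresponds to an estimated error probability of 1%. Because the transcripts files retain every decoded transcript with its Q-Score, a different threshold can be applied during reanalysis.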

Xenium raw data is a reduced form of the low-level internal sensor data, as described in the Overview of Xenium Algorithms. It preserves the details needed to assess decoded transcript quality while abstracting away low-level details of the instrumentation and assay, which require calibration and specialized methods that will change over time as the platform improves and gains new capabilities.

On-instrument processing of Xenium internal sensor data (i.e., the 3D per-pixel values that Xenium Analyzer’s internal image sensor captures across multiple FOVs, multiple fluorescence channels, and multiple cycles of chemistry and imaging processing) is closely tied to Xenium optics. Consequently, Xenium internal sensor data cannot be reanalyzed after processing with Xenium Onboard Analysis.

Internal sensor data is not practical to store or reanalyze: it amounts to roughly tens of terabytes per sample. In the spirit of scientific reproducibility, it is more useful to store the Xenium decoded transcripts with assigned Phred-scaled Q-Scores and the morphology images for reanalysis. Typical output directory sizes for these data are given in the tables below.

To add further transparency and to supplement existing methods to QC Xenium data, downsampled RNA diagnostic images are available in the Xenium auxiliary output directory in Xenium Onboard Analysis v1.6 and later. In XOA v1.7 and later, these images are also available in the Analysis Summary. These images are not needed for raw data archival, but should be useful in gaining confidence in the robustness of Xenium's decoding algorithm.

Each tissue region selected on the Xenium Analyzer produces a separate output directory with images, decoded transcripts, cell-feature count matrices, and more.

The file formats were deliberately designed and chosen to balance compatibility, performance, and file size. There is no simple formula for calculating the output directory size from the Xenium Analyzer region area alone. Output size also depends on sample-specific factors such as tissue shape, number of cells, number of decoded transcripts, and percentage of high-quality transcripts.

To help budget for data storage requirements, here are some examples based on estimates and 10x Genomics public datasets.

The tables below show estimated output directory sizes (GB) as a function of tissue area (cm²) and transcript density (transcripts per µm²), assuming the sample has similar properties to a model mouse brain coronal section with the following metrics:

  • 0.72 cm² tissue area
  • 11 Z-slices
  • 162k cells
  • 62.4M transcripts
  • 0.25 cells per 100 µm²
  • 107 transcripts > Q20 per 100 µm²
  • 80% of transcripts > Q20

Estimates are based on data generated with the cell segmentation staining workflow and multimodal cell segmentation.

Xenium v1 estimated output directory size (GB) with XOA v3.0:

| Sample source | Tissue area (cm²) | Estimated directory size (total transcripts) at 0.5 transcripts/µm² | Estimated directory size (total transcripts) at 1 transcript/µm² |
|---|---|---|---|
| Core needle biopsy | 0.01 | 0.3 GB (500k) | 0.3 GB (1M) |
| Coronal mouse brain hemisphere | 0.5 | 14 GB (25M) | 15 GB (50M) |
| Full coronal mouse brain | 1 | 29 GB (50M) | 31 GB (100M) |
| Tissue section covering entire sample area | 2.35 | 67 GB (117M) | 72 GB (235M) |
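For tissue areas that fall between the tabulated rows, one rough budgeting approach is to interpolate linearly between neighboring rows. The sketch below does this for the Xenium v1 estimates at 0.5 transcripts/µm². Linear interpolation is an assumption of this example, not a documented sizing method, and the result inherits every assumption of the model mouse brain section, so treat it as a planning figure only.

```python
from bisect import bisect_right

# (tissue area in cm², estimated directory size in GB), taken from the
# Xenium v1 estimates at 0.5 transcripts/µm².
V1_DENSITY_0_5 = [(0.01, 0.3), (0.5, 14.0), (1.0, 29.0), (2.35, 67.0)]


def interpolate_size_gb(area_cm2, points=V1_DENSITY_0_5):
    """Linearly interpolate a directory size between tabulated areas.

    ASSUMPTION: size varies linearly between rows; this is a budgeting
    heuristic only and does not extrapolate beyond the table.
    """
    areas = [area for area, _ in points]
    if not areas[0] <= area_cm2 <= areas[-1]:
        raise ValueError("area is outside the tabulated range")
    hi = min(bisect_right(areas, area_cm2), len(points) - 1)
    lo = hi - 1
    (a0, s0), (a1, s1) = points[lo], points[hi]
    return s0 + (s1 - s0) * (area_cm2 - a0) / (a1 - a0)


# A 0.75 cm² section lies halfway between the 0.5 and 1 cm² rows.
print(interpolate_size_gb(0.75))
```

Samples whose cell density or fraction of high-quality transcripts differs from the model section can deviate substantially from such an estimate.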

Xenium Prime estimated output directory size (GB) with XOA v3.0:

| Sample source | Tissue area (cm²) | Estimated directory size (total transcripts) at 1.5 transcripts/µm² | Estimated directory size (total transcripts) at 3 transcripts/µm² | Estimated directory size (total transcripts) at 12 transcripts/µm² |
|---|---|---|---|---|
| Core needle biopsy | 0.01 | 0.3 GB (1.5M) | 0.4 GB (3M) | 0.7 GB (12M) |
| Coronal mouse brain hemisphere | 0.5 | 16 GB (75M) | 19 GB (150M) | 35 GB (600M) |
| Full coronal mouse brain | 1 | 33 GB (150M) | 39 GB (300M) | 70 GB (1.2B) |
| Tissue section covering entire sample area | 2.35 | 77 GB (352M) | 91 GB (705M) | 165 GB (2.8B) |
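The total transcript counts shown in parentheses in the tables above follow from tissue area multiplied by transcript density, after converting cm² to µm². The short sketch below reproduces that arithmetic; the function name is a hypothetical helper for this example.

```python
UM2_PER_CM2 = 1e8  # 1 cm = 10^4 µm, so 1 cm² = 10^8 µm²


def total_transcripts(area_cm2: float, density_per_um2: float) -> float:
    """Total transcripts = tissue area (converted to µm²) x density per µm²."""
    return area_cm2 * UM2_PER_CM2 * density_per_um2


# Coronal mouse brain hemisphere, Xenium v1 at 0.5 transcripts/µm².
print(f"{total_transcripts(0.5, 0.5) / 1e6:.0f}M")

# Entire sample area, Xenium Prime at 12 transcripts/µm².
print(f"{total_transcripts(2.35, 12) / 1e9:.1f}B")
```

Measuring your own tissue area and expected transcript density gives a transcript count that can be matched to the nearest table row.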

The 10x Genomics public datasets page provides additional examples of several sample configurations. For example:

| Dataset | Chemistry | Tissue area (cm²) | Total transcripts (millions) | Output directory size (GB) |
|---|---|---|---|---|
| Mouse brain tiny subset | Xenium v1 | ~0.17 | 9 | 3.5 |
| Mouse brain full coronal section | Xenium v1 | 0.66 | 34 | 13.0 |
| FFPE human breast, Tissue 1 | Xenium v1 | 0.90 | 68 | 24.4 |
| FFPE human breast using the entire sample area, Replicate 1 | Xenium v1 | 2.28 | 106 | 51.9 |
| FFPE human ovarian cancer | Xenium Prime | 0.86 | 120 | 26.7 |
| FF human ovary | Xenium Prime | 1.98 | 2,164 | 144 |
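To see why no single formula predicts output size, the sketch below computes directory size per cm² and per million transcripts for the public datasets listed above (the tiny subset is skipped because its area is only approximate). The values are copied from the list; the variable and function names are hypothetical.

```python
# (dataset, chemistry, tissue area in cm², total transcripts in millions,
#  output directory size in GB), copied from the public dataset examples.
PUBLIC_DATASETS = [
    ("Mouse brain full coronal section", "Xenium v1", 0.66, 34, 13.0),
    ("FFPE human breast, Tissue 1", "Xenium v1", 0.90, 68, 24.4),
    ("FFPE human breast, entire sample area, Replicate 1", "Xenium v1", 2.28, 106, 51.9),
    ("FFPE human ovarian cancer", "Xenium Prime", 0.86, 120, 26.7),
    ("FF human ovary", "Xenium Prime", 1.98, 2164, 144.0),
]


def size_ratios(rows):
    """Return per-dataset GB per cm² and GB per million transcripts."""
    ratios = []
    for name, _chemistry, area_cm2, transcripts_millions, size_gb in rows:
        ratios.append(
            {
                "dataset": name,
                "gb_per_cm2": size_gb / area_cm2,
                "gb_per_million_transcripts": size_gb / transcripts_millions,
            }
        )
    return ratios


for row in size_ratios(PUBLIC_DATASETS):
    print(
        f"{row['dataset']}: {row['gb_per_cm2']:.1f} GB/cm², "
        f"{row['gb_per_million_transcripts']:.3f} GB per million transcripts"
    )
```

Both ratios vary several-fold across these datasets, which is consistent with output size depending on sample-specific factors and not on area or transcript count alone.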