Single Cell Reference Data and Cell Type Annotation Format

There are three options to specify up to five reference datasets in the Xenium Panel Designer:

  1. Select from our list of pre-built, publicly available reference datasets, choosing those whose tissue type and condition are similar to the samples you plan to run.
  2. Upload your own annotated single cell gene expression data in HDF5, MEX, or CLOUPE format. See Creating Single Cell References for Xenium Custom Panel Design from Seurat or AnnData for additional guidance on reference file format conversion.
  3. Choose a combination of public reference datasets and your own single cell reference datasets.

The design tool uses single cell data to create a model of the expression profiles of the cell types present in your samples. It then uses this model to evaluate the risk of optical crowding, check for highly expressed genes, and assign codewords to the genes you selected in an optimal fashion. Here are important considerations for panel design and sample preparation that can affect detection budget:

  • Mismatched panels: It is important that panels are designed and used with tissues that are representative of the reference data included in the panel design. More than one reference can be used in custom panel designs to represent the diversity of samples that may be studied using a single panel. Detection budget may be exceeded if a panel designed for one tissue is used on another with relatively higher expression in certain genes or cell types (e.g., healthy vs. tumor tissue). This effect may be prevented by including a tumor reference in panel design.
  • High utilization panels: Panels that are designed with very high utilization are more likely to exceed the detection budget described in the Xenium Add-on Panel Design Tech Note. In addition, panels designed using reference data that are not representative of the samples being analyzed may have higher actual utilization than calculated. It is important to review recommendations during custom panel design to avoid panels with utilization over the detection budget.
  • Abnormally large tissue thicknesses: We recommend a section thickness of 5 µm for FFPE and 10 µm for fresh frozen as described in Xenium In Situ for FFPE - Tissue Preparation Guide and Xenium In Situ for Fresh Frozen Tissues - Tissue Preparation Guide. Sections that are thicker than this will be more likely to exceed the detection budget.

The single cell reference must be accompanied by cell type annotations for the barcodes. In the design process, the expression levels are aggregated across each cell type. This information is used to assign codewords that minimize optical crowding, as well as ensure that cell type clusters match the broad, expected categories. See general guidelines for panel gene selection in the Xenium Add-On Panel Design Technical Note.
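To make the idea of aggregating expression by cell type concrete, here is a minimal sketch that averages raw counts per gene within each annotated cell type. The aggregation statistic used by the Xenium Panel Designer is not described here, so the mean is an assumption, and the function name `mean_expression_by_cell_type` and the dictionary-based inputs are hypothetical.

```python
from collections import defaultdict


def mean_expression_by_cell_type(counts, annotations):
    """Average raw counts per gene within each cell type.

    counts:      {barcode: {gene: count}}  (hypothetical in-memory layout)
    annotations: {barcode: cell_type}
    Using the mean is an assumption; the design tool's statistic may differ.
    """
    totals = defaultdict(lambda: defaultdict(float))
    n_cells = defaultdict(int)
    for barcode, gene_counts in counts.items():
        cell_type = annotations.get(barcode)
        if cell_type is None:
            continue  # barcodes without an annotation are skipped
        n_cells[cell_type] += 1
        for gene, count in gene_counts.items():
            totals[cell_type][gene] += count
    return {
        cell_type: {gene: total / n_cells[cell_type] for gene, total in genes.items()}
        for cell_type, genes in totals.items()
    }
```

A gene with a high per-cell-type average in an abundant cell type is the kind of signal that would plausibly contribute most to optical crowding.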

Single cell data can come from Chromium Single Cell Gene Expression or Single Cell Gene Expression Flex assays. If the single cell data comes from Flex, it is important to note that this product does not include genes that are highly and ubiquitously expressed, such as mitochondrial genes, ribosomal genes, and HLA class I genes.

We strongly discourage including these highly expressed genes on Xenium custom panels as well, because they take up a large portion of the available optical budget and increase the risk of optical crowding. However, including a small number of genes that are not present in Flex data generally poses a low risk to assay performance. During the custom panel design process, genes missing from the Flex data are assigned an averaged expression level for the purposes of the utilization analysis.

The design tool needs a gene list and a measure of expected gene expression in the sample stratified by cell type.

If providing your own reference data, the Xenium Panel Designer will accept one of the formats described below.

One or more unnormalized whole transcriptome filtered feature-barcode matrices with cell type annotations for each matrix. The matrix and annotation files should be bundled as a .zip, .tar, or .tar.gz file (one matrix + one annotation file per bundle).

  • The feature-barcode matrix can be in either Cell Ranger HDF5 or Matrix Exchange (MEX) format. The HDF5 matrix is a single file, while the MEX format is a folder containing three files (matrix.mtx.gz, barcodes.tsv.gz, features.tsv.gz).

  • The cell type annotations file can be in CSV or TSV format. It is a two-column file and headers are required. The first column must be "barcode". For example:

    barcode,annotation
    ATGCATTGCGTAAGTG-1,fibroblast
    TTGCAAAGCCGAAGTG-1,fibroblast
    CATCATTGCGTAATTG-1,T cell
    ...

    Important
    It is critical that barcode suffixes and prefixes in the annotations file exactly match those for barcodes in the matrix file.
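Before uploading, it can help to check the annotation file against the matrix barcodes yourself. The following is a minimal sketch using only the Python standard library; the function names `read_annotations` and `compare_barcodes` are hypothetical, and the matrix barcodes are assumed to have already been read (for example, from barcodes.tsv.gz).

```python
import csv
import io


def read_annotations(text, delimiter=","):
    """Parse annotation file contents and return {barcode: annotation}.

    Raises ValueError if the first header is not "barcode" or a row is short.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header is None or len(header) < 2 or header[0] != "barcode":
        raise ValueError(f"first column header must be 'barcode', got {header!r}")
    annotations = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"row has fewer than two columns: {row!r}")
        annotations[row[0]] = row[1]
    return annotations


def compare_barcodes(annotation_barcodes, matrix_barcodes):
    """Return (only_in_annotations, only_in_matrix) as sorted lists."""
    ann = set(annotation_barcodes)
    mat = set(matrix_barcodes)
    return sorted(ann - mat), sorted(mat - ann)


# Example with made-up matrix barcodes; the last one has a different suffix.
annotation_text = (
    "barcode,annotation\n"
    "ATGCATTGCGTAAGTG-1,fibroblast\n"
    "TTGCAAAGCCGAAGTG-1,fibroblast\n"
    "CATCATTGCGTAATTG-1,T cell\n"
)
matrix_barcodes = ["ATGCATTGCGTAAGTG-1", "TTGCAAAGCCGAAGTG-1", "CATCATTGCGTAATTG-2"]
annotations = read_annotations(annotation_text)
only_in_annotations, only_in_matrix = compare_barcodes(annotations, matrix_barcodes)
print(only_in_annotations)  # ['CATCATTGCGTAATTG-1']
print(only_in_matrix)       # ['CATCATTGCGTAATTG-2']
```

Both lists should be empty for a file pair that is ready to bundle. For a TSV annotation file, pass `delimiter="\t"`.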

If looking for rare cell types, providing matrix files for multiple samples may yield better results. We recommend providing a matrix file per sample; it does not need to be aggregated. If multiple matrices are provided, the cell type information across all of the matrices will be evaluated.
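One way to prepare a bundle per sample is sketched below with Python's standard `zipfile` module. The function name `bundle_reference` is hypothetical, and the layout inside the archive (a MEX folder kept as a subfolder next to the annotation file) is an assumption, since the required internal layout is not specified here.

```python
import zipfile
from pathlib import Path


def bundle_reference(matrix_path, annotation_path, bundle_path):
    """Write one matrix (HDF5 file or MEX folder) plus one annotation file to a .zip.

    The archive layout is an assumption: a MEX folder is stored as a subfolder,
    and single files are stored at the top level.
    """
    matrix_path = Path(matrix_path)
    annotation_path = Path(annotation_path)
    with zipfile.ZipFile(bundle_path, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        if matrix_path.is_dir():
            # MEX format: add every file in the folder (matrix, barcodes, features)
            for item in sorted(matrix_path.iterdir()):
                if item.is_file():
                    bundle.write(item, arcname=f"{matrix_path.name}/{item.name}")
        else:
            # HDF5 format: a single file
            bundle.write(matrix_path, arcname=matrix_path.name)
        bundle.write(annotation_path, arcname=annotation_path.name)
    return Path(bundle_path)
```

Calling the function once per sample, each time with that sample's matrix and annotation file, yields one bundle per matrix as described above.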

Important
It is very important that this matrix is not normalized or gene-filtered. Normalizing/filtering limits our ability to assess the impacts of optical crowding. If the matrix contains a subset of the total gene count data, the representation per gene will be skewed.
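A quick sanity check for normalization is to confirm that every stored value in a MEX matrix is a non-negative integer. The sketch below assumes the matrix file is in Matrix Market coordinate format with a value column (comment lines starting with "%", then a size line, then "row column value" entries); the function name `non_integer_entries` is hypothetical.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def non_integer_entries(path, limit=5):
    """Return up to `limit` (row, column, value) entries that are not
    non-negative integers, which suggests normalized or transformed data."""
    bad = []
    with open_text(path) as handle:
        size_line_seen = False
        for line in handle:
            line = line.strip()
            if not line or line.startswith("%"):
                continue  # header and comment lines
            if not size_line_seen:
                size_line_seen = True  # first data line holds rows, columns, entries
                continue
            row, col, value = line.split()[:3]
            number = float(value)
            if number < 0 or not number.is_integer():
                bad.append((int(row), int(col), value))
                if len(bad) >= limit:
                    break
    return bad
```

An empty result is consistent with raw counts but does not prove it; in particular, this check cannot detect gene filtering, so the number of rows in features.tsv.gz should be reviewed separately.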

A single uncompressed CLOUPE file generated by Cell Ranger. The Xenium Panel Designer uses the graph-based clustering results for cell annotations, so no additional annotation file is needed for this input format.

An error message will be shown for these input file issues:

  • Gene IDs and/or gene symbols do not match between the matrix, gene list, and the 2020-A reference.
  • Gene names contain spaces or typos.
  • Files have missing column headers or headers with unexpected or misspelled names.
  • Matrix and annotation CSV files do not have exactly the same barcodes. This is often seen when barcodes in the annotation file have an extra sample suffix after aggregation, but the matrix itself does not.

Common input file issues that do not halt panel design but give poor results:

  • Normalized counts data. The design tool will not raise an error, but results will be skewed. The design tool should be used with integer counts data.
  • Matrix files that filter many genes. The design tool will not raise an error, but results will be skewed, leading to a suboptimal design.
  • Matrix files that are missing genes in the gene list.
  • Poorly matched expression data.
  • Annotation CSV files where the first two columns are not "barcode,annotation". If hierarchical annotations are present in additional columns, they are ignored.

You can use publicly available data if you do not have an annotated single cell RNA-seq dataset or want to provide a combination of public reference datasets with your own single cell reference datasets.

The Xenium Panel Designer provides a variety of curated reference datasets from sources such as CELLxGENE and GEO for both human and mouse tissues and a variety of conditions:

Table last updated on May 28, 2024

Species  Tissue      References for these conditions
Human    Brain       Alzheimer's; Glioblastoma; Non-diseased
Human    Breast      Cancer (ER+, HER2+, or triple-negative); Non-diseased
Human    Colon       Colorectal cancer; Non-diseased
Human    Heart       Non-diseased
Human    Kidney      Chronic kidney disease; Acute kidney injury; Clear cell carcinoma; Non-diseased
Human    Liver       Hepatocellular carcinoma; Non-diseased
Human    Lung        Non-small cell lung cancer (NSCLC); SARS-CoV-2 infection; Small cell lung carcinoma; Non-diseased
Human    Lymph Node  Non-diseased
Human    Multiple    Non-diseased
Human    Ovary       Ovarian cancer
Human    Pancreas    Ductal adenocarcinoma; Non-diseased
Human    Retina      Non-diseased
Human    Skin        Non-diseased
Mouse    Brain       Non-diseased
Mouse    Heart       Non-diseased
Mouse    Kidney      Non-diseased
Mouse    Lung        Non-diseased
Mouse    Multiple    Non-diseased
Mouse    Retina      Non-diseased