Cell Annotation Algorithm (Beta)

The annotation model was co-developed by 10x Genomics and the Cellarium AI Lab at the Data Sciences Platform of the Broad Institute.

Cloud Analysis introduces a new pipeline for cell type annotation, which can be applied to standard Cell Ranger count and multi outputs to generate accurate cell type labels. This method assigns cell types by comparing each cell's gene expression profile to annotated reference datasets, rather than relying on known marker genes for each cell type or on tissue-specific references. Please note that the cell annotation pipeline is a beta feature.

Specifically, each cell barcode's gene expression profile is compared to a model built on the Chan Zuckerberg CELL by GENE (CZ CELLxGENE) census to identify the most similar cell types. A consensus label is then assigned to each barcode, and the results are summarized in web_summary.html. These labels can be viewed in Loupe Browser or accessed via the cell_types.csv output file.

The algorithm generates an embedding for each cell barcode by first applying principal component analysis (PCA) to the reference dataset, extracting the top 512 components for each reference cell. The gene expression profile of each cell barcode being analyzed is transformed into the same 512-dimensional (512-D) embedding. To classify a cell, the algorithm performs an approximate nearest-neighbor (ANN) search, identifying the 500 most similar cells in the reference set based on these embeddings. The most common cell type among these nearest neighbors is then assigned to the query cell.

This figure shows the gene expression profile of a single 10x Barcode (shown in red), transformed into a 512-D embedding. The approximate nearest neighbors (primarily yellow cells) of the 10x Barcode are shown within the grey circle.
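The embed-then-vote procedure described above can be sketched on toy data. This is a minimal illustration under stated assumptions, not the pipeline's implementation: it uses an exact brute-force neighbor search in place of an approximate one, 10 components and 15 neighbors instead of 512 and 500, and synthetic expression values. All function and variable names here are hypothetical.

```python
from collections import Counter

import numpy as np


def fit_pca(reference, n_components):
    """Fit PCA on reference cells (rows = cells, columns = genes)."""
    mean = reference.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:n_components]


def embed(expression, mean, components):
    """Project expression profiles into the reference PCA space."""
    return (expression - mean) @ components.T


def annotate(query_emb, ref_emb, ref_labels, k):
    """Label each query cell with the most common label among its k nearest reference cells."""
    labels = []
    for q in query_emb:
        # Exact Euclidean search; a real system would use an ANN index here.
        dists = np.linalg.norm(ref_emb - q, axis=1)
        nearest = np.argsort(dists)[:k]
        votes = Counter(ref_labels[i] for i in nearest)
        labels.append(votes.most_common(1)[0][0])
    return labels


# Synthetic reference: two well-separated cell populations (assumed data).
rng = np.random.default_rng(0)
n_genes = 50
t_profile = rng.normal(0.0, 1.0, n_genes)
b_profile = rng.normal(0.0, 1.0, n_genes)
reference = np.vstack([
    t_profile + rng.normal(0.0, 0.1, (100, n_genes)),
    b_profile + rng.normal(0.0, 0.1, (100, n_genes)),
])
ref_labels = ["T cell"] * 100 + ["B cell"] * 100

# Two query barcodes, one drawn from each population.
query = np.vstack([
    t_profile + rng.normal(0.0, 0.1, n_genes),
    b_profile + rng.normal(0.0, 0.1, n_genes),
])

mean, components = fit_pca(reference, n_components=10)
ref_emb = embed(reference, mean, components)
query_emb = embed(query, mean, components)
predicted = annotate(query_emb, ref_emb, ref_labels, k=15)
print(predicted)
```

Note that the query cells are projected with the mean and components learned from the reference only, which is what places both sets of cells in the same embedding space.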

A cell type is a term from the Cell Ontology, which CZ CELLxGENE uses to annotate all datasets. The reference datasets can vary in the granularity of their annotations: some experts may assign highly specific terms like "CD8-positive, CD25-positive, alpha-beta regulatory T cell," while others might use broader classifications such as "T cell."

Our goal is to help users identify high-level cell types (e.g., T cells, B cells). To achieve this, the algorithm maps specific terms from the Cell Ontology to select high-level cell types. These broader categories are displayed in both the web_summary.html and the .cloupe file. The selected groupings are illustrated in the figure below:

Both coarse and fine cell type annotations are available in the cell_types.csv file, so users who need more specific labels can refine the classifications further.
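One way to picture the mapping from specific Cell Ontology terms to high-level cell types is a walk up the ontology's parent links until a selected high-level term is reached. The sketch below assumes a tiny, simplified hierarchy and a hypothetical set of high-level types; the real Cell Ontology is a much larger graph in which a term can have several parents, and the actual selection of groupings may differ.

```python
# Hypothetical, simplified child -> parent links for illustration only.
PARENT = {
    "CD8-positive, CD25-positive, alpha-beta regulatory T cell": "regulatory T cell",
    "regulatory T cell": "T cell",
    "naive B cell": "B cell",
    "T cell": "lymphocyte",
    "B cell": "lymphocyte",
}

# Hypothetical set of selected high-level (coarse) cell types.
COARSE_TYPES = {"T cell", "B cell"}


def to_coarse(term):
    """Return the first selected high-level type at or above `term`, or None."""
    seen = set()
    while term is not None and term not in seen:
        if term in COARSE_TYPES:
            return term
        seen.add(term)  # guard against accidental cycles in the toy data
        term = PARENT.get(term)
    return None


fine_labels = [
    "CD8-positive, CD25-positive, alpha-beta regulatory T cell",
    "naive B cell",
    "T cell",
]
coarse_labels = [to_coarse(label) for label in fine_labels]
print(coarse_labels)
```

A term with no selected ancestor (here, "lymphocyte") maps to None, so a caller would need to decide how to report such cells.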