Glossary of Terms Relevant to Single Cell Products

  • Approximate Nearest Neighbor look-up: A computational method that identifies a data point in a dataset that is near the given query point, though not necessarily the exact closest one. This approach is used to find cells with the most similar gene expression profiles to a target cell.

  • Barcode whitelist: The list of all known barcode sequences that have been included in the assay kit and are available during library preparation (learn more here).

  • Barcode collision: A scenario where two or more individual cells are mistakenly identified as a single cell due to sharing the same barcode.

  • Cell Annotation: The process of assigning labels to cells based on their gene expression profiles, allowing researchers to identify specific cell types, states, or other characteristics.

  • Cell Barcode: The barcode associated with reads that are in cells.

  • Cell Ontology: A structured vocabulary that defines cell types and their relationships, which can be visualized as a tree. In this tree, broader cell types are represented as parent nodes, while more specific cell types appear as child nodes. The Cell Ontology is used to systematically describe and organize knowledge about cells, ensuring clarity for both human researchers and computational systems. 10x Genomics' automated cell type annotations are based on the EMBL-EBI Cell Ontology database.

  • Coarse annotation: High-level categorization of cells into broad cell types, such as "T cells" or "B cells." Coarse annotations provide a general overview of cell populations without going into specific subtypes.

  • Fine(-grained) annotation: A primary classification of cells into specific subtypes which come directly from the annotations in the model. Fine-grained annotations can provide a deeper understanding of cellular heterogeneity by identifying precise differences between closely related cell types but at the cost of exact subtype correctness.
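
The relationship between coarse and fine-grained annotations in an ontology tree can be sketched in a few lines. This is a minimal illustration only: the labels in `TOY_ONTOLOGY`, the choice of `COARSE_LABELS`, and the helper names `ancestors` and `to_coarse` are hypothetical and are not taken from the EMBL-EBI Cell Ontology or from the 10x Genomics annotation model.

```python
# Toy parent map: child label -> parent label (None marks the root).
# Hypothetical labels, not entries from the real Cell Ontology.
TOY_ONTOLOGY = {
    "cell": None,
    "T cell": "cell",
    "B cell": "cell",
    "CD4 T cell": "T cell",
    "CD8 T cell": "T cell",
    "naive B cell": "B cell",
}

# Labels treated as "coarse" in this sketch (an assumption).
COARSE_LABELS = {"T cell", "B cell"}


def ancestors(label):
    """Return the chain of labels from `label` up to the root, inclusive."""
    chain = []
    while label is not None:
        chain.append(label)
        label = TOY_ONTOLOGY[label]
    return chain


def to_coarse(fine_label):
    """Map a fine-grained label to the first coarse label among its ancestors."""
    for label in ancestors(fine_label):
        if label in COARSE_LABELS:
            return label
    return None  # no coarse ancestor exists in this toy tree
```

Walking up the tree from a child node to a broader parent is what turns a fine-grained label such as "CD8 T cell" into the coarse label "T cell".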

  • GEM: Gel Beads-in-emulsion, an emulsion that contains a mixture of biochemistry reagents (uniquely barcoded gel beads) and zero, one, or more suspended cells/nuclei.

  • GEM well (formerly GEM group): A set of partitioned cells (Gel Beads-in-emulsion) from a single 10x Genomics Chromium™ Chip channel. One or more sequencing libraries can be derived from a GEM well.

  • GEM well suffix: When combining libraries made from different GEM wells into one analysis, we append in silico an integer to the barcode of each read that identifies which library that read came from. This prevents barcode collisions, which otherwise create confusion in the form of virtual doublets.
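
The effect of the GEM well suffix can be shown with a small sketch. It assumes the suffix is written as a hyphen followed by the integer (for example `-1`); the helper name `add_gem_well_suffix` and the example barcode sequence are illustrative, not part of the pipeline.

```python
def add_gem_well_suffix(barcode_sequence, gem_well):
    """Append the GEM well index to a barcode sequence, e.g. 'ACGT' -> 'ACGT-2'."""
    if gem_well < 1:
        raise ValueError("GEM well indices start at 1 in this sketch")
    return f"{barcode_sequence}-{gem_well}"


# The same barcode sequence observed in two different GEM wells.
reads = [("AAACCTGAGAAACCAT", 1), ("AAACCTGAGAAACCAT", 2)]

# Without the suffix the two cells are indistinguishable (a virtual doublet).
without_suffix = {seq for seq, _ in reads}

# With the suffix each cell keeps its own identifier.
with_suffix = {add_gem_well_suffix(seq, well) for seq, well in reads}
```

Here `without_suffix` holds a single entry, while `with_suffix` holds two distinct barcodes, one per GEM well.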

  • HT (or High Throughput): The Chromium Next GEM Single Cell 3' HT v3.1 kit is a high throughput, cost-effective solution for profiling gene expression at the single cell level for 2,000-20,000 cells per channel or 2,000-60,000 cells per channel with 3' Cell Multiplexing. In combination with Feature Barcode technology, the assay also enables simultaneous cell surface protein detection or CRISPR profiling in single cells.

  • Library (or Sequencing Library): A 10x-barcoded sequencing library prepared from a single GEM well. With Feature Barcode or V(D)J assays, it is possible to create multiple libraries from the same GEM well. The library types may include Gene Expression, Antibody Capture, CRISPR Guide Capture, TCR-enrichment, etc.

  • Multi config CSV: A configuration file in CSV format that specifies all the parameters required to analyze CMO, Antigen Capture/BEAM or Flex libraries using the cellranger multi pipeline. The multi pipeline can be used to process any combination of GEX + VDJ + Feature Barcode libraries.

  • Multiplet: A cell-associated barcode containing multiple cells.

  • Non-Cell Barcode: A barcode associated with reads that originate outside cells (in contrast to "cell barcodes").

  • OCM Barcodes: Four different barcode sequences used in on-chip multiplexing.

  • Sample: A cell suspension extracted from a single biological source (blood, tissue, etc).

  • Sequencing Run (or Flow cell): A flow cell containing data from one sequencing instrument run. The sequencing data can be further demultiplexed by lane or by sample indices.

  • UMI (Unique Molecular Identifier): Each first-strand cDNA synthesis from a transcript molecule incorporates a random 12bp (for 3' Single Cell) or 10bp (for 5' Immune Profiling) nucleotide sequence next to the cell barcode called the UMI. The UMI sequence in each read allows the pipeline to determine which reads came from the same transcript molecule. In other words, the cell barcode distinguishes between cells, and the UMI distinguishes between molecules (for example, RNA fragments) within a cell.
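
The division of labor between the cell barcode and the UMI can be sketched as follows. This is a simplified illustration that assumes exact matching of sequences and ignores any error correction a real pipeline may apply; the function name `count_molecules` and the shortened sequences are hypothetical.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs per (cell barcode, gene) pair.

    `reads` is an iterable of (cell_barcode, umi, gene) tuples. Reads that
    share all three values are treated as copies of one transcript molecule.
    """
    umis = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        umis[(cell_barcode, gene)].add(umi)
    return {key: len(values) for key, values in umis.items()}


reads = [
    ("AAAC-1", "TTGCA", "GeneA"),
    ("AAAC-1", "TTGCA", "GeneA"),  # duplicate read of the same molecule
    ("AAAC-1", "GGATC", "GeneA"),  # second molecule, same cell and gene
    ("CCGT-1", "TTGCA", "GeneA"),  # same UMI, but a different cell
]
molecule_counts = count_molecules(reads)
```

Four reads collapse to two molecules of GeneA in the first cell and one in the second, because duplicates of the same barcode and UMI are counted once.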

  • LT (or Low Throughput): The Chromium Next GEM Single Cell 3’ LT v3.1 kit is a low throughput, cost-effective solution for smaller-scale and pilot studies for profiling whole transcriptome at the single cell level for 100-1,000 cells per sample. In combination with Feature Barcode technology (Antibody Capture), the assay also enables simultaneous cell surface protein detection in single cells.

  • Cell Surface Protein (CSP): A protein that is localized to the cell membrane, typically containing extracellular domains. These proteins can be quantified with Feature Barcodes such as TotalSeq™ antibody-oligonucleotide conjugates.

  • Count Matrix (or Feature-Barcode Matrix): Formerly known as the Gene-Barcode Matrix. A matrix of counts representing the number of unique observations of each feature within each cell barcode. Genes defined by the transcriptome reference and Feature Barcodes defined in the Feature Reference appear as rows in the matrix. Each barcode is a column of the matrix.
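
The row and column layout described above can be sketched with plain Python lists. This is only an illustration of the orientation (features as rows, barcodes as columns), not the file format the pipeline writes; `build_count_matrix` and all feature and barcode names are hypothetical.

```python
def build_count_matrix(counts, features, barcodes):
    """Arrange per-(barcode, feature) counts as rows=features, columns=barcodes."""
    matrix = [[0] * len(barcodes) for _ in features]
    row_of = {feature: i for i, feature in enumerate(features)}
    col_of = {barcode: j for j, barcode in enumerate(barcodes)}
    for (barcode, feature), count in counts.items():
        matrix[row_of[feature]][col_of[barcode]] = count
    return matrix


# Two genes and one Feature Barcode (an antibody) share the same matrix.
features = ["GeneA", "GeneB", "CD3_antibody"]
barcodes = ["AAAC-1", "CCGT-1"]
counts = {
    ("AAAC-1", "GeneA"): 2,
    ("CCGT-1", "GeneA"): 1,
    ("CCGT-1", "CD3_antibody"): 7,
}
matrix = build_count_matrix(counts, features, barcodes)
```

Each inner list of `matrix` is one feature row, and position `j` within it is the count for `barcodes[j]`. Most entries in a real matrix are zero, which is why such matrices are usually stored in a sparse format.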

  • CRISPRa (or CRISPR activation): Similar to CRISPRi, but uses a Cas9 fused to an activating domain to promote expression of the target gene instead of repressing it.

  • CRISPR Guide RNA: See sgRNA.

  • CRISPRi (or CRISPR Interference): A method for measuring the impact of perturbations to gene expression levels. sgRNAs with protospacers targeting a gene of interest are used with a non-cutting Cas9 that is fused to a repressive domain. This represses the expression of the selected gene.

  • CROP-Seq: An assay scheme for pooled CRISPRi and CRISPRa experiments with single cell Gene Expression readout. See Datlinger et al., Nature Methods 2017.

  • Dextramer: Refers to a Feature Barcode reagent consisting of multiple copies of a peptide-MHC (p-MHC) complex conjugated to a Dextran backbone, coupled to a DNA oligonucleotide carrying a Feature Barcode that identifies the peptide-MHC complex. The p-MHC complex is the antigen of a T-Cell Receptor. Dextramers compatible with 10x Genomics Feature Barcode Technology are supplied by Immudex.

  • Feature: A unique type of countable molecule. Can refer to a gene, a barcoded antibody, a CRISPR Guide RNA or another barcoded reagent. Each feature is either a gene declared in the transcriptome reference or a feature barcode declared in the feature reference file. Corresponds to a row in the Count Matrix.

  • Feature Barcode: The subsequence of a Feature Barcode read that uniquely identifies the Feature Barcode reagent.

  • Feature Barcode Antibody (or Antibody): Refers to a Feature Barcode reagent consisting of an antibody with high affinity to a known Cell Surface Protein coupled to a Feature Barcode oligonucleotide that identifies the antibody. These reagents are used to quantify the expression of cell surface proteins. For example, the TotalSeq™-B product line is a family of Feature Barcode antibodies that are compatible with the Single Cell 3' v3 solution.

  • Feature Reference: A CSV file declaring the name, read layout, and barcode sequence of all the Feature Barcode reagents in use in an experiment. A Feature Reference CSV must be provided to cellranger count when using Feature Barcode Technology. See the Feature Reference Documentation for details.

  • Guide RNA (or sgRNA, or Single Guide RNA): The Guide RNA, along with a Cas9 enzyme, forms the CRISPR system. The protospacer region of the Guide RNA recognizes a particular sequence in the genome.

  • p-MHC (or Peptide-MHC): An antigen-presenting MHC molecule bound to a displayed peptide. These complexes are recognized by T cell receptors in the adaptive immune system. Dextramers are a Feature Barcode-capable p-MHC reagent technology.

  • Perturb-Seq: The original demonstration of a pooled CRISPRi assay with a single-cell Gene Expression readout, using barcodes to identify which CRISPRi perturbations were present in each cell. See Dixit et al., Cell 2016.

  • Cell Multiplexing: The labeling of a given cell (or nuclei) sample with a molecular tag and subsequently mixing this sample with other labeled samples. Introduced in Cell Ranger 6.0.

  • Cell Multiplexing Oligo (CMO): A specific type of feature barcode used to tag cells prior to pooling in a single GEM well.

  • Multiplet: A cell-associated barcode containing more than one cell. Multiplets that are assigned more than 1 CMO are detected and filtered out.

  • Physical Library: A sequencing library produced from a single GEM well.

  • Singlet: A cell-associated barcode assigned exactly one CMO. Only these are assigned to samples.
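
The singlet and multiplet definitions above amount to counting how many distinct CMOs a cell-associated barcode was assigned. A minimal sketch, assuming the per-barcode CMO assignments are already available; the function `classify_barcode`, the `"unassigned"` label for barcodes with no CMO, and the CMO identifiers are illustrative assumptions rather than the pipeline's own tag-calling logic.

```python
def classify_barcode(assigned_cmos):
    """Classify a cell-associated barcode by how many distinct CMOs it carries."""
    n_cmos = len(set(assigned_cmos))
    if n_cmos == 1:
        return "singlet"      # exactly one CMO: can be assigned to a sample
    if n_cmos > 1:
        return "multiplet"    # more than one CMO: filtered out
    return "unassigned"       # hypothetical label for barcodes with no CMO


assignments = {
    "AAAC-1": ["CMO301"],
    "CCGT-1": ["CMO301", "CMO302"],
    "GGTA-1": [],
}
calls = {barcode: classify_barcode(cmos) for barcode, cmos in assignments.items()}
```

Note that a multiplet formed by two cells from the same sample carries only one CMO, so a check of this kind cannot detect it.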

  • 10x GEM Barcode: The barcode associated with the 10x Genomics gel bead. This barcode identifies single cells.

  • Probe Barcode: The unique barcode on the right hand side (RHS) probe. This barcode identifies Gene Expression library samples (e.g., BC001).

  • Antibody Multiplexing Barcode: The unique barcode associated with the Antibody Feature Barcode oligonucleotide (for multiplexed experiments, TotalSeq™-C only). This barcode identifies Antibody Capture library samples (e.g., AB001).

  • Probe Filter: A column within the whole-transcriptome Probe Set reference file declaring the gene panel used for a Flex experiment. By default, probes predicted to have some off-target activity to homologous genes or sequences are excluded from analysis. Users can include UMI counts from all probes, including those with potential off-target activity, by setting the filter-probes field to false in the multi config file.

  • Probe Set: A whole-transcriptome reference file declaring the gene panel used for a Flex experiment, which specifies detailed information about the genes which are targeted by each probe. This file must be provided to cellranger multi via the probe-set field in the multi config file.

  • RNA-Templated Ligation: Flex is designed around a strict probe pairing framework. For each target locus, when two half-probe sequences bind to the proper locus and ligate together during the assay, a countable barcode-UMI-probe product is made.

  • Antigen Capture: BEAM (Barcode Enabled Antigen Mapping) is an antigen screening workflow that empowers the rapid discovery of antigen-specific B and T cells. In the Cell Ranger pipeline, BEAM libraries are called Antigen Capture libraries. Libraries processed using the BEAM-Ab workflow are called BCR Antigen Capture, whereas those prepared using the BEAM-T workflow are called TCR Antigen Capture.

  • Antigen specificity score: The binding propensity of an antigen of interest relative to its control. If an Antigen Capture library is included in your analysis, Cell Ranger calculates an antigen specificity score per barcode, as described in the Antigen Algorithms page.

  • BEAM-Ab: BEAM-Ab (Barcode Enabled Antigen Mapping for B cells) is an antigen screening workflow that empowers the rapid discovery of antigen-specific B cells.

  • BEAM-T: BEAM-T (Barcode Enabled Antigen Mapping for T cells) is an antigen screening workflow that empowers the rapid discovery of antigen-specific T cells.

  • BEAM conjugate: The BEAM conjugate is the core BEAM reagent. The BEAM assay kit comes with 16 BEAM conjugates, each composed of streptavidin, a fluorophore molecule (Phycoerythrin, PE), and a Feature Barcode oligonucleotide.

  • CDR3 (Complementarity-Determining Region 3): The three complementarity-determining regions are the amino acid sequences of a T or B cell receptor which are predicted to bind to an antigen. The nucleotide region encoding CDR3 spans the V(D)J junction, making it more diverse than the other CDRs. Therefore, CDR3 sequences are useful for identifying unique chains. See the algorithm page for details.

  • Chain: A peptide subunit of a T cell or B cell receptor. A typical B cell receptor (BCR or its secreted antibody version) is composed of two immunoglobulin (Ig) heavy chains and two Ig light chains. A typical T cell receptor (TCR) is composed of either one alpha chain and one beta chain or one gamma chain and one delta chain.

  • Clonopoiesis: The production of T or B cell clonotypes.

  • Clonotype: A set of adaptive immune cells that are the clonal progeny of a fully recombined, unmutated common ancestor. T cell clonotypes are generally distinguished by the nucleotide sequence of the rearranged TCR, which does not undergo somatic hypermutation (SHM) in the majority of vertebrate species. Cells within a B cell clonotype, by contrast, commonly diverge from each other at the nucleotide level because BCR sequences do undergo SHM. For this reason, B cell clonotypes also frequently contain multiple exact subclonotypes (see below).

  • Exact subclonotype: A subset of cells within a clonotype that share identical immune receptor sequences at the nucleotide level, spanning the entirety of the V, D, and J genes and the V(D)J junction. Exact subclonotypes share the same V, D, J, and C gene annotations (e.g. cells that have identical V(D)J sequences but different C genes or isotypes are split into distinct exact subclonotypes).
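
The grouping rule for exact subclonotypes can be sketched as a dictionary keyed on each cell's full set of chains. This is an illustration of the definition only, assuming the cells already belong to one clonotype; it is not the pipeline's clonotyping algorithm, and `group_exact_subclonotypes`, the barcodes, and the toy sequences are hypothetical.

```python
from collections import defaultdict


def group_exact_subclonotypes(cells):
    """Group cell barcodes that share identical V(D)J sequences and C genes.

    `cells` maps a barcode to a list of (vdj_nucleotide_sequence, c_gene)
    tuples, one tuple per chain.
    """
    groups = defaultdict(list)
    for barcode, chains in cells.items():
        key = tuple(sorted(chains))  # chain order should not affect grouping
        groups[key].append(barcode)
    return sorted(sorted(members) for members in groups.values())


cells = {
    "cell1": [("ATGCCA", "IGHG1"), ("GGCATT", "IGKC")],
    "cell2": [("GGCATT", "IGKC"), ("ATGCCA", "IGHG1")],  # same chains as cell1
    "cell3": [("ATGCCA", "IGHM"), ("GGCATT", "IGKC")],   # same V(D)J, other isotype
}
subclonotypes = group_exact_subclonotypes(cells)
```

`cell1` and `cell2` fall into one exact subclonotype, while `cell3` is split off because its heavy chain has a different C gene even though the V(D)J sequence is identical.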

  • Consensus: The consensus sequence for a given clonotype chain is the sequence of that chain in the first exact subclonotype.

  • Contig: A contiguous sequence of bases produced by assembly.

  • Dataset: A set of Cell Ranger outputs belonging to a set of single cells originating from the same GEM well.

  • Donor: An individual from whom adaptive immune cells (T cells, B cells) are collected (e.g. a sister and a brother would each be considered unique donors for the purposes of V(D)J aggregation).

  • Donor reference: The set of inferred germline sequences of one or more V, D, J, or constant (C) gene segments based on common mutations shared between single T and B cells from a single donor.

  • Doublet: A cell-associated barcode containing two cells.

  • Foursie: A clonotype or exact subclonotype having exactly four chains. Foursies are rarely true biological events.

  • Full-length: A contig is full-length if it matches the initial part of a V gene, continues on, and ultimately matches the terminal part of a J gene.

  • N50: The N50 of a sorted list of numbers is the midway point by weight. There are implementation differences for exactly how this is computed but they matter little when the list is long. Unlike the mean and median, the N50 discounts the contribution of many small numbers. That is why people use it!

  • N-statistic: The N-statistics, such as N50 or N99, are measures of centrality often used in genomics because they are somewhat robust to contamination by large numbers of low-value elements. In particular, the NXX is the value of the smallest element in the subset comprising the fewest and largest members such that the sum of the values of the subset is at least XX% of the total sum of the values of the data set. A larger value of an N-statistic indicates that a larger proportion of the total can be accounted for by large individual values, and for a given data set and YY greater than XX, the value for NYY will be less than or equal to the value for NXX. Thus, the N50 is essentially a weighted median.
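
The NXX definition above translates directly into a short function: sort the values from largest to smallest, accumulate them, and stop at the first element where the running sum reaches XX% of the total. The sketch below follows that wording and assumes non-negative values; as noted for the N50, other implementations may differ in detail. The name `n_statistic` and the example list are illustrative.

```python
def n_statistic(values, xx=50):
    """Return the NXX of `values` (non-negative numbers), e.g. xx=50 for N50."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < xx <= 100:
        raise ValueError("xx must be greater than 0 and at most 100")
    total = sum(values)
    running = 0
    for value in sorted(values, reverse=True):
        running += value
        # Comparing running*100 with xx*total avoids dividing by the total.
        if running * 100 >= xx * total:
            return value


# Many small values and a few large ones.
lengths = [1, 1, 1, 1, 1, 5, 10]
n50 = n_statistic(lengths, 50)
n75 = n_statistic(lengths, 75)
n99 = n_statistic(lengths, 99)
```

For this list the total is 20, so the single largest value (10) already accounts for half of the sum and the N50 is 10, whereas the median is 1. Raising XX can only lower the result: the N75 is 5 and the N99 is 1.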

  • Origin: The specific source from which a dataset of cells is derived. This could be a timepoint (pre- or post-treatment or vaccination or time A/B/C), a tissue (PBMC, tumor, lung), or other metadata (healthy, diseased, condition). Origins must be unique to each donor. Replicates (e.g. multiple libraries from the same population of cells) may share origins within a donor, which triggers additional replicate-based filtering.

  • Productive Contig: Productive contigs are described on the algorithm page.