Cell Ranger Feature Reference CSV

Important

Cell Ranger v4.0 and later supports an untethered Feature Barcode pattern, (BC), in the Feature Reference CSV. With this pattern, you specify the Feature Barcode sequence without specifying its location (tether) on the read. However, the untethered pattern may slow down your Cell Ranger run, particularly in experiments with a large number of guide RNAs. For best performance, use a tether whenever feasible.

A Feature Reference CSV file describes each unique Feature Barcode sequence and its location in the sequencing read. This CSV should contain columns for the feature name, identifier, the corresponding Feature Barcode sequence, and a pattern to extract this sequence from the read sequence.

This CSV file is a required input for the cellranger count and cellranger multi pipelines when processing Feature Barcode data. For a complete list of input files required to run specific Cell Ranger pipelines, please refer to the List of inputs page.

For cellranger count, the CSV file should be specified using the --feature-ref option. An example cellranger count command with this flag is provided on the count page.
For cellranger multi, a path to the CSV should be included in the [feature] section of the multi config CSV.

Each line in the CSV corresponds to one unique Feature Barcode. The CSV can only contain ASCII characters to ensure compatibility.

A typical Feature Reference CSV looks like this:


id,name,read,pattern,sequence,feature_type
CD3,CD3_TotalSeqB,R2,5PNNNNNNNNNN(BC),AACAAGACCCTTGAG,Antibody Capture
CD4,CD4_TotalSeqB,R2,5PNNNNNNNNNN(BC),TACCCGTAATAGCGT,Antibody Capture
CD8a,CD8a_TotalSeqB,R2,5PNNNNNNNNNN(BC),ATTGGCACTCAGATG,Antibody Capture
CD14,CD14_TotalSeqB,R2,5PNNNNNNNNNN(BC),GAAAGTCAAAGCACT,Antibody Capture
CD15,CD15_TotalSeqB,R2,5PNNNNNNNNNN(BC),ACGAATCAATCTGTG,Antibody Capture
CD16,CD16_TotalSeqB,R2,5PNNNNNNNNNN(BC),GTCTTTGTCAGTGCA,Antibody Capture
CD56,CD56_TotalSeqB,R2,5PNNNNNNNNNN(BC),GTTGTCCGACAATAC,Antibody Capture
CD19,CD19_TotalSeqB,R2,5PNNNNNNNNNN(BC),TCAACGCTTGGCTAG,Antibody Capture
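
As a rough illustration of the layout above, the following Python sketch parses Feature Reference CSV text and checks that the six columns used in the examples on this page are present and that the text is ASCII-only. The function name load_feature_reference and the REQUIRED_COLUMNS list are hypothetical helpers for this example, not part of Cell Ranger.

```python
import csv
import io

# Columns that appear in every example on this page (assumed to be the minimum).
REQUIRED_COLUMNS = ["id", "name", "read", "pattern", "sequence", "feature_type"]


def load_feature_reference(text):
    """Parse Feature Reference CSV text into a list of row dictionaries."""
    if not text.isascii():
        raise ValueError("Feature Reference CSV must contain only ASCII characters")
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


example = """id,name,read,pattern,sequence,feature_type
CD3,CD3_TotalSeqB,R2,5PNNNNNNNNNN(BC),AACAAGACCCTTGAG,Antibody Capture
CD4,CD4_TotalSeqB,R2,5PNNNNNNNNNN(BC),TACCCGTAATAGCGT,Antibody Capture
"""

rows = load_feature_reference(example)
print(len(rows), rows[0]["id"], rows[0]["sequence"])
```

Each parsed row is a dictionary keyed by column name, so one row corresponds to one Feature Barcode.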

This section describes the columns in the Feature Reference CSV file. Several example files are provided below.

id: Unique ID used to track feature counts. May only include ASCII characters and must not contain whitespace, slashes, quotes, or commas. Each ID must be unique and must not overlap with any gene identifier from the transcriptome.

name: Human-readable name for this feature. May only include ASCII characters and must not contain whitespace, slashes, quotes, or commas. This name will be displayed in the Loupe Browser Active Feature list.

read: Specifies which RNA sequencing read contains the Feature Barcode sequence. Must be R1 or R2. Note: in most cases, R2 is the appropriate choice.
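
The rules for the id, name, and read columns can be sketched as a small checker. This is an illustration only: check_identity_fields is a hypothetical helper, the check treats both forward slashes and backslashes as "slashes" (an assumption), and it does not test for overlap with gene identifiers in the transcriptome.

```python
import re

# Characters the id and name columns may not contain: whitespace, slashes
# (both kinds, by assumption), quotes, and commas.
FORBIDDEN = re.compile(r"[\s/\\\"',]")


def check_identity_fields(rows):
    """Return a list of human-readable problems found in id, name, and read."""
    problems = []
    seen_ids = set()
    for number, row in enumerate(rows, start=1):
        for column in ("id", "name"):
            value = row[column]
            if not value.isascii() or FORBIDDEN.search(value):
                problems.append(f"row {number}: invalid {column} {value!r}")
        if row["id"] in seen_ids:
            problems.append(f"row {number}: duplicate id {row['id']!r}")
        seen_ids.add(row["id"])
        if row["read"] not in ("R1", "R2"):
            problems.append(f"row {number}: read must be R1 or R2, got {row['read']!r}")
    return problems


demo_rows = [
    {"id": "CD3", "name": "CD3_TotalSeqB", "read": "R2"},
    {"id": "CD3", "name": "CD3 copy", "read": "R3"},
]
problems = check_identity_fields(demo_rows)
for problem in problems:
    print(problem)
```

The second demo row is deliberately wrong in three ways (whitespace in the name, a repeated id, and an invalid read), so three problems are reported.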

The pattern field of the Feature Reference defines how to locate the Feature Barcode within a read. The Feature Barcode may appear at a known offset with respect to the start or end of the read or may appear at a fixed position relative to a known anchor sequence. The pattern column can consist of a combination of these elements:

5P: denotes the beginning of the read sequence. May appear zero or one time, and must be at the beginning of the pattern. Only 5P or 3P may appear, not both (^ may be used instead of 5P).
3P: denotes the end of the read sequence. May appear zero or one time, and must be at the end of the pattern ($ may be used instead of 3P).
N: denotes an arbitrary base.
A, C, G, T: denotes a fixed base that must match the read sequence exactly.
(BC): denotes the Feature Barcode sequence as specified in the sequence column of the Feature Reference. Must appear exactly once in the pattern.

Any constant sequences made up of A, C, G, and T in the pattern must match exactly in the read sequence. Any N in the pattern is allowed to match a single arbitrary base. A modest number of fixed bases should be used to minimize the chance of a sequencing error disrupting the match. The fixed sequence should also be long enough to uniquely identify the position of the Feature Barcode. For feature types that require a non-N anchor, we recommend 12bp-20bp of constant sequence.
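
To make the pattern elements concrete, the sketch below translates a tethered pattern into a Python regular expression and extracts the barcode from a read. It is illustrative only and does not reflect how Cell Ranger implements matching. The function pattern_to_regex and the test reads are made up for this example, and the sketch does not handle the untethered (BC) pattern, which would require searching for the barcode sequences themselves.

```python
import re


def pattern_to_regex(pattern, barcode_length):
    """Translate a Feature Reference pattern into a compiled regular expression."""
    if pattern.count("(BC)") != 1:
        raise ValueError("(BC) must appear exactly once")
    starts = pattern.startswith(("5P", "^"))
    ends = pattern.endswith(("3P", "$"))
    if starts and ends:
        raise ValueError("only one of 5P or 3P may appear")
    body = pattern
    if starts:
        body = body[2:] if body.startswith("5P") else body[1:]
    if ends:
        body = body[:-2] if body.endswith("3P") else body[:-1]
    before, after = body.split("(BC)")
    if not re.fullmatch(r"[ACGTN]*", before + after):
        raise ValueError(f"unexpected characters in pattern: {pattern!r}")

    def translate(fragment):
        # N matches any single base; A, C, G, T must match exactly.
        return fragment.replace("N", "[ACGTN]")

    regex = (
        ("^" if starts else "")
        + translate(before)
        + f"(?P<bc>[ACGTN]{{{barcode_length}}})"
        + translate(after)
        + ("$" if ends else "")
    )
    return re.compile(regex)


# Antibody-style pattern: barcode follows the first 10 bases of the read.
antibody_read = "ACGTACGTAC" + "AACAAGACCCTTGAG" + "TTTTTTTT"
antibody_bc = pattern_to_regex("5PNNNNNNNNNN(BC)", 15).search(antibody_read).group("bc")

# Anchor-style pattern: barcode sits immediately before a constant sequence.
guide_read = "GG" + "GAAGGGCGGCGAGAAGGAGA" + "GTTTAAGAGCTAAGCTGGAA" + "AC"
guide_bc = pattern_to_regex("(BC)GTTTAAGAGCTAAGCTGGAA", 20).search(guide_read).group("bc")

print(antibody_bc)
print(guide_bc)
```

In the first case the position is fixed by the 5P offset; in the second, the constant sequence locates the barcode wherever it occurs in the read.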

The extracted Feature Barcode sequence is compared against the sequences in the Feature Reference. Extracted sequences that differ from a reference barcode by up to one base (a Hamming distance of one) are corrected using the 10x Genomics barcode correction algorithm.
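
The idea of correcting within a Hamming distance of one can be sketched as follows. This is a deliberately simplified stand-in: the details of the 10x Genomics barcode correction algorithm are not described on this page, and the rule used here (accept a correction only when exactly one reference barcode is within one mismatch) is an assumption made for illustration. The names hamming and correct_barcode are hypothetical.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(observed, reference_barcodes, max_distance=1):
    """Return the single reference barcode within max_distance, else None."""
    if observed in reference_barcodes:
        return observed
    candidates = [
        reference
        for reference in reference_barcodes
        if len(reference) == len(observed)
        and hamming(observed, reference) <= max_distance
    ]
    # Ambiguous or distant observations are left uncorrected in this sketch.
    return candidates[0] if len(candidates) == 1 else None


reference_barcodes = ["AACAAGACCCTTGAG", "TACCCGTAATAGCGT"]
one_mismatch = correct_barcode("AACAAGACCCTTGAC", reference_barcodes)
two_mismatches = correct_barcode("AACAAGACCCTTCAC", reference_barcodes)
print(one_mismatch, two_mismatches)
```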

sequence: Nucleotide barcode sequence associated with this feature, e.g., antibody barcode or sgRNA protospacer sequence.

feature_type: Specifies the type of feature being analyzed. Ensure that each feature_type in the Feature Reference matches a corresponding library_type in the Libraries CSV (for cellranger count) or feature_types in the [libraries] section of the multi config CSV (for cellranger multi). FASTQ data listed in the Libraries CSV under a library_type that matches the feature_type will be scanned for occurrences of this feature.

See available options for count and multi pipelines. This field is case-sensitive.
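
Because the match is case-sensitive, a quick consistency check before launching a run can catch mismatched labels. The sketch below compares the feature_type values of a Feature Reference with the library_type values of a Libraries CSV (the cellranger count case). The function unmatched_feature_types is a hypothetical helper, and the rows are shown as already-parsed dictionaries containing only the columns needed here.

```python
def unmatched_feature_types(feature_rows, library_rows):
    """Return feature_type values that have no identical library_type."""
    feature_types = {row["feature_type"] for row in feature_rows}
    library_types = {row["library_type"] for row in library_rows}
    # Set difference uses exact string equality, so case differences count.
    return sorted(feature_types - library_types)


feature_rows = [
    {"id": "CD3", "feature_type": "Antibody Capture"},
    {"id": "ACTR8-1", "feature_type": "CRISPR guide capture"},  # wrong case
]
library_rows = [
    {"library_type": "Gene Expression"},
    {"library_type": "Antibody Capture"},
    {"library_type": "CRISPR Guide Capture"},
]

unmatched = unmatched_feature_types(feature_rows, library_rows)
print(unmatched)
```

Here the lowercase "guide capture" label is reported because it does not exactly equal "CRISPR Guide Capture".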

mhc_allele: Only relevant for BEAM-T (TCR Antigen Capture). Defines the MHC allele associated with each antigen included in the experiment. See the Feature Reference section on the Antigen Capture page for more details.

TotalSeq™-B is a line of antibody-oligonucleotide conjugates supplied by BioLegend that are compatible with the Single Cell 3' assay. The Feature Barcode sequence appears at a fixed position on Read 2 (R2), immediately after the first 10 bases of the read. The pattern field for a typical TotalSeq™-B Feature Reference should be 5PNNNNNNNNNN(BC).

Here is an example Feature Reference CSV:


id,name,read,pattern,sequence,feature_type
CD3,CD3_TotalSeqB,R2,5PNNNNNNNNNN(BC),AACAAGACCCTTGAG,Antibody Capture
CD4,CD4_TotalSeqB,R2,5PNNNNNNNNNN(BC),TACCCGTAATAGCGT,Antibody Capture
CD8a,CD8a_TotalSeqB,R2,5PNNNNNNNNNN(BC),ATTGGCACTCAGATG,Antibody Capture
CD14,CD14_TotalSeqB,R2,5PNNNNNNNNNN(BC),GAAAGTCAAAGCACT,Antibody Capture
CD15,CD15_TotalSeqB,R2,5PNNNNNNNNNN(BC),ACGAATCAATCTGTG,Antibody Capture
CD16,CD16_TotalSeqB,R2,5PNNNNNNNNNN(BC),GTCTTTGTCAGTGCA,Antibody Capture
CD56,CD56_TotalSeqB,R2,5PNNNNNNNNNN(BC),GTTGTCCGACAATAC,Antibody Capture
CD19,CD19_TotalSeqB,R2,5PNNNNNNNNNN(BC),TCAACGCTTGGCTAG,Antibody Capture
CD25,CD25_TotalSeqB,R2,5PNNNNNNNNNN(BC),GTGCATTCAACAGTA,Antibody Capture
CD45RA,CD45RA_TotalSeqB,R2,5PNNNNNNNNNN(BC),GATGAGAACAGGTTT,Antibody Capture
CD45RO,CD45RO_TotalSeqB,R2,5PNNNNNNNNNN(BC),TGCATGTCATCGGTG,Antibody Capture
PD-1,PD-1_TotalSeqB,R2,5PNNNNNNNNNN(BC),AAGTCGTGAGGCATG,Antibody Capture
TIGIT,TIGIT_TotalSeqB,R2,5PNNNNNNNNNN(BC),TGAAGGCTCATTTGT,Antibody Capture
CD127,CD127_TotalSeqB,R2,5PNNNNNNNNNN(BC),ACATTGACGCAACTA,Antibody Capture
IgG2a,IgG2a_control_TotalSeqB,R2,5PNNNNNNNNNN(BC),CTCTATTCAGACCAG,Antibody Capture
IgG1,IgG1_control_TotalSeqB,R2,5PNNNNNNNNNN(BC),ACTCACTGGAGTCTC,Antibody Capture
IgG2b,IgG2b_control_TotalSeqB,R2,5PNNNNNNNNNN(BC),ATCACATCGTTGCCA,Antibody Capture

Please refer to BioLegend for the latest conjugated Feature Barcode information.

See this publicly available example dataset run with a TotalSeq™-B Feature Reference CSV.

TotalSeq™-C is a line of antibody-oligonucleotide conjugates supplied by BioLegend that are compatible with the Single Cell 5' assay. The Feature Barcode sequence appears at a fixed position on Read 2 (R2), immediately after the first 10 bases of the read. The pattern field for a typical TotalSeq™-C Feature Reference should be 5PNNNNNNNNNN(BC).

Here is an example Feature Reference CSV:


id,name,read,pattern,sequence,feature_type
CD3,CD3_TotalSeqC,R2,5PNNNNNNNNNN(BC),CTCATTGTAACTCCT,Antibody Capture
CD19,CD19_TotalSeqC,R2,5PNNNNNNNNNN(BC),CTGGGCAATTACTCG,Antibody Capture
CD45RA,CD45RA_TotalSeqC,R2,5PNNNNNNNNNN(BC),TCAATCCTTCCGCTT,Antibody Capture
CD4,CD4_TotalSeqC,R2,5PNNNNNNNNNN(BC),TGTTCCCGCTCAACT,Antibody Capture
CD8a,CD8a_TotalSeqC,R2,5PNNNNNNNNNN(BC),GCTGCGCTTTCCATT,Antibody Capture
CD14,CD14_TotalSeqC,R2,5PNNNNNNNNNN(BC),TCTCAGACCTCCGTA,Antibody Capture
CD16,CD16_TotalSeqC,R2,5PNNNNNNNNNN(BC),AAGTTCACTCTTTGC,Antibody Capture
CD56,CD56_TotalSeqC,R2,5PNNNNNNNNNN(BC),TTCGCCGCATTGAGT,Antibody Capture
CD25,CD25_TotalSeqC,R2,5PNNNNNNNNNN(BC),TTTGTCCTGTACGCC,Antibody Capture
CD45RO,CD45RO_TotalSeqC,R2,5PNNNNNNNNNN(BC),CTCCGAATCATGTTG,Antibody Capture
PD-1,PD-1_TotalSeqC,R2,5PNNNNNNNNNN(BC),ACAGCGCCGTATTTA,Antibody Capture
TIGIT,TIGIT_TotalSeqC,R2,5PNNNNNNNNNN(BC),TTGCTTACCGCCAGA,Antibody Capture
IgG1,IgG1_control_TotalSeqC,R2,5PNNNNNNNNNN(BC),GCCGGACGACATTAA,Antibody Capture
IgG2a,IgG2a_control_TotalSeqC,R2,5PNNNNNNNNNN(BC),CTCCTACCTAAACTG,Antibody Capture
IgG2b,IgG2b_control_TotalSeqC,R2,5PNNNNNNNNNN(BC),ATATGTATCACGCGA,Antibody Capture
CD127,CD127_TotalSeqC,R2,5PNNNNNNNNNN(BC),GTGTGTTGTCCTATG,Antibody Capture
CD15,CD15_TotalSeqC,R2,5PNNNNNNNNNN(BC),TCACCAGTACCTAGT,Antibody Capture

Please refer to BioLegend for the latest conjugated Feature Barcode information.

See this publicly available example dataset run with a TotalSeq™-C Feature Reference CSV.

The Feature Reference for Immudex's MHC Dextramer® libraries with dCODE Dextramers has the same Feature Barcode pattern as TotalSeq™-C, so the TotalSeq™-C Feature Reference example can also be used for MHC Dextramer® libraries. Use "Antibody Capture" in the feature_type column for dextramer or multimer reagents.

To analyze Barcode Enabled Antigen Mapping (BEAM) libraries, visit the corresponding 5' Immune Profiling page.

TotalSeq™-A is a line of antibody-oligonucleotide conjugates supplied by BioLegend that are compatible with the Single Cell 3' v2 and Single Cell 3' v3 kits. The Feature Barcode sequence appears at the start of Read 2 (R2). The pattern field for a typical TotalSeq™-A Feature Reference should be 5P(BC), or equivalently ^(BC).

Here is an example Feature Reference CSV:


id,name,read,pattern,sequence,feature_type
TIGIT,TIGIT_TotalA,R2,^(BC),TTGCTTACCGCCAGA,Antibody Capture
CD279,CD279_TotalA,R2,^(BC),ACAGCGCCGTATTTA,Antibody Capture
CD127,CD127_TotalA,R2,^(BC),GGCACCCATGTCTTT,Antibody Capture
CD56,CD56_TotalA,R2,^(BC),TTCGCCGCATTGAGT,Antibody Capture
CD45RO,CD45RO_TotalA,R2,^(BC),CTCCGAATCATGTTG,Antibody Capture
CD45RA,CD45RA_TotalA,R2,^(BC),TCAATCCTTCCGCTT,Antibody Capture
CD45,CD45_TotalA,R2,^(BC),TAGCGTGGAGTAGTG,Antibody Capture
CD25,CD25_TotalA,R2,^(BC),TTTGTCCTGTACGCC,Antibody Capture
CD19,CD19_TotalA,R2,^(BC),CTGGGCAATTACTCG,Antibody Capture
CD16,CD16_TotalA,R2,^(BC),AAGTTCACTCTTTGC,Antibody Capture
CD15,CD15_TotalA,R2,^(BC),TTGGACGTGCGATCT,Antibody Capture
CD14,CD14_TotalA,R2,^(BC),TCTCAGACCTCCGTA,Antibody Capture
CD8A,CD8A_TotalA,R2,^(BC),GCTGCGCTTTCCATT,Antibody Capture
CD4,CD4_TotalA,R2,^(BC),TGTTCCCGCTCAACT,Antibody Capture
CD3,CD3_TotalA,R2,^(BC),CTCATTGTAACTCCT,Antibody Capture

Although TotalSeq™-A can be used with the CITE-Seq assay, CITE-Seq is not a 10x Genomics-supported assay. Please contact New York Genome Center or BioLegend for assistance with the assay or software.

Please refer to BioLegend for the latest conjugated Feature Barcode information.

A Feature Reference CSV for sample hashing with Antibody Capture is similar to the one used for the corresponding antibody type. For instance, if you are using TotalSeq™-A for hashing, refer to the example Feature Reference CSV for TotalSeq™-A. If your experiment involves both an Antibody Capture library and hashtag oligos for sample hashing, you must include both in the same Feature Reference CSV.

Here is an example Feature Reference CSV:


id,name,read,pattern,sequence,feature_type
TotalSeqB_Hashtag_1,Sample1,R2,5PNNNNNNNNNN(BC),GTCAACTCTTTAGCG,Antibody Capture
TotalSeqB_Hashtag_2,Sample2,R2,5PNNNNNNNNNN(BC),TGATGGCCTATTGGG,Antibody Capture
TotalSeqB_Hashtag_3,Sample3,R2,5PNNNNNNNNNN(BC),TTCCGCCTCTCTTTG,Antibody Capture
TotalSeqB_Hashtag_4,Sample4,R2,5PNNNNNNNNNN(BC),AGTAAGTTCAGCGTA,Antibody Capture
CD3_TotalSeqB,CD3,R2,5PNNNNNNNNNN(BC),AACAAGACCCTTGAG,Antibody Capture
CD4_TotalSeqB,CD4,R2,5PNNNNNNNNNN(BC),TACCCGTAATAGCGT,Antibody Capture
CD8a_TotalSeqB,CD8a,R2,5PNNNNNNNNNN(BC),ATTGGCACTCAGATG,Antibody Capture
CD14_TotalSeqB,CD14,R2,5PNNNNNNNNNN(BC),GAAAGTCAAAGCACT,Antibody Capture
CD15_TotalSeqB,CD15,R2,5PNNNNNNNNNN(BC),ACGAATCAATCTGTG,Antibody Capture
CD16_TotalSeqB,CD16,R2,5PNNNNNNNNNN(BC),GTCTTTGTCAGTGCA,Antibody Capture

The first four rows (after the header) correspond to the hashtag oligos used for sample multiplexing. The subsequent rows provide information for the Antibody Capture Feature Barcode library.
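
A combined file like the one above can be assembled programmatically. The sketch below is a minimal illustration that assumes every entry is a TotalSeq™-B reagent sharing the same read and pattern; totalseq_b_row and combined_reference are hypothetical helper names, and the only validation performed is that IDs are unique across hashtags and antibodies.

```python
HEADER = "id,name,read,pattern,sequence,feature_type"


def totalseq_b_row(feature_id, name, sequence):
    """Format one Feature Reference row using the TotalSeq-B pattern."""
    return f"{feature_id},{name},R2,5PNNNNNNNNNN(BC),{sequence},Antibody Capture"


def combined_reference(hashtags, antibodies):
    """Build one CSV text with hashtag rows first, then antibody rows."""
    entries = list(hashtags) + list(antibodies)
    ids = [entry[0] for entry in entries]
    if len(set(ids)) != len(ids):
        raise ValueError("feature ids must be unique across hashtags and antibodies")
    lines = [HEADER] + [totalseq_b_row(*entry) for entry in entries]
    return "\n".join(lines) + "\n"


hashtags = [
    ("TotalSeqB_Hashtag_1", "Sample1", "GTCAACTCTTTAGCG"),
    ("TotalSeqB_Hashtag_2", "Sample2", "TGATGGCCTATTGGG"),
]
antibodies = [
    ("CD3_TotalSeqB", "CD3", "AACAAGACCCTTGAG"),
    ("CD4_TotalSeqB", "CD4", "TACCCGTAATAGCGT"),
]

combined_text = combined_reference(hashtags, antibodies)
print(combined_text, end="")
```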

If you do not have an Antibody Capture library, you only need the rows containing hashtag oligo information:


id,name,read,pattern,sequence,feature_type
TotalSeqB_Hashtag_1,Sample1,R2,5PNNNNNNNNNN(BC),GTCAACTCTTTAGCG,Antibody Capture
TotalSeqB_Hashtag_2,Sample2,R2,5PNNNNNNNNNN(BC),TGATGGCCTATTGGG,Antibody Capture
TotalSeqB_Hashtag_3,Sample3,R2,5PNNNNNNNNNN(BC),TTCCGCCTCTCTTTG,Antibody Capture
TotalSeqB_Hashtag_4,Sample4,R2,5PNNNNNNNNNN(BC),AGTAAGTTCAGCGTA,Antibody Capture

You can download the example Feature Reference CSV for samples hashed with TotalSeq™-B Antibody Capture.

Example multi-config CSVs for hashing with Antibody Capture can be found on the Sample Multiplexing with Cell Ranger multi page.

For additional guidance on hashing with Antibody Capture, visit the What is Sample Multiplexing page.

Proteintech Genomics (PTG) provides a line of antibody cocktails targeting intracellular and cell surface proteins, fully compatible with the 10x Genomics Flex platform.

For PTG-derived Antibody Capture libraries, the Feature Barcode sequence appears at the start of Read 2 (R2). The pattern field for a typical PTG Feature Reference should be 5P(BC).

The sequencing configuration for PTG-derived Antibody Capture libraries is detailed in this Knowledge Base article.

Cell Ranger can also analyze mixed Antibody Capture libraries containing both BioLegend and PTG antibody-labeled cells, provided that the library is sequenced using the Read 1 sequencing configuration (Read 1: 48 cycles; i7 index: 10 cycles; i5 index: 10 cycles; Read 2: 50 cycles).

Here is an example Feature Reference CSV:


id,name,read,pattern,sequence,feature_type
POU2AF1,POU2AF1_PTG,R2,^(BC),GGTATCCGCAAGCGT,Antibody Capture
VIM,VIM_PTG,R2,^(BC),ACATGCCTAGCTCCG,Antibody Capture
AHNAK,AHNAK_PTG,R2,^(BC),CTGCGTACAGGTGGA,Antibody Capture
BACH1,BACH1_PTG,R2,^(BC),GCCATCACGGCACGT,Antibody Capture
SYK,SYK_PTG,R2,^(BC),CGTGATGCGCTGACG,Antibody Capture

In CRISPR Guide Capture assays, the sequence is the CRISPR protospacer sequence. The protospacer is followed by a downstream constant sequence in the guide RNA, which serves as an anchor to identify the location of the protospacer. We recommend using a 12bp-20bp constant sequence that is uniquely identifiable but short enough to minimize the likelihood of disruption by sequencing errors.

The example Feature Reference CSV files list six guide RNA features, each with a distinct barcode/protospacer sequence (sequence column). The pattern column is the same for all six features. The target_gene_id and target_gene_name columns declare the target gene of each guide RNA, for use in downstream CRISPR perturbation analysis. Two guides are declared with target_gene_id as Non-Targeting; cells containing Non-Targeting guides are used as controls for CRISPR perturbation analysis. The four remaining guides target two genes, with two guides per gene.

The Feature Barcode sequence appears on Read 2 (R2). The pattern sequence differs between 3' and 5' CRISPR Guide Capture libraries. Examples are provided below.

3' CRISPR Guide Capture

Here is an example Feature Reference for a 3' CRISPR Guide Capture library:


id,name,read,pattern,sequence,feature_type,target_gene_id,target_gene_name
ACTR8-1,ACTR8-1,R2,(BC)GTTTAAGAGCTAAGCTGGAA,GAAGGGCGGCGAGAAGGAGA,CRISPR Guide Capture,ENSG00000113812,ACTR8
ACTR8-2,ACTR8-2,R2,(BC)GTTTAAGAGCTAAGCTGGAA,GAGAACGGAAAGGAGAAGGG,CRISPR Guide Capture,ENSG00000113812,ACTR8
BCL2-1,BCL2-1,R2,(BC)GTTTAAGAGCTAAGCTGGAA,GGAGGAGAAGATGCCCGGTG,CRISPR Guide Capture,ENSG00000171791,BCL2
BCL2-2,BCL2-2,R2,(BC)GTTTAAGAGCTAAGCTGGAA,TGTACTTCATCACTATCTCC,CRISPR Guide Capture,ENSG00000171791,BCL2
NEG_CTRL-1,NEG_CTRL-1,R2,(BC)GTTTAAGAGCTAAGCTGGAA,GACCGGGGGGGTGCGATGTA,CRISPR Guide Capture,Non-Targeting,Non-Targeting
NEG_CTRL-2,NEG_CTRL-2,R2,(BC)GTTTAAGAGCTAAGCTGGAA,GTGTACTAGTGACGACTATA,CRISPR Guide Capture,Non-Targeting,Non-Targeting

You can download the example 3' CRISPR Feature Reference CSV.
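
As a quick sanity check on the target gene declarations, the following sketch groups guide IDs by target_gene_id, using the rows of the 3' example. The function guides_by_target is a hypothetical helper written for this page, not a Cell Ranger utility.

```python
import csv
import io
from collections import defaultdict

EXAMPLE = """id,name,read,pattern,sequence,feature_type,target_gene_id,target_gene_name
ACTR8-1,ACTR8-1,R2,(BC)GTTTAAGAGCTAAGCTGGAA,GAAGGGCGGCGAGAAGGAGA,CRISPR Guide Capture,ENSG00000113812,ACTR8
ACTR8-2,ACTR8-2,R2,(BC)GTTTAAGAGCTAAGCTGGAA,GAGAACGGAAAGGAGAAGGG,CRISPR Guide Capture,ENSG00000113812,ACTR8
BCL2-1,BCL2-1,R2,(BC)GTTTAAGAGCTAAGCTGGAA,GGAGGAGAAGATGCCCGGTG,CRISPR Guide Capture,ENSG00000171791,BCL2
BCL2-2,BCL2-2,R2,(BC)GTTTAAGAGCTAAGCTGGAA,TGTACTTCATCACTATCTCC,CRISPR Guide Capture,ENSG00000171791,BCL2
NEG_CTRL-1,NEG_CTRL-1,R2,(BC)GTTTAAGAGCTAAGCTGGAA,GACCGGGGGGGTGCGATGTA,CRISPR Guide Capture,Non-Targeting,Non-Targeting
NEG_CTRL-2,NEG_CTRL-2,R2,(BC)GTTTAAGAGCTAAGCTGGAA,GTGTACTAGTGACGACTATA,CRISPR Guide Capture,Non-Targeting,Non-Targeting
"""


def guides_by_target(csv_text):
    """Map each target_gene_id to the list of guide ids that declare it."""
    targets = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["feature_type"] == "CRISPR Guide Capture":
            targets[row["target_gene_id"]].append(row["id"])
    return dict(targets)


targets = guides_by_target(EXAMPLE)
controls = targets.get("Non-Targeting", [])
for target, guides in targets.items():
    print(target, guides)
```

For this example the grouping yields two guides for each of the two genes and two Non-Targeting control guides.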

5' CRISPR Guide Capture

Sequencing reads are generated from the 5' end to the 3' end. In 5' CRISPR Guide Capture libraries, the guide RNA sequences are captured in the reverse orientation relative to the sequencing read. As a result, the guide RNA sequence appears in the reverse complement form within the read. To accurately match the guide RNA sequence in the read, the pattern must be reverse complemented. This ensures that the correct guide RNA sequences are identified during Cell Ranger analysis.
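
Reverse complementing a sequence is straightforward to do in code. The sketch below is a generic helper (reverse_complement is a name chosen for this example) and applies it to the constant sequence TTCCAGCATAGCTCTTAAAC used in the 5' patterns on this page, which shows that sequence in the opposite orientation.

```python
# Map each base to its complement; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


anchor_in_read = "TTCCAGCATAGCTCTTAAAC"
anchor_flipped = reverse_complement(anchor_in_read)
print(anchor_flipped)
```

Applying the function twice returns the original sequence, which is a convenient check when preparing patterns by hand.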

Here is an example Feature Reference for a 5' CRISPR Guide Capture library:


id,name,read,pattern,sequence,feature_type,target_gene_id,target_gene_name
ACTR8-1,ACTR8-1,R2,TTCCAGCATAGCTCTTAAAC(BC),GAAGGGCGGCGAGAAGGAGA,CRISPR Guide Capture,ENSG00000113812,ACTR8
ACTR8-2,ACTR8-2,R2,TTCCAGCATAGCTCTTAAAC(BC),GAGAACGGAAAGGAGAAGGG,CRISPR Guide Capture,ENSG00000113812,ACTR8
BCL2-1,BCL2-1,R2,TTCCAGCATAGCTCTTAAAC(BC),GGAGGAGAAGATGCCCGGTG,CRISPR Guide Capture,ENSG00000171791,BCL2
BCL2-2,BCL2-2,R2,TTCCAGCATAGCTCTTAAAC(BC),TGTACTTCATCACTATCTCC,CRISPR Guide Capture,ENSG00000171791,BCL2
NEG_CTRL-1,NEG_CTRL-1,R2,TTCCAGCATAGCTCTTAAAC(BC),GACCGGGGGGGTGCGATGTA,CRISPR Guide Capture,Non-Targeting,Non-Targeting
NEG_CTRL-2,NEG_CTRL-2,R2,TTCCAGCATAGCTCTTAAAC(BC),GTGTACTAGTGACGACTATA,CRISPR Guide Capture,Non-Targeting,Non-Targeting

You can download the example 5' CRISPR Feature Reference CSV.

CRISPR Guide Capture and Antibody Capture

The structure of a Feature Reference CSV that includes both CRISPR Guide Capture and Antibody Capture libraries is similar to a CRISPR-only Feature Reference CSV. To incorporate Antibody Capture information, append the relevant entries to the bottom of the CSV (as shown below). For Antibody Capture entries, leave the target_gene_id and target_gene_name columns empty.


id,name,read,pattern,sequence,feature_type,target_gene_id,target_gene_name
RAB1A-1,RAB1A-1,R2,TTCCAGCATAGCTCTTAAAC(BC),ATGGCATCATAGTTGTGTAT,CRISPR Guide Capture,ENSG00000138069,RAB1A
Non_Target-1,Non_Target-1,R2,TTCCAGCATAGCTCTTAAAC(BC),ATATCAACCGAACGACTGCC,CRISPR Guide Capture,Non-Targeting,Non-Targeting
CD3,CD3,R2,^NNNNNNNNNN(BC)NNNNNNNNN,CTCATTGTAACTCCT,Antibody Capture,,
CD4,CD4,R2,^NNNNNNNNNN(BC)NNNNNNNNN,TGTTCCCGCTCAACT,Antibody Capture,,
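
When generating a combined file like this one in code, the empty target gene fields for Antibody Capture rows can be produced automatically. The sketch below uses Python's csv.DictWriter, whose restval argument supplies a value for keys missing from a row. The helper name to_feature_reference_csv is hypothetical.

```python
import csv
import io

FIELDS = [
    "id", "name", "read", "pattern", "sequence",
    "feature_type", "target_gene_id", "target_gene_name",
]


def to_feature_reference_csv(rows):
    """Write rows as CSV text, leaving missing columns empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {
        "id": "RAB1A-1", "name": "RAB1A-1", "read": "R2",
        "pattern": "TTCCAGCATAGCTCTTAAAC(BC)", "sequence": "ATGGCATCATAGTTGTGTAT",
        "feature_type": "CRISPR Guide Capture",
        "target_gene_id": "ENSG00000138069", "target_gene_name": "RAB1A",
    },
    {
        # Antibody Capture row: the target gene keys are simply omitted.
        "id": "CD3", "name": "CD3", "read": "R2",
        "pattern": "^NNNNNNNNNN(BC)NNNNNNNNN", "sequence": "CTCATTGTAACTCCT",
        "feature_type": "Antibody Capture",
    },
]

mixed_csv = to_feature_reference_csv(rows)
print(mixed_csv, end="")
```

The Antibody Capture line ends with two trailing commas, matching the empty target_gene_id and target_gene_name fields in the example above.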