A Feature Reference CSV file is required when processing Visium CytAssist Protein Expression data. Space Ranger v2.1 and later come bundled with `Visium_Human_Immune_Cell_Profiling_Panel_v1.0.csv`, which contains the protein target information and the unique oligo sequence for each antibody in the panel. Each line of the CSV declares one unique antibody. The Feature Reference CSV file is passed to `spaceranger count` with the `--feature-ref` flag. The CSV must not contain characters outside the ASCII range.
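The ASCII requirement is easy to check before launching a run. The following is a minimal sketch, not part of Space Ranger; the helper name `find_non_ascii` and the sample text are hypothetical and serve only as an illustration.

```python
def find_non_ascii(text):
    """Return (line_number, column, character) for every non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:  # ASCII covers code points 0-127
                hits.append((line_no, col, ch))
    return hits


# Hypothetical CSV content: the Greek alpha on line 3 is outside ASCII.
sample = "id,name\nCD3,CD3\nCD8\u03b1,CD8a\n"
print(find_non_ascii(sample))  # [(3, 4, 'α')]
```

To check a file on disk, read it with `open(path, encoding="utf-8").read()` and pass the resulting string to the helper. An empty list means the file is ASCII-only.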
This table describes the columns in the bundled Feature Reference CSV file.
Column Name | Description |
---|---|
id | Unique ID for the feature. This is the gene symbol, plus an integer suffix when multiple antibodies target the same protein, e.g., for two antibodies targeting SDC1: `SDC1_1` and `SDC1_2`. The id is used in the feature-barcode matrix and web_summary.html. |
name | Human-readable name for this feature. Must not contain spaces. This name will be displayed in Loupe Browser. |
read | Specifies which RNA sequencing read contains the Feature Barcode sequence. Must be R1 or R2. Note: in most cases R2 is the correct read. |
pattern | Specifies how to extract the Antibody Barcode sequence from the read. See the Barcode Extraction Pattern section below for details. |
sequence | Nucleotide barcode sequence associated with this feature, e.g., antibody barcode. |
feature_type | Type of the feature. In this file the value is Antibody Capture. See the Feature and library types section for details on allowed values of this field. FASTQ data specified in the libraries CSV file with a library_type that matches the feature_type will be scanned for occurrences of this feature. Each feature type in the feature reference must match a library_type entry in the libraries CSV file. This field is case sensitive. |
isotype_control | True/False indicating whether the antibody is an isotype control. |
secondary_name | _Optional_. Secondary human-readable name for this feature. Must not contain spaces. This name will also be displayed in Loupe Browser. For antibody capture, this column should contain the common name, while the name column should contain the official protein name. |
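The column rules in the table can be checked mechanically before a run. The sketch below is an illustration under stated assumptions, not Space Ranger's own validation: the function name `check_feature_reference`, the choice of which columns to treat as required, and the pattern and sequence values in the example rows are all hypothetical.

```python
import csv
import io

# Assumption: these six columns are treated as mandatory for this sketch.
REQUIRED = ["id", "name", "read", "pattern", "sequence", "feature_type"]


def check_feature_reference(csv_text):
    """Return a list of human-readable problems found in the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    seen_ids = set()
    for row_no, row in enumerate(reader, start=2):  # the header is line 1
        if row["id"] in seen_ids:
            problems.append(f"line {row_no}: duplicate id {row['id']!r}")
        seen_ids.add(row["id"])
        if any(ch.isspace() for ch in row["name"]):
            problems.append(f"line {row_no}: name contains spaces")
        if row["read"] not in ("R1", "R2"):
            problems.append(f"line {row_no}: read must be R1 or R2")
        if row["pattern"].count("(BC)") != 1:
            problems.append(f"line {row_no}: pattern needs exactly one (BC)")
        if row["feature_type"] != "Antibody Capture":  # case sensitive
            problems.append(f"line {row_no}: unexpected feature_type")
    return problems


# Hypothetical rows: the pattern and sequence values are made up.
header = "id,name,read,pattern,sequence,feature_type,isotype_control,secondary_name\n"
good = header + "SDC1_1,SDC1,R2,^NNNN(BC),ACGTACGTAC,Antibody Capture,FALSE,CD138\n"
bad = header + "SDC1_1,SDC 1,R3,^NNNN(BC),ACGTACGTAC,Antibody Capture,FALSE,CD138\n"

print(check_feature_reference(good))  # []
print(check_feature_reference(bad))
```

The second call reports two problems: the space in `name` and the invalid `read` value.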
The `pattern` field of the feature reference defines how to locate the Antibody Barcode within a read. The Antibody Barcode may appear at a known offset with respect to the start or end of the read, or at a fixed position relative to a known anchor sequence. The `pattern` column can be made up of a combination of these elements:
- 5P: denotes the beginning of the read sequence. May appear zero or one times, and must be at the beginning of the pattern (^ may be used instead of 5P). Only 5P or 3P may appear, not both.
- 3P: denotes the end of the read sequence. May appear zero or one times, and must be at the end of the pattern ($ may be used instead of 3P).
- N: denotes an arbitrary base.
- A, C, G, T: denotes a fixed base that must match the read sequence exactly.
- (BC): denotes the Antibody Barcode sequence as specified in the `sequence` column of the feature reference. Must appear exactly once in the pattern.
Any constant sequences made up of A, C, G, and T in the pattern must match exactly in the read sequence. Any N in the pattern is allowed to match a single arbitrary base. A modest number of fixed bases should be used to minimize the chance of a sequencing error disrupting the match. The fixed sequence should also be long enough to uniquely identify the position of the Antibody Barcode. For feature types that require a non-N anchor, 10x recommends 12-20 bp of constant sequence.
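One way to picture these matching rules is to translate a pattern into a regular expression. The sketch below is an illustration only and does not reflect how Space Ranger implements extraction. The function name `pattern_to_regex`, the example patterns and reads, and the decision to let N also match an N base call in the read are assumptions.

```python
import re


def pattern_to_regex(pattern, barcode_length):
    """Translate a feature-reference pattern into a compiled regex.

    The barcode is captured in the named group "bc".
    """
    body, prefix, suffix = pattern, "", ""
    if body.startswith("5P"):
        body, prefix = body[2:], "^"
    elif body.startswith("^"):
        body, prefix = body[1:], "^"
    if body.endswith("3P"):
        body, suffix = body[:-2], "$"
    elif body.endswith("$"):
        body, suffix = body[:-1], "$"
    if prefix and suffix:
        raise ValueError("only one of 5P or 3P may appear")
    if body.count("(BC)") != 1:
        raise ValueError("(BC) must appear exactly once")

    def convert(fragment):
        out = []
        for ch in fragment:
            if ch == "N":
                out.append("[ACGTN]")  # any single base (assumption: N too)
            elif ch in "ACGT":
                out.append(ch)  # fixed base, must match exactly
            else:
                raise ValueError(f"unexpected character {ch!r} in pattern")
        return "".join(out)

    left, right = body.split("(BC)")
    capture = "(?P<bc>[ACGTN]{%d})" % barcode_length
    return re.compile(prefix + convert(left) + capture + convert(right) + suffix)


# Hypothetical patterns and reads, each with a 6-base barcode.
offset = pattern_to_regex("5PNNNN(BC)", 6).search("TTTTACGTACGGGG")
anchor = pattern_to_regex("GATTACA(BC)", 6).search("CCGATTACAACGTACTT")
tail = pattern_to_regex("(BC)NN3P", 6).search("GGGGACGTACTT")
print(offset.group("bc"), anchor.group("bc"), tail.group("bc"))
# ACGTAC ACGTAC ACGTAC
```

The three examples cover the cases described above: a fixed offset from the start of the read, a position relative to a constant anchor sequence, and a fixed offset from the end of the read.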
The extracted Antibody Barcode sequence is then matched against the feature reference, allowing up to one base mismatch: extracted sequences are corrected up to a Hamming distance of one using the same 10x Genomics barcode correction algorithm that is used for correcting spatial barcodes.
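To make "a Hamming distance of one" concrete, the sketch below shows a deliberately simplified correction step. It is not the 10x Genomics barcode correction algorithm, whose details are not described here. It only illustrates the idea of accepting an exact match or, failing that, a single unambiguous reference barcode that differs at one position. The function names and barcode values are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(observed, reference):
    """Return the matching reference barcode, or None if there is no
    unambiguous match within one mismatch (simplified illustration)."""
    if observed in reference:
        return observed
    candidates = [
        bc for bc in reference
        if len(bc) == len(observed) and hamming(observed, bc) == 1
    ]
    # Refuse to guess when more than one reference barcode is one base away.
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical reference barcodes.
reference = {"ACGTAC", "TTGGCC"}
print(correct_barcode("ACGTAC", reference))  # ACGTAC (exact match)
print(correct_barcode("ACGTAA", reference))  # ACGTAC (one mismatch)
print(correct_barcode("AAAAAA", reference))  # None (too far from both)
```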