Space Ranger BAM

The spaceranger count pipeline outputs an indexed BAM file containing position-sorted reads aligned to the genome and transcriptome, as well as unaligned reads. The BAM file can be used to troubleshoot unaligned reads or be converted back to FASTQ files.

Each read in this BAM file has Visium barcode and UMI information attached. The following sections assume basic familiarity with the BAM format. The Type column refers to the type of value stored in the tag (see the SAM/BAM standard documentation for details).

BAM files are generated by default. To change this behavior, provide the --no-bam flag to the spaceranger count run.

In Space Ranger v2.0 and later, if the slide serial number and capture area were supplied to the spaceranger count pipeline, they are recorded in the BAM file header under the CO tag.

Visium barcode, UMI, and other information for each read is stored as TAG fields:

Tag	Type	Description
`CB`	Z	Visium barcode that is error-corrected and confirmed against a list of known barcode sequences.
`CR`	Z	Visium barcode sequence as reported by the sequencer.
`CY`	Z	Visium barcode read quality. Phred scores as reported by the sequencer.
`UB`	Z	Visium UMI sequence that is error-corrected among other UMIs with the same barcode and gene alignment.
`UR`	Z	Visium UMI sequence as reported by the sequencer.
`UY`	Z	Visium UMI sequence read quality. Phred scores as reported by the sequencer.
`BC`	Z	Sample index read.
`QT`	Z	Sample index read quality. Phred scores as reported by the sequencer.
`TR`	Z	Trimmed sequence. Trailing sequence (if any) following the barcode and UMI on Read 1.
`1R`	Z	Visium HD and HD 3' only. Full Read 1 sequence as reported by the sequencer.
`1Y`	Z	Visium HD and HD 3' only. Read 1 read quality. Phred scores as reported by the sequencer.
`sb`	Z	Visium HD and HD 3' only. Barcode name, following the format `s_{bin size}um_{row}_{col}`, where `row` and `col` are 0-indexed. See the `tissue_positions.parquet` file within the binned outputs (2 µm) for corresponding row, column, and tissue image positions.
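
The tags in this table appear as optional `TAG:TYPE:VALUE` fields after the 11 mandatory columns of a SAM text record (for example, as printed by `samtools view`). The following is a minimal sketch of reading them into a dictionary with plain Python. The function name `parse_sam_tags` and the example record are hypothetical and only illustrate the layout; only the `i`, `Z`, and `A` types used on this page are handled.

```python
def parse_sam_tags(sam_line):
    """Return {tag: value} for the optional fields of one SAM text record."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # The first 11 columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        # Split at most twice so values that contain ':' (e.g. qualities) survive.
        tag, type_code, value = field.split(":", 2)
        # Type 'i' is an integer; 'Z' (string) and 'A' (character) stay as text.
        tags[tag] = int(value) if type_code == "i" else value
    return tags


# Hypothetical unaligned record, built only to illustrate the field layout.
example = "\t".join([
    "read1", "4", "*", "0", "0", "*", "*", "0", "0",
    "ACGT", "FFFF",
    "CR:Z:AACACTTGGCAAGGAA",
    "CY:Z:FFFFFFFFFFFFFFFF",
    "CB:Z:AACACTTGGCAAGGAA-1",
    "UR:Z:ACGTACGTACGT",
])
tags = parse_sam_tags(example)
print(tags["CR"], tags["CB"])
```

A read whose raw barcode could not be corrected would simply lack the `CB` key in the returned dictionary, so callers should use `tags.get("CB")` when that case matters.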

The barcode CB tag (Visium v1/v2) or the barcode name tag sb (Visium HD/HD 3') includes a suffix with a dash separator followed by a number:

AACACTTGGCAAGGAA-1

If present, this number will always be one (1) in the current Space Ranger output.
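
As an illustration of the suffix convention, the sketch below splits a barcode into its sequence and numeric suffix, and breaks a Visium HD barcode name into bin size, row, and column. It assumes the row and column in the `sb` name are joined by an underscore. The helper names and the HD example value are hypothetical.

```python
import re


def split_barcode_suffix(barcode):
    """Split 'SEQ-1' into ('SEQ', 1); return (barcode, None) if no suffix."""
    base, sep, suffix = barcode.rpartition("-")
    if sep and suffix.isdigit():
        return base, int(suffix)
    return barcode, None


# Assumed layout of an HD barcode name: s_{bin size}um_{row}_{col}, optional -N.
HD_NAME = re.compile(r"^s_(\d+)um_(\d+)_(\d+)(?:-(\d+))?$")


def parse_hd_barcode_name(name):
    """Return bin size (in um), 0-indexed row and column, and suffix (or None)."""
    match = HD_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognized HD barcode name: {name!r}")
    bin_um, row, col, suffix = match.groups()
    return {
        "bin_um": int(bin_um),
        "row": int(row),
        "col": int(col),
        "suffix": int(suffix) if suffix else None,
    }


print(split_barcode_suffix("AACACTTGGCAAGGAA-1"))  # ('AACACTTGGCAAGGAA', 1)
# Hypothetical HD barcode name, used only to exercise the parser.
print(parse_hd_barcode_name("s_002um_00012_00345-1"))
```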

The following tags are also present on reads that mapped to the genome and overlapped an exon by at least one base pair. Reads aligned to the transcriptome across exon junctions in the genome have a large gap in their CIGAR strings, such as 35M225N64M. A read may align to multiple transcripts and genes, but it is only considered confidently mapped to the transcriptome if it mapped to a single gene. Note that, unlike the filtered matrices, the BAM files contain all UMIs, including those mapped to barcodes outside of tissue. Space Ranger modifies MAPQ values; see the mm tag below.
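
To make the CIGAR example concrete, here is a small sketch that parses a CIGAR string and reports the skipped-region (`N`) lengths, which is where the gap from an exon junction appears. The function names are illustrative and not part of Space Ranger.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Return a list of (length, op) pairs; '*' (no alignment) gives []."""
    if cigar == "*":
        return []
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Reject strings containing anything the pattern did not consume.
    if "".join(f"{length}{op}" for length, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def skipped_regions(cigar):
    """Lengths of N operations, i.e. reference gaps such as introns."""
    return [length for length, op in parse_cigar(cigar) if op == "N"]


def reference_span(cigar):
    """Number of reference bases covered, including skipped regions."""
    return sum(length for length, op in parse_cigar(cigar) if op in "MDN=X")


print(skipped_regions("35M225N64M"))  # [225]
print(reference_span("35M225N64M"))   # 324
```

For 35M225N64M, the read aligns 35 bases, skips 225 reference bases, and aligns another 64 bases, so 99 read bases cover a 324-base span of the reference.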

Tag	Type	Description
`TX`	Z	Present in reads aligned to the same strand as the transcripts in this semicolon-separated list that are compatible with this alignment. Transcripts are specified with the `transcript_id` key in the reference GTF attribute column. The format of each entry is `[transcript_id],[strand],[pos],[cigar]`. `strand` is `+` as reads with this annotation were correctly aligned in the expected orientation (in contrast to the `AN` tag below, where the strand is `-` to indicate antisense alignments). `pos` is the alignment offset in transcript coordinates, and `cigar` is the CIGAR string in transcript coordinates.
`AN`	Z	Same as the `TX` tag, but for reads that are aligned to the antisense strand of annotated transcripts, i.e., with `-` as the strand identifier.
`GX`	Z	Semicolon-separated list of gene IDs that are compatible with this alignment. Gene IDs are specified with the `gene_id` key in the reference GTF attribute column.
`GN`	Z	Semicolon-separated list of gene names that are compatible with this alignment. Gene names are specified with the `gene_name` key in the reference GTF attribute column.
`mm`	i	Set to 1 if the genome aligner (STAR) originally gave a MAPQ < 255 (the read multi-mapped to the genome) and Space Ranger changed it to 255 because the read overlapped exactly one gene.
`RE`	A	Single character indicating the region type of this alignment (E = exonic, N = intronic, I = intergenic).
`pr`	Z	For Visium FFPE, a semicolon-separated list of probe IDs: one probe ID if both read halves align to the same probe, and two probe IDs if each read half aligns to a different probe, or NA if a read half does not align to a probe.
`pa`	i	The number of poly-A nucleotides trimmed from the 3' end of read 2. Up to 10% mismatches are permitted.
`ts`	i	The number of template switch oligo (TSO) nucleotides trimmed from the 5' end of read 2. Up to 3 mismatches are permitted. The 30-bp TSO sequence is `AAGCAGTGGTATCAACGCAGAGTACATGGG`.
`xf`	i	Extra alignment flags. The bits of this tag are interpreted as follows: 1 - The read is confidently mapped to the transcriptome. 2 - The read maps to a feature that the majority of other reads with this UMI did not. 4 - This read pair maps to a discordant pair of genes, and is not treated as a UMI count. 8 - This read is representative of a molecule and is treated as a UMI count. 16 - This read maps to exactly one feature and is identical to bit 1 for transcriptomic reads. Notably, this bit is set for a feature barcode read, while bit 1 is not. 32 - This read was not analyzed due to high sequencing depth and subsampling for Targeted Spatial Gene Expression.
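
Because `xf` is a bit field, a single integer can carry several of the flags listed above at once. The following sketch decodes a value into its set bits. The dictionary paraphrases the descriptions in the table, and the helper names are hypothetical.

```python
# Bit values and meanings, paraphrased from the xf row of the table above.
XF_BITS = {
    1: "confidently mapped to the transcriptome",
    2: "maps to a feature that most other reads with this UMI did not",
    4: "read pair maps to a discordant pair of genes; not a UMI count",
    8: "representative of a molecule; treated as a UMI count",
    16: "maps to exactly one feature",
    32: "not analyzed due to subsampling (Targeted Spatial Gene Expression)",
}


def decode_xf(xf):
    """Return the descriptions of every bit set in an xf value."""
    return [description for bit, description in XF_BITS.items() if xf & bit]


def is_umi_count(xf):
    """True if bit 8 is set, i.e. the read is treated as a UMI count."""
    return bool(xf & 8)


# 25 = 1 + 8 + 16
for description in decode_xf(25):
    print(description)
print(is_umi_count(25))  # True
```

Testing individual bits with `&` rather than comparing the whole integer keeps a filter correct when other bits are also set.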

Some aspects of the BAM file are particular to formalin-fixed paraffin-embedded (FFPE) samples.

A read that multimaps to the reference transcriptome, but maps uniquely to a single probe, is confidently mapped to that probe and its gene. The mapping quality has the following meaning:

MAPQ	Description
`255`	Both read halves map to the same probe.
`3`	Each read half maps to a different probe.
`1`	One read half maps to a probe and the other half does not.
`0`	Neither read half maps to a probe.
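
When summarizing FFPE alignments, the MAPQ values in this table can be turned into a lookup. The sketch below simply restates the table in code. The names are hypothetical, and values not listed in the table are reported as such rather than interpreted.

```python
# MAPQ meanings for Visium FFPE reads, restated from the table above.
FFPE_MAPQ = {
    255: "both read halves map to the same probe",
    3: "each read half maps to a different probe",
    1: "one read half maps to a probe and the other half does not",
    0: "neither read half maps to a probe",
}


def describe_ffpe_mapq(mapq):
    """Return the table description for a MAPQ value, if it has one."""
    return FFPE_MAPQ.get(mapq, "MAPQ value not listed for FFPE reads")


def both_halves_same_probe(mapq):
    """True only for MAPQ 255, where both halves map to the same probe."""
    return mapq == 255


for value in (255, 3, 1, 0):
    print(value, describe_ffpe_mapq(value))
```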

The BAM tag pr:Z holds the value of probe_id from the probe_set.csv file. In addition, the BAM tag fx:Z holds the feature identifier, which for FFPE Gene Expression has the same value as the GX:Z tag, i.e., the gene ID.

Space Ranger v2.1 introduced support for Protein Expression. Sequencing data passed in as a Protein Expression library type is not aligned to the genome. The BAM file will contain unaligned records for these reads, with the following tags representing the Antibody Barcode sequence extracted from the read, and the feature reference it was matched to, if any. The BAM read sequence will contain all the bases outside of the barcode and UMI regions.

Tag	Type	Description
`fb`	Z	Visium Antibody Barcode sequence that is error-corrected and confirmed against known feature barcode sequences from the feature reference.
`fr`	Z	Visium Antibody Barcode sequence as reported by the sequencer.
`fq`	Z	Visium Antibody Barcode read quality. Phred scores as reported by the sequencer.
`fx`	Z	Feature identifier matched to this Antibody Barcode read. Specified in the `id` column of the `feature_reference.csv` file.
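
One way these tags might be used is to tally distinct UMIs per barcode and feature from the unaligned Protein Expression records. The sketch below works on tag dictionaries that have already been extracted from BAM records. It is an illustration only and not how Space Ranger itself counts molecules; the function name, feature IDs, and UMI values are hypothetical.

```python
from collections import defaultdict


def count_feature_umis(records):
    """Return {(barcode, feature_id): number of distinct UMIs}.

    `records` is an iterable of {tag: value} dictionaries, one per read.
    """
    umis = defaultdict(set)
    for tags in records:
        # Skip reads lacking a corrected barcode, corrected UMI, or matched feature.
        if not all(key in tags for key in ("CB", "UB", "fx")):
            continue
        umis[(tags["CB"], tags["fx"])].add(tags["UB"])
    return {key: len(umi_set) for key, umi_set in umis.items()}


# Hypothetical tag dictionaries; feature IDs and UMIs are made up.
records = [
    {"CB": "AACACTTGGCAAGGAA-1", "UB": "AAAACCCCGGGG", "fx": "feature_A"},
    {"CB": "AACACTTGGCAAGGAA-1", "UB": "AAAACCCCGGGG", "fx": "feature_A"},
    {"CB": "AACACTTGGCAAGGAA-1", "UB": "TTTTGGGGCCCC", "fx": "feature_A"},
    {"CB": "AACACTTGGCAAGGAA-1", "UB": "TTTTGGGGCCCC", "fx": "feature_B"},
    {"CB": "AACACTTGGCAAGGAA-1", "UR": "ACGTACGTACGT"},  # no fx: unmatched read
]
print(count_feature_umis(records))
```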
