BAM (Binary Alignment Map) files are a binary format used to store sequencing data aligned to a reference genome.
The cellranger count pipeline generates an indexed BAM file named possorted_genome_bam.bam. This file contains position-sorted reads aligned to the genome and transcriptome, along with unassigned reads. If Feature Barcode libraries are included in the analysis, the BAM file will also include both aligned and unaligned records for each library type (e.g., Antibody Capture, CRISPR Guide Capture, 3' Cell Multiplexing, Antigen Capture).
The cellranger multi pipeline produces two BAM files. It separates the reads into unassigned reads, which are stored in unassigned_alignments.bam, and reads that are assigned to samples, which are stored in sample_alignments.bam.
BAM files can be used for troubleshooting reads that were unassigned or for converting BAM files back to FASTQ files. Each read in these BAM files has Chromium cellular and molecular barcode information attached. The following tables assume basic familiarity with the BAM format. The Type column refers to the type of value stored in the tag (see the SAM/BAM standard documentation for details).
Chromium cellular and molecular barcode information for each read is stored in the following TAG fields:
Tag | Type | Description |
---|---|---|
CB | Z | Chromium cellular barcode sequence that is error-corrected and confirmed against a list of known-good barcode sequences. For multiplex Flex, the cellular barcode is a combination of the 10x GEM Barcode and Probe Barcode sequences. |
CR | Z | Chromium cellular barcode sequence as reported by the sequencer. For multiplex Flex, the cellular barcode is a combination of the 10x GEM Barcode and Probe Barcode sequences. |
CY | Z | Chromium cellular barcode read quality. Phred scores as reported by the sequencer. For multiplex Flex, the cellular barcode is a combination of the 10x GEM Barcode and Probe Barcode sequences. |
UB | Z | Chromium molecular barcode sequence that is error-corrected among other molecular barcodes with the same cellular barcode and gene alignment. |
UR | Z | Chromium molecular barcode sequence as reported by the sequencer. |
UY | Z | Chromium molecular barcode read quality. Phred scores as reported by sequencer. |
TR | Z | Trimmed sequence. For the Single Cell 3' v1 chemistry, this is trailing sequence following the UMI on Read 2. For the Single Cell 3' v2 chemistry, this is trailing sequence following the cell and molecular barcodes on Read 1. |
RG | Z | Identifies the read group, indicating the library source of each read. |
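
The tags in the table above can be read from the text form of a BAM record (for example, the output of a SAM-format viewer), where optional fields follow the eleven mandatory columns as TAG:TYPE:VALUE. The following is a minimal sketch in plain Python, not part of Cell Ranger; the function name parse_sam_tags and the example record are hypothetical and invented for illustration.

```python
from typing import Dict, Union

TagValue = Union[str, int, float]


def parse_sam_tags(sam_line: str) -> Dict[str, TagValue]:
    """Return the optional TAG:TYPE:VALUE fields of one SAM text record."""
    fields = sam_line.rstrip("\n").split("\t")
    tags: Dict[str, TagValue] = {}
    # Columns 1-11 are the mandatory SAM fields; optional tags follow them.
    for field in fields[11:]:
        # Split at most twice so values containing ':' (e.g. qualities) survive.
        tag, type_code, value = field.split(":", 2)
        if type_code == "i":
            tags[tag] = int(value)
        elif type_code == "f":
            tags[tag] = float(value)
        else:
            tags[tag] = value  # Z, A, H and B values are kept as strings
    return tags


# Hypothetical record, invented for illustration only.
example = "\t".join([
    "read1", "0", "chr1", "1000", "255", "10M", "*", "0", "0",
    "ACGTACGTAC", "FFFFFFFFFF",
    "CB:Z:AAACCCAAGGAGAGTA-1", "CR:Z:AAACCCAAGGAGAGTA",
    "UB:Z:ACGTACGTACGT", "xf:i:25", "RE:A:E",
])
tags = parse_sam_tags(example)
print(tags["CB"], tags["UB"], tags["xf"])
```

Reads without a valid barcode may lack the CB or UB tag, so looking tags up with dict.get is safer than indexing when scanning a whole file.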
The cell barcode CB tag includes a suffix with a dash separator followed by a number:
AAACCCAAGGAGAGTA-1
This number denotes the GEM well, and is used to virtualize barcodes in order to achieve a higher effective barcode diversity when combining samples generated from separate GEM well channel runs. Normally, this number will be "1" across all barcodes when analyzing a sample generated from a single GEM well channel. It can either be left in place and treated as part of a unique barcode identifier, or explicitly parsed out to leave only the barcode sequence itself.
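If the suffix needs to be parsed out, one possible approach is to split on the last dash. This is a small sketch under the assumption that the value has the form shown above (barcode sequence, dash, integer GEM well); the helper name split_cell_barcode is hypothetical.

```python
from typing import Tuple


def split_cell_barcode(cb: str) -> Tuple[str, int]:
    """Split a CB value such as 'AAACCCAAGGAGAGTA-1' into (sequence, GEM well)."""
    # rpartition splits on the last dash, so only the trailing suffix is removed.
    sequence, separator, gem_well = cb.rpartition("-")
    if not separator or not sequence or not gem_well.isdigit():
        raise ValueError(f"CB value has no '-<number>' suffix: {cb!r}")
    return sequence, int(gem_well)


sequence, gem_well = split_cell_barcode("AAACCCAAGGAGAGTA-1")
print(sequence, gem_well)
```

Keeping the GEM well as a separate integer makes it straightforward to group barcodes by GEM well when several wells have been combined.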
The following tags will also be present on reads that mapped to the genome and overlapped an exon by at least one base pair. Reads aligned to the transcriptome across exon junctions in the genome will have a large gap in their CIGAR string, e.g., 35M225N64M. A read may align to multiple transcripts and genes, but it is only considered confidently mapped to the transcriptome if it is mapped to a single gene (see this page for methods to check for multi-mapped reads). Cell Ranger modifies MAPQ values; see the mm tag below.
Tag | Type | Description |
---|---|---|
TX | Z | Present in reads aligned to the same strand as the transcripts in this semicolon-separated list that are compatible with this alignment. Transcripts are specified with the transcript_id key in the reference GTF attribute column. The format of each entry is [transcript_id],[strand][pos],[cigar]. strand is + as reads with this annotation were correctly aligned in the expected orientation (in contrast to the AN tag below, where the strand is - to indicate antisense alignments). pos is the alignment offset in transcript coordinates, and cigar is the CIGAR string in transcript coordinates. |
AN | Z | Present for reads that are aligned to the antisense strand of annotated transcripts. If intron counts are not included (with include-introns=false), this tag is the same as the TX tag but with - values for the strand identifier. If introns are included (include-introns=true), the AN tag contains the corresponding antisense gene identifier values (starting with ENSG) rather than transcript identifier values (starting with ENST). |
GX | Z | Semicolon-separated list of gene IDs that are compatible with this alignment. Gene IDs are specified with the gene_id key in the reference GTF attribute column. |
GN | Z | Semicolon-separated list of gene names that are compatible with this alignment. Gene names are specified with gene_name key in the reference GTF attribute column. |
mm | i | Set to 1 if the genome-aligner (STAR) originally gave a MAPQ < 255 (it multi-mapped to the genome) and Cell Ranger changed it to 255 because the read overlapped exactly one gene. |
RE | A | Single character indicating the region type of this alignment (E = exonic, N = intronic, I = intergenic). |
pa | i | The number of poly-A nucleotides trimmed from the 3' end of read 2. Up to 10% mismatches are permitted. |
pr | Z | For Flex, a semicolon-separated list of probe IDs: one probe ID if both read halves align to the same probe, and two probe IDs if each read half aligns to a different probe, or NA if a read half does not align to a probe. |
ts | i | The number of template switch oligo (TSO) nucleotides trimmed from the 5' end of read 2. Up to 3 mismatches are permitted. The 30-bp TSO sequence is AAGCAGTGGTATCAACGCAGAGTACATGGG. |
xf | i | Extra alignment flags. The bits of this tag are interpreted as follows: 1 = the read is confidently mapped to the transcriptome; 2 = this read's barcode, UMI, and feature combination was discarded in favor of a different feature with higher read support; 4 = this read pair maps to a discordant pair of genes, and is not treated as a UMI count; 8 = this read is representative for a molecule and is treated as a UMI count; 16 = this read maps to exactly one feature, and is identical to bit 1 for transcriptomic reads (notably, this bit is set for a Feature Barcode read, while bit 1 is not); 32 = this read was removed by targeted UMI filtering. |
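
Two of the tags above have internal structure: xf is a bit field, and TX is a list of [transcript_id],[strand][pos],[cigar] entries. The sketch below shows one way to unpack both in plain Python, based only on the descriptions in the table. The names XF_BITS, decode_xf, is_umi_counted, TxEntry, and parse_tx are hypothetical, and the example values are invented for illustration.

```python
from typing import List, NamedTuple

# Bit meanings as described for the xf tag in the table above.
XF_BITS = {
    1: "confidently mapped to the transcriptome",
    2: "discarded in favor of a feature with higher read support",
    4: "maps to a discordant pair of genes",
    8: "representative for a molecule; treated as a UMI count",
    16: "maps to exactly one feature",
    32: "removed by targeted UMI filtering",
}


def decode_xf(xf: int) -> List[str]:
    """List the descriptions of every bit that is set in an xf value."""
    return [description for bit, description in XF_BITS.items() if xf & bit]


def is_umi_counted(xf: int) -> bool:
    """True if bit 8 is set, i.e. the read is treated as a UMI count."""
    return bool(xf & 8)


class TxEntry(NamedTuple):
    transcript_id: str
    strand: str
    pos: int
    cigar: str


def parse_tx(value: str) -> List[TxEntry]:
    """Parse a TX (or TX-style AN) value into its semicolon-separated entries."""
    entries = []
    for item in value.split(";"):
        if not item:
            continue  # tolerate a trailing semicolon
        transcript_id, strand_pos, cigar = item.split(",")
        # The first character is the strand; the rest is the offset.
        entries.append(TxEntry(transcript_id, strand_pos[0], int(strand_pos[1:]), cigar))
    return entries


print(decode_xf(25))  # bits 1, 8 and 16 are set in 25
print(parse_tx("ENST00000000001,+120,91M"))  # invented transcript ID
```

Note that parse_tx assumes the transcript-style format; when introns are included, the AN tag holds gene identifiers instead and would need different handling.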
Some aspects of the BAM file are particular to Flex samples. The mapping quality (MAPQ) values described here do not apply to 3' or 5' Gene Expression data.
A read that multi-maps to the reference transcriptome, but maps uniquely to a single probe, is considered confidently mapped to that probe and its gene. The mapping quality has the following definition:
MAPQ | Description |
---|---|
255 | Both read halves map to the same probe. |
3 | Each read half maps to a different probe. |
1 | One read half maps to a probe and the other half does not. |
0 | Neither read half maps to a probe. |
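
The table above can be restated as a small decision function. This is only an illustrative sketch of the mapping in the table, not Cell Ranger's implementation; the function name flex_mapq and the convention of passing None for a read half that does not align to a probe are assumptions, and the probe IDs are invented.

```python
from typing import Optional


def flex_mapq(left_probe: Optional[str], right_probe: Optional[str]) -> int:
    """Return the Flex MAPQ implied by the probe each read half aligned to.

    Each argument is a probe ID, or None if that half did not align to a probe.
    """
    if left_probe is None and right_probe is None:
        return 0  # neither read half maps to a probe
    if left_probe is None or right_probe is None:
        return 1  # only one read half maps to a probe
    if left_probe == right_probe:
        return 255  # both read halves map to the same probe
    return 3  # each read half maps to a different probe


print(flex_mapq("probe_A", "probe_A"))
print(flex_mapq("probe_A", "probe_B"))
```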
The following tags represent the Feature Barcode sequence extracted from the read and the feature reference it was matched to, if any. Sequencing reads passed in as a Feature Barcode library type are not aligned to the genome and the BAM file will contain unassigned records for these reads. See the Feature Barcode Extraction Pattern section for more details about the alignment algorithm. The BAM read sequence will contain all the bases outside of the cell barcode and UMI regions.
Tag | Type | Description |
---|---|---|
fb | Z | Chromium Feature Barcode sequence that is error-corrected and confirmed against known Feature Barcode sequences from the feature reference. |
fr | Z | Chromium Feature Barcode sequence as reported by the sequencer. |
fq | Z | Chromium Feature Barcode read quality. Phred scores as reported by sequencer. |
fx | Z | Feature identifier matched to this Feature Barcode read. Specified in the id column of the feature reference. |
This Analysis Guide tutorial walks users through the process of identifying records in the BAM file that contribute to UMI counting: Navigating 10x Genomics Barcoded BAM Files. Note: 10x Genomics does not provide support for community-developed tools and makes no guarantees regarding their function or performance. Please contact tool developers with any questions.