The cellranger pipeline outputs unfiltered (raw) and filtered feature-barcode matrices in two file formats: the Market Exchange Format (MEX), which is described on this page, and Hierarchical Data Format (HDF5), which is described in detail on a separate page.
Each element of the feature-barcode matrix is the number of UMIs associated with a feature (row) and a barcode (column):
Type | Description | count output path | multi output path |
---|---|---|---|
Unfiltered feature-barcode matrix | Contains every barcode from the fixed list of known-good barcode sequences that has at least one read. This includes background and cell-associated barcodes. | outs/raw_feature_bc_matrix/ | outs/multi/count/raw_feature_bc_matrix/ |
Filtered feature-barcode matrix | Contains only detected cell-associated barcodes. | outs/filtered_feature_bc_matrix/ | outs/per_sample_outs/count/sample_filtered_feature_bc_matrix/ |
The sparse matrix is stored in the Market Exchange Format (MEX). The output directory contains the matrix file together with gzipped TSV files that list the features and barcode sequences corresponding to the row and column indices, respectively. For example, the matrix output may look like:
cd /home/jdoe/runs/sample345/outs
tree filtered_feature_bc_matrix
filtered_feature_bc_matrix
├── barcodes.tsv.gz
├── features.tsv.gz
└── matrix.mtx.gz
0 directories, 3 files
Features correspond to row indices. For each feature, the feature ID and name are stored in the first and second columns of the (unzipped) features.tsv.gz file, respectively. The third column identifies the type of feature, which will be one of Gene Expression, Antibody Capture, CRISPR Guide Capture, Multiplexing Capture, or CUSTOM. Below is a minimal example features.tsv.gz file showing data collected for three genes and two antibodies.
gzip -cd filtered_feature_bc_matrix/features.tsv.gz
ENSG00000141510 TP53 Gene Expression
ENSG00000012048 BRCA1 Gene Expression
ENSG00000139687 RB1 Gene Expression
CD3_GCCTGACTAGATCCA CD3 Antibody Capture
CD19_CGTGCAACACTCGTA CD19 Antibody Capture
For Gene Expression data, the ID corresponds to gene_id in the annotation field of the reference GTF. Similarly, the name corresponds to gene_name in the annotation field of the reference GTF. If no gene_name field is present in the reference GTF, the gene name is equivalent to the gene ID. For Antibody Capture and CRISPR Guide Capture data, the id and name are taken from the first two columns of the Feature Reference CSV file.
For multi-species experiments, gene IDs and names are prefixed with the genome name to avoid name collisions between genes of different species; e.g., GAPDH becomes hg19_GAPDH and Gm15816 becomes mm10_Gm15816.
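The three-column layout described above can be read with the Python standard library alone. The following is a minimal sketch, not part of Cell Ranger: the names Feature, read_features, and count_feature_types are hypothetical helpers introduced here for illustration, and the code assumes every row has at least three tab-separated columns.

```python
import csv
import gzip
from collections import Counter
from typing import NamedTuple


class Feature(NamedTuple):
    """One row of features.tsv.gz (hypothetical helper type)."""
    feature_id: str    # column 1, e.g. ENSG00000141510
    name: str          # column 2, e.g. TP53
    feature_type: str  # column 3, e.g. Gene Expression


def read_features(path):
    """Read features.tsv.gz into a list of Feature records.

    File order is preserved, so the list position of a feature is its
    zero-based row index in the matrix.
    """
    with gzip.open(path, mode="rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [Feature(row[0], row[1], row[2]) for row in reader if row]


def count_feature_types(features):
    """Count how many features belong to each feature type."""
    return Counter(feature.feature_type for feature in features)
```

Calling read_features on the example file above would yield five records, and count_feature_types would report three Gene Expression features and two Antibody Capture features.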
Barcode sequences correspond to column indices:
gzip -cd filtered_feature_bc_matrix/barcodes.tsv.gz
AAACCCAAGGAGAGTA-1
AAACGCTTCAGCCCAG-1
AAAGAACAGACGACTG-1
AAAGAACCAATGGCAG-1
AAAGAACGTCTGCAAT-1
AAAGGATAGTAGACAT-1
AAAGGATCACCGGCTA-1
AAAGGATTCAGCTTGA-1
AAAGGATTCCGTTTCG-1
AAAGGGCTCATGCCCT-1
Each barcode sequence includes a suffix with a dash separator followed by a number:
AAACCCAAGGAGAGTA-1
More details on the barcode sequence format are available in the barcoded BAM section.
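As an illustration of the barcode format shown above, the sequence and the numeric suffix can be separated at the last dash. This is a small sketch; split_barcode is a hypothetical helper, and it assumes the suffix is always a plain integer.

```python
def split_barcode(barcode):
    """Split a barcode such as 'AAACCCAAGGAGAGTA-1' into (sequence, suffix).

    Raises ValueError if the dash separator or numeric suffix is missing.
    """
    sequence, separator, suffix = barcode.rpartition("-")
    if not separator or not sequence or not suffix.isdigit():
        raise ValueError(f"unexpected barcode format: {barcode!r}")
    return sequence, int(suffix)
```

For example, split_barcode("AAACCCAAGGAGAGTA-1") separates the 16-base sequence from the suffix 1.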
Both R and Python can read the MEX format, and keeping the data as a sparse matrix makes manipulation more efficient.
For suggestions on downstream analysis with 3rd party R and Python tools, see the 10x Genomics Analysis Guides resource.
The R package Matrix supports loading MEX format data, and can be easily used to load the sparse feature-barcode matrix.
Cell Ranger represents the feature-barcode matrix using sparse formats (only the nonzero entries are stored) in order to minimize file size. All of our programs, and many other programs for gene expression analysis, support sparse formats.
However, certain programs (e.g. Excel) only support dense formats (where every row-column entry is explicitly stored, even if it's a zero). Here are a few methods for converting feature-barcode matrices to CSV:
Load matrices into Python
The csv, os, gzip, and scipy.io modules can be used to load a feature-barcode matrix into Python.
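A minimal sketch of how these modules might fit together is shown below. It assumes SciPy is installed and that the directory contains the three files listed earlier; load_mex and read_tsv_rows are hypothetical helper names, not part of Cell Ranger.

```python
import csv
import gzip
import os

import scipy.io


def read_tsv_rows(path):
    """Return the rows of a gzipped TSV file as lists of strings."""
    with gzip.open(path, mode="rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [row for row in reader if row]


def load_mex(matrix_dir):
    """Load a MEX directory as (matrix, features, barcodes).

    matrix   -- SciPy sparse matrix in CSC form, features x barcodes
    features -- list of [feature_id, name, feature_type] rows
    barcodes -- list of barcode strings, one per matrix column
    """
    # mmread accepts gzipped Matrix Market files and returns COO format;
    # CSC is convenient for selecting individual barcodes (columns).
    matrix = scipy.io.mmread(os.path.join(matrix_dir, "matrix.mtx.gz")).tocsc()
    features = read_tsv_rows(os.path.join(matrix_dir, "features.tsv.gz"))
    barcodes = [row[0] for row in read_tsv_rows(os.path.join(matrix_dir, "barcodes.tsv.gz"))]
    if matrix.shape != (len(features), len(barcodes)):
        raise ValueError("matrix dimensions do not match features/barcodes")
    return matrix, features, barcodes
```

With the result in hand, matrix[i, j] is the UMI count for features[i] in barcodes[j], following the row and column convention described at the top of this page.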
mat2csv
You can convert a feature-barcode matrix to dense CSV format using the cellranger mat2csv command.
This command takes two arguments: an input matrix generated by Cell Ranger (either an HDF5 file or a MEX directory), and an output path for the dense CSV. For example, to convert a matrix from a pipestance named sample123 in the current directory, either of the following commands would work:
# Convert from MEX
cellranger mat2csv sample123/outs/filtered_feature_bc_matrix sample123.csv
# Or, convert from HDF5
cellranger mat2csv sample123/outs/filtered_feature_bc_matrix.h5 sample123.csv
You can then load sample123.csv into Excel.
Shell commands
Please see this Q&A article for shell commands to convert MEX files to CSV. This method creates a single file that is sparse (zeroes are ignored).
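The same idea, a single CSV that lists only the nonzero entries, can also be sketched in Python using only the standard library. This is not the shell method from the Q&A article; mex_to_sparse_csv and read_first_column are hypothetical helpers, and the output column names (feature_id, barcode, count) are an assumption made for illustration. The sketch relies on the Matrix Market coordinate layout: comment lines starting with %, one size line, then one "row column value" line per nonzero entry with 1-based indices.

```python
import csv
import gzip
import os


def read_first_column(path):
    """Return the first tab-separated field of each non-empty line."""
    with gzip.open(path, "rt") as handle:
        return [line.rstrip("\n").split("\t")[0] for line in handle if line.strip()]


def mex_to_sparse_csv(matrix_dir, csv_path):
    """Write nonzero entries as feature_id,barcode,count rows; return the row count."""
    features = read_first_column(os.path.join(matrix_dir, "features.tsv.gz"))
    barcodes = read_first_column(os.path.join(matrix_dir, "barcodes.tsv.gz"))
    written = 0
    with gzip.open(os.path.join(matrix_dir, "matrix.mtx.gz"), "rt") as mtx:
        with open(csv_path, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(["feature_id", "barcode", "count"])
            size_line_seen = False
            for line in mtx:
                if line.startswith("%") or not line.strip():
                    continue  # header and comment lines
                if not size_line_seen:
                    size_line_seen = True  # "rows columns nonzeros" line
                    continue
                row, col, value = line.split()
                # Matrix Market indices are 1-based.
                writer.writerow([features[int(row) - 1], barcodes[int(col) - 1], value])
                written += 1
    return written
```

Because the matrix file is streamed line by line, this approach never holds the full matrix in memory, which may matter for large unfiltered matrices.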