New feature: Fixed RNA Profiling with multiplex Antibody Capture
-
Cell Ranger v7.2 is required for analysis of Fixed RNA Profiling data with multiplexed Gene Expression and Antibody Capture libraries. Instructions for running the
cellranger multi
subcommand are described in the running multi pipeline page. Output files are described in the Understanding Outputs section. The Fixed RNA Profiling algorithms section includes descriptions of the new methods that were developed for processing multiplexed Gene Expression and Antibody Capture data. -
New probe-level count matrix output files for FRP:
raw_probe_bc_matrix.h5
andsample_raw_probe_bc_matrix.h5
. -
The
frp_gem_barcode_overlap.csv
probe overlap file now contains content for Antibody Multiplexing Barcodes.
5’ Immune Profiling
-
New feature: Cell Ranger v7.2 supports the aggregation of BEAM (Antigen Capture) libraries with
cellranger aggr
to combine and normalize the calculation of antigen specificity scores across multiple (or large samples) split across wells. -
The outputs of
cellranger aggr
for 5' Immune Profiling libraries now include theairr_rearrangement.tsv
(not produced in previous versions of this pipeline). -
A bug that caused all alignments in the
consensus.bam
andconcat_ref.bam
files to have their POS field set to the default value of 1 has been fixed. -
The
all_contig_annotations.json
output file has an additional field calledjunction_support
that is a map of{reads: x, umis: y}
supporting the junction region of a contig. This information is generated by thecellranger vdj
assembler for productive contigs in reference-assisted assembly (or valid contigs in de novo assembly) and used for confidence determination and cell filtering.
General updates and bug fixes
-
Targeted Gene Expression analysis deprecated:
count
andmulti
pipelines in v7.2 and later do not support the analysis of Targeted Gene Expression libraries. -
NovaSeq X demultiplexing support with bcl2fastq v2.20 dependency.
-
An new command line argument
--outputs-dir
is available forcount
,multi
,mkref
,mkvdjref
, andvdj
pipelines to specify a custom output directory. -
Batch effect score calculations modified to normalize and scale with the number of cells in the dataset.
-
Cell Ranger now performs a preflight check to verify that the prebuilt reference file is complete. The use of an incomplete reference (due to incomplete download or corrupt file) produces an error message.
-
BAM tags update: MM changed to mm to enable compatibility with IGV.
-
BAM files now have @PG header noting which pipeline version was used to generate the file.
-
Improvements to the cellranger aggr web summary:
-
Layout improved.
-
Metrics table per library.
-
Provides aggregation summary for BEAM (Antigen Capture) libraries.
-
-
Improvements to the cellranger multi web summary:
-
In the Feature Barcode Expression Metrics table of the cells tab, the
Median UMI counts per cell metric
has been renamed in the Antibody tab:-
Antibody Capture:
Median antibody UMI counts per cell
. -
CRISPR, Antigen Capture, and Custom library: remains unchanged.
-
-
Antibody histogram has moved from the library tab to the cells tab.
-
A per-sample antibody barcode rank plot has been added for 3’ Cell Multiplexing web summaries.
-
FRP new metrics to capture the fraction of reads mapping to two different halves of probes:
Reads half-mapped to probe set
,Reads split-mapped to probe set
.
-
-
Improvement to the Batch Effect Score (BES) calculation: normalization and scaling has changed from using √N nearest neighbors to 0.01*N nearest neighbors (with N being number of cells).
Changes that apply to Fixed RNA Profiling analysis
-
v1.0.1 probe set reference CSVs for human and mouse have a new region column, which indicates whether a probe spans a splice junction by at least 10 bp (spliced) or not (unspliced).
-
When the v1.0.1 probe set reference CSV is used in Cell Ranger v7.1, the web summary and
metrics_summary.csv
files will include genomic DNA metrics. The region column information is used to calculate these metrics. -
The molecule_info.h5 files include the probe to which each molecule is mapped.
-
In the probe set reference CSV, the
included
column is set toFALSE
for all deprecated probes.
General improvements
-
Calling cell barcode improvement: The auto-estimated
expect-cells
upper range has been restricted to the lower range of the EmptyDrops method. Previously, the upper range was 262,000 cells. The new upper range is 45,000 for single cell gene expression analyses. The exception is super-loaded multiplex Fixed RNA Profiling analyses, where the range is calculated as:max(45,000, number of probe barcodes × 22,500)
. -
Improvement to the Batch Effect Score (BES) calculation to use √n nearest neighbors, where n is the total number of cells, instead of 100. Cells are no longer subsampled to 10%.
-
A new compression format,
.tar.xz
, is available on the Cell Ranger downloads page. The smaller file size enables faster download. -
Cell Ranger 7.1 introduces a new subcommand,
cellranger multi-template
, which provides descriptions for all multi config CSV parameters and produces a config CSV template. Runcellranger multi-template -h
for help. -
The
ARC-v1
chemistry may be used to analyze only the Gene Expression library portion of a Multiome ATAC + Gene Expression experiment. -
The
aggregate_barcodes.csv
output file for Antibody Capture analyses is no longer stored in aantibody_analysis/
sub-directory. In the cellranger multi pipeline, it is found inouts/per_sample_outs/<sample_name>/count/aggregate_barcodes.csv
. In thecellranger count
pipeline, it is found in outs/aggregate_barcodes.csv
. For both pipelines, the CSV file is only generated if antibody aggregates are detected. -
The web summaries for Antibody Capture libraries include a Distribution of Antibody Counts plot to show the relative composition of antibody counts for antibodies with at least one UMI.
Bug fixes
-
Fixed a bug in the OrdMag algorithm, which could result in all barcodes being called as cells when there are very few cells.
-
Improvement to 3' Cell Multiplexing tag assignment for samples with a large number of zero CMO UMI counts.
-
Fixed a 3' Cell Multiplexing t-SNE plot bug where the plots were generated assuming a full set of CMOs in
cmo-set
instead of those used in the experiment. -
Fixed conditions resulting in negative gap errors.
-
Fixed a bug that causes pipestance failure with an IOError message: "directory
%s
exists but it can not be written".
The updates are explained further in these Knowledge Base articles:
- My samples are analyzed with Cell Ranger v7.0. Should I rerun analysis using the latest Cell Ranger v7.1?
- What are the cell calling updates in Cell Ranger v7.1 and its impact on Single Cell Gene Expression data?
- Should I upgrade Cell Ranger to v7.1 for cell multiplexing analysis?
New feature: Barcode Enabled Antigen Mapping (BEAM) or Antigen Capture
-
Cell Ranger 7.1 is required for the analysis of BEAM libraries. Instructions for running cellranger multi are described in the Antigen Capture page. The new
feature_type
, antigen capture, the Feature Reference CSV that specifies the list of antigens (and MHC alleles) included in the experiment, and all the antigen specific customizable parameters in a multi config CSV are described in detail. Example multi config CSV for TCR and BCR Antigen Capture libraries are also provided. -
The algorithms section includes a page called Antigen Algorithms with a description of the new methods developed for processing Antigen Capture (BEAM) data.
-
If an Antigen Capture library is included, some new/updated output files are generated (described in the Understanding Outputs section):
antigen_specificity_scores.csv
(new file)per_barcode.csv
(new file)aggregate_barcodes.csv
(updated location and format)
Changes that apply to 5' Immune Profiling analysis
-
V(D)J cell calling improvement: If a Gene Expression library is present, the V(D)J cell calling algorithm does not filter out two or more clonotypes that have identical chains. This helps improve V(D)J cell calling, especially for transgenic strains. This change does not apply to V(D)J datasets in the absence of a Gene Expression library.
-
The Human V(D)J reference has been updated to exclude the following genes:
- IGHV4-30-2
- IGKV1D-33
- IGKV1D-37
- IGKV1D-39
- IGKV2D-28
These genes have counterparts with identical V, D, J, and C gene sequences, but differ in the length of their 5' UTRs. Removing duplicates improves clonotype assignment.
Updates are explained further in this Knowledge Base article: What are the major updates in Cell Ranger v7.1 that impacts V(D)J data?
Bug fixes
-
Fixed a bug where an upgrade to Illumina NovaSeq control software v1.8 (reagent name change in recipe XML file) resulted in a silent
cellranger mkfastq
error and a significant number of reads going intoUndetermined/
because the orientation of i5 (Index2) could not be autodetected. -
Improved Deplex Error message for
cellranger multi
when no valid cell multiplexing tags are detected in the Multiplexing Capture library. Common failure modes are provided to help with troubleshooting. -
Updated Fixed RNA Profiling web summary metric names and definitions for consistency.
New feature: Fixed RNA Profiling
- Cell Ranger 7.0 is required for analysis of Fixed RNA Profiling data. Instructions for running the
cellranger multi
subcommand are described in the running multi pipeline page. This includes a new option,probe-set
, to specify the probe set CSV file. Output files are described in the Understanding Outputs section. The Fixed RNA Profiling algorithms section includes descriptions of the new methods that were developed for processing Fixed RNA Profiling data.
Major updates
-
To maximize sensitivity for whole transcriptome 3’/5’ Single Cell Gene Expression and 3’ Cell Multiplexing experiments, introns will be included in the analysis by default for
cellranger count
andmulti
. There will be an informational alert in the count and multi web summaries to indicate that intronic reads were included in your analysis. While not recommended, users can exclude introns by settinginclude-introns=false
in Cell Ranger. This change does not apply to the 3’/5’ Targeted Gene Expression or Fixed RNA Profiling assays, as both target exonic sequences. Learn more. -
CRISPR Guide Capture libraries can be aggregated with
cellranger aggr
. This addition allows users to combine large CRISPR assays across multiple GEM wells. There are no changes toaggr
inputs – the presence of CRISPR libraries in themolecule_info.h5
input files enables CRISPR aggregation. Normalization is enabled by default for both Gene Expression and CRISPR libraries; changes to the normalization parameters affect both libraries. Protospacer calling is performed again on the combined data included in thecellranger aggr
run. CRISPR aggregation generates thecrispr_analysis/
folder in theouts/
directory. The structure of the crispr_analysis folder is similar to the CRISPR outputs fromcount
.
General improvements
-
Users no longer need to specify
expect-cells
forcellranger count
andmulti
pipelines due to improvements in the gene expression cell calling algorithm. The expected number of cells can either be auto-estimated (recommended) or users can still provide a reasonable estimate toexpect-cells
. -
The new
check-library-compatibility
option allows users to disable the default check for 10x Barcode overlap when multiple libraries are specified for cellranger count and multi (3' Gene Expression, 5' Immune Profiling). -
For 3’ Cell Multiplexing analysis in
cellranger multi
, users can override Cell Ranger’s cell calling and tag calling algorithms with the custom cell assignment input file specified by thebarcode-sample-assignment
option in the multi config CSV file. -
Modifications to the 3’ Cell Multiplexing CMO tag calling algorithm enable users to recover viable singlet data from “blank” assignments.
-
The following per-sample output files from
cellranger multi
have been renamed:
Cell Ranger 6.1.2 outputs | Cell Ranger 7.0 outputs |
---|---|
cloupe.cloupe | sample_cloupe.cloupe |
sample_barcodes.csv | sample_filtered_barcodes.csv |
sample_feature_bc_matrix | sample_filtered_feature_bc_matrix |
sample_feature_bc_matrix.h5 | sample_filtered_feature_bc_matrix.h5 |
- Secondary analysis outputs will be named to reflect which library they are specific to (
gene_expression_*
,antibody_capture_*
,crispr_guide_capture_*
,multiplexing_capture_*
). The secondary analysis clustering, PCA/t-SNE/UMAP, and differential gene expression outputs are supported for Gene Expression and Antibody Capture libraries, while PCA/t-SNE/UMAP outputs are supported for CRISPR Guide Capture and Cell Multiplexing libraries. For example:
└── analysis
└── pca
├── antibody_capture_10_components/
└── gene_expression_10_components/
-
The
cellranger count
web summary “Analysis” tab has been renamed to “Gene Expression”. There is an “Antibody” tab for Antibody Capture analysis, which includes a t-SNE projection plot by clustering and a histogram of antibody counts. -
The
cellranger multi
web summary (3' Gene Expression, 5' Immune Profiling) “Sample” view has been renamed to “Cells”. The “Antibody” tab includes a t-SNE projection plot by clustering. The mapping metrics, sequencing saturation plot, and median genes per cell plot are displayed on the “Library” view (previously appeared on “Sample” and “Library” view). -
Cell Ranger can now ingest FASTQs with a quality score up to the full supported range (93 instead of 41).
Bug fixes
-
Improved error messages and better handling of poorly formatted inputs in
cellranger mkref
. Enable users to generate references for analyses with large genomes containing chromosomes longer than 512 Mbp.cellranger count
andmulti
pipelines will output a.csi
BAM index file instead of .bai in these cases. -
Fixed a bug that resulted in a segmentation fault error when mapping to references that contain small contigs, for example, the rabbit genome.
-
Removed the
Inconsistent Throughput Detected
alert in web summary when it should not appear. -
Fixed a bug where
vdj
pipeline failed for specific CentOS/RHEL 7 kernels. -
Bundles the latest version of
bamtofastq
(v1.4.1) in Cell Ranger 7.0 tarball. -
Fixed a bug where
bamtofastq
failed if the R1 read length was >26bp.
Changes that apply to 5' Immune Profiling analysis
-
Support for gamma-delta libraries: The
cellranger multi
pipeline can process T cell receptor (TCR) libraries enriched for gamma (TRG) and delta (TRD) chains. 10x Genomics does not officially support TRG/D analysis with a reagent kit. Please note that, only CDR3 annotation is available for TRG/D, and the quality of annotations cannot be guaranteed. Users must specifyVDJ-T-GD
as thefeature_type
in thecellranger multi
config CSV as TRG/D chains cannot be autodetected. Aweb_summary
alert is displayed to indicate the use of an unsupported workflow. No TRG/D analysis is available via thecellranger vdj
pipeline. -
V(D)J Reference updated: The recommended V(D)J reference packages for human and mouse have been updated from version 5.0 to 7.0. The changes to the V(D)J reference sequences are listed below:
HUMAN:
-
Added human IGHV3-9
-
For two genes that are identical except for extra bases on the 3' end, only the longer version was retained. List of affected genes:
IGHA1 ENST00000390547 IGHD ENST00000390556 IGHD ENST00000390556 IGHD1-1
ENST00000454908 IGHD1-14 ENST00000451044 IGHD1-20 ENST00000450276 IGHD1-26
ENST00000390567 IGHD1-7 ENST00000430425 IGHD1/OR15-1A ENST00000605284 IGHD2-15
ENST00000390578 IGHD2-2 ENST00000390591 IGHD2-21 ENST00000390572 IGHD2-8
ENST00000390585 IGHD2/OR15-2A ENST00000603077 IGHD3-10 ENST00000390583
IGHD3-16 ENST00000390577 IGHD3-22 ENST00000390571 IGHD3-3 ENST00000390590
IGHD3-9 ENST00000390584 IGHD3/OR15-3A ENST00000604950 IGHD4-11 ENST00000431440
IGHD4-17 ENST00000431870 IGHD4-23 ENST00000437320 IGHD4/OR15-4A
ENST00000603326 IGHD5-12 ENST00000390581 IGHD5-18 ENST00000390575 IGHD5-24
ENST00000390569 IGHD5/OR15-5A ENST00000604642 IGHD6-13 ENST00000390580
IGHD6-19 ENST00000390574 IGHD6-25 ENST00000452198 IGHD6-6 ENST00000454691
IGHD7-27 ENST00000439842 IGHG1 ENST00000390542 IGHG1 ENST00000390548 IGHG1
ENST00000390549 IGHG2 ENST00000390545 IGHG3 ENST00000390551 IGHG4
ENST00000390543 IGHJ1 ENST00000390565 IGHM ENST00000390559 IGHV1-18
ENST00000390605 IGHV1-2 ENST00000390594 IGHV1-24 ENST00000390610 IGHV1-3
ENST00000390595 IGHV1-45 ENST00000390621 IGHV1-46 ENST00000390622 IGHV1-58
ENST00000390628 IGHV1-69 ENST00000390633 IGHV1-69-2 ENST00000615784 IGHV2-26
ENST00000390611 IGHV2-5 ENST00000390597 IGHV2-70D ENST00000390634 IGHV3-11
ENST00000390601 IGHV3-13 ENST00000390602 IGHV3-15 ENST00000390603 IGHV3-16
ENST00000390604 IGHV3-20 ENST00000390606 IGHV3-21 ENST00000390607 IGHV3-23
ENST00000390609 IGHV3-30 ENST00000603660 IGHV3-35 ENST00000390617 IGHV3-38
ENST00000390618 IGHV3-43 ENST00000434710 IGHV3-48 ENST00000390624 IGHV3-49
ENST00000390625 IGHV3-53 ENST00000390627 IGHV3-64 ENST00000454421 IGHV3-66
ENST00000390632 IGHV3-7 ENST00000390598 IGHV3-72 ENST00000433072 IGHV3-73
ENST00000390636 IGHV3-74 ENST00000424969 IGHV4-28 ENST00000390612 IGHV4-34
ENST00000390616 IGHV4-39 ENST00000390619 IGHV4-4 ENST00000455737 IGHV4-59
ENST00000390629 IGHV4-61 ENST00000390630 IGHV5-51 ENST00000390626 IGHV6-1
ENST00000390593 IGKV1-12 ENST00000480492 IGKV1-16 ENST00000479981 IGKV1-17
ENST00000490686 IGKV1-27 ENST00000498435 IGKV1-33 ENST00000473726 IGKV1-37
ENST00000465170 IGKV1-39 ENST00000498574 IGKV1-5 ENST00000496168 IGKV1-6
ENST00000464162 IGKV1-8 ENST00000495489 IGKV1-9 ENST00000493819 IGKV2-24
ENST00000484817 IGKV2-28 ENST00000482769 IGKV2-30 ENST00000468494 IGKV3-11
ENST00000483158 IGKV3-15 ENST00000390252 IGKV3-20 ENST00000492167 IGKV3-7
ENST00000390247 IGKV3D-7 ENST00000443397 IGKV5-2 ENST00000390244 IGKV6-21
ENST00000390256 IGLV1-36 ENST00000390301 IGLV1-40 ENST00000390299 IGLV1-44
ENST00000628287 IGLV2-33 ENST00000390302 IGLV3-32 ENST00000390303 IGLV5-37
ENST00000390300 IGLV7-43 ENST00000390298 TRBD1 ENST00000631435 TRBJ1-1
ENST00000634213 TRBJ1-2 ENST00000631745 TRBJ1-3 ENST00000633780 TRBJ1-4
ENST00000632041 TRBJ1-5 ENST00000634000 TRBJ2-1 ENST00000390412 TRBJ2-2
ENST00000390413 TRBJ2-2P ENST00000390414 TRBJ2-3 ENST00000390415 TRBJ2-4
ENST00000390416 TRBJ2-5 ENST00000390417 TRBJ2-6 ENST00000390418 TRBV10-1
ENST00000390364 TRBV11-1 ENST00000390367 TRBV11-3 ENST00000611787 TRBV12-3
ENST00000620569 TRBV13 ENST00000614171 TRBV14 ENST00000617639 TRBV15
ENST00000616518 TRBV16 ENST00000620773 TRBV23-1 ENST00000390396 TRBV27
ENST00000390399 TRBV28 ENST00000390400 TRBV29-1 ENST00000422143 TRBV3-1
ENST00000390387 TRBV4-2 ENST00000390392 TRBV5-1 ENST00000390381 TRBV5-6
ENST00000390375 TRBV5-7 ENST00000390378 TRBV6-1 ENST00000390353 TRBV6-5
ENST00000390368 TRBV7-1 ENST00000547918 TRBV7-7 ENST00000390377 TRGJ1
ENST00000390337
MOUSE
- Added missing mouse TRGV and TRGC genes
TRGC1 ENSMUST00000103558 TRGC2 ENSMUST00000103561 TRGC3 ENSMUST00000198163
TRGC4 ENSMUST00000179181 TRGV1 ENSMUST00000103564 TRGV3 ENSMUST00000198663
TRGV4 ENSMUST00000103554 TRGV5 ENSMUST00000199017 TRGV6 ENSMUST00000198330
TRGV7 ENSMUST00000103553
- For two genes that are identical except for extra bases on the 3' end, only the longer version was retained. List of affected genes:
IGHD2-5 ENSMUST00000178549 IGHD5-2 ENSMUST00000179166 TRAV11D
ENSMUST00000103648 TRAV12D-1 ENSMUST00000181360 TRAV12D-2 ENSMUST00000197007
TRAV13D-2 ENSMUST00000197954 TRAV14D-1 ENSMUST00000181038 TRAV14D-2
ENSMUST00000196802 TRAV15D-2-DV6D-2 ENSMUST00000199800 TRAV3D-3
ENSMUST00000196023 TRAV4D-3 ENSMUST00000103592 TRAV4D-4 ENSMUST00000103600
TRAV5D-4 ENSMUST00000179701 TRAV6-6 ENSMUST00000103584 TRAV7-2
ENSMUST00000103636 TRAV7D-5 ENSMUST00000197128 TRAV9D-2 ENSMUST00000199746
TRAV9D-3 ENSMUST00000178252
-
V(D)J web summary: In the
web_summary.html
file produced bycellranger vdj
, the Analysis tab has been renamed to VDJ. -
The default for the
fiveprime_multiplexing
parameter in thecellranger-7.0.0/lib/bin/parameters.toml
file has changed toTrue
.
Bug fixes
-
Improved handling of memory requests for large genomes.
-
Fixes an issue in the
molecule_info.h5
file, where if a species was not present in a barnyard run, it was omitted from the genome information. -
Reduces the size of the bundled reference included in
cellranger testrun
. -
Adds additional guidance in the web summary when there is a low fraction of targeted genes enriched.
-
Fixes a metric issue where aggregate antibodies could be double-counted.
-
Unsets additional sysconfig environment variables prior to pipeline execution, which may otherwise interfere with the pipeline conda environment.
Bug Fixes
- Fix an issue where
cellranger vdj
could fail if executed by a user without a home directory.
New Feature: High Throughput (HT) for Chromium X
-
Cell Ranger 6.1 introduces support for the 3' and 5' High Throughput (HT) kits with 16 channels per chip, allowing users to process 2,000-20,000 cells per channel (3' and 5') or 2,000-60,000 cells per channel with CellPlex (3' only). HT kits are only compatible with Chromium X, which is backwards compatible with all 10x Genomics dual indexed assays. For more information see What is HT?
-
Cell Ranger 6.1 includes a new throughput detection algorithm to detect HT samples in 3' CellPlex data as described in the CellPlex algorithms page. In the event that chemistry detection fails, it can be overridden with the option (e.g.,
--chemistry=SC3Pv3HT
) incellranger count
, or in thecellranger multi
CSV file to detect HT samples when 3' CellPlex libraries are run. -
Minor changes to web_summary.html include a new alert when the user specifies
--chemistry=SC3Pv3HT
, but Cell Ranger detects otherwise. HT will be appended to the detected chemistry if the pipestance was a multiplexing run. -
Note that with HT chemistry and 3' CellPlex it is now possible to run 60,000 cells per GEM well, and aggregate million-cell datasets. Larger datasets will require additional memory beyond our stated minimum requirement of 64 GB. See the 3' system requirements page for more details and time trial data.
General Improvements
-
Numerous performance optimizations have been made, especially for pipeline stages that iterated over
molecule_info.h5
files, such ascellranger aggr
, but also memory allocation improvements. We have seen up to 2-3x speed improvements forcellranger aggr
. -
Changed certain parameters in the cell calling algorithm for improved results. Changed the Empty Drops stage for multispecies experiments to call cells using only the UMI counts for each species separately.
-
Raw feature barcode matrices are no longer output by
cellranger aggr
. It is no longer possible to specify--force-cells
of an aggr output incellranger reanalyze
with more cells than were originally called. -
The secondary analysis implementation is now shared with Loupe Browser's Filtering and Reclustering Wizard. These changes improve the performance of most stages, in either time (t-SNE) or memory (PCA). There will be changes in outputs compared to previous versions, reflecting either slight variations in outputs (PCA, t-SNE), or as if a different randomized seed had been chosen (graph clustering, UMAP).
-
Starting from Cell Ranger 6.1, antibody histograms of UMI counts are shown on Library tab of the
web_summary.html
, and protein aggregate barcodes are provided asaggregate_barcodes.csv
. These are meant to help feature barcoding users diagnose issues of aggregating antibodies on cell surface proteins. -
Fixed a bug that caused Cell Ranger v4 and higher to ignore user-supplied parameters to
--nthreads
, defaulting to 1. Parallelization has been re-enabled in Cell Ranger 6.1.
Deprecating OS
- The recommended operating systems for Cell Ranger v6.1 are CentOS 7 or Ubuntu 14 Linux variants or newer. CentOS 6 and Ubuntu 12 are still supported but have been deprecated (unsupported for future Cell Ranger releases). Support may be dropped in future versions. See the OS support page for more details.
Bug fixes
-
Fixes additional issues with file copying on BeeGFS filesystems.
-
cellranger multi
: Adds an optionalmin-assignment-confidence
in the config CSV to allow adjustment of the Cell Multiplexing minimum assignment confidence threshold (default: 0.9). Decreasing the threshold will likely increase the number of singlets assigned to samples, but at the cost of potentially increasing the rate of mis-assignment. -
Adds a warning to the
cellranger multi
web_summary.html
if contaminant tags are detected in Cell Multiplexing experiments.
General improvements
-
The fetch-imgt script, to build an IMGT-compatible custom reference for Single Cell Immune Profiling data analysis, has been updated to be compatible with Python 3.
-
Cell Multiplexing analysis has been updated to be more memory-efficient.
Bug fixes
-
The [sample] section of the configuration CSV file is now required for Cell Multiplexing analysis.
-
Fixes an issue where
cellranger multi
would only accept a single VDJ library. -
Fixes an issue where
cellranger vdj
preflight would fail if custom primers were passed in. -
The
sample_id
information in Cell Ranger 6 aggr runs are now correctly propagated to Loupe Browser. -
Fixes an issue where
.vloupe
files fail to generate on some filesystems and operating systems. -
Fixes an issue in cluster mode where the pipeline could fail to correctly identify which jobs were still queued.
-
Fixes an issue where including an
aggr
csv inreanalyze
would cause the pipeline to exit. -
The "Number of reads for Custom Feature by Physical library ID" in the
multi
web summary and metrics summary is now rendered properly. -
Fixes an issue with file copying on BeeGFS filesystems.
New Feature: Cell Multiplexing
-
Cell Ranger 6.0 now supports analysis of Cell Multiplexing data for the 3' Gene Expression, Targeted Gene Expression, and Feature Barcode solutions. Instructions for running the
cellranger multi
subcommand are described in the running multi page. A new Getting Started Tutorial is also available. The Cell Multiplexing algorithms include a new method to call singlets, multiplets, and empty drops. The output file structure has also changed to accommodate multiple samples multiplexed in a single GEM well. -
The aggr subcommand now supports analysis of
cellranger multi
outputs for the 3' Gene Expression, Targeted Gene Expression, and Feature Barcode solutions. Further details are described in the running aggr page.
New Feature: LT (Low Throughput) support
- Cell Ranger 6.0 supports the analysis of data from 3' Gene Expression and Feature Barcode (Cell Surface Protein) LT (Low Throughput) kits.
Changes that apply to Gene Expression and Feature Barcode analysis
-
The column names for the
Aggregation CSV
file required by theaggr
sub-command have changed:library_id
has been changed tosample_id
andlibrary_outs
has been changed tosample_outs
. Further details are described in the running aggr page. -
The
molecule_info.h5
and unfiltered feature-barcode matrix files (raw_feature_bc_matrix
in H5 and MEX formats) will only contain barcodes with at least one read, rather than all barcodes in the whitelist. -
The change to the unfiltered feature-barcode matrix summarized in (4) above results in a subtle change to the distribution of UMI counts amongst background, i.e. non-cell barcodes, which results in minor changes to the results of the cell calling algorithm. This change occurs due to the second step that identifies non-ambient cell-barcodes as described in the algorithms page.
-
Cell Ranger 6.0 is the first Cell Ranger release to use Python 3.
Bug fixes and deprecations
-
A bug has been fixed in the graph-based clustering output: previously, in a sample with K clusters, the first K cell-associated barcodes (ordered as in the filtered feature-barcode matrix) may have been assigned incorrect cluster labels. This change does not affect the number of clusters output.
-
A bug has been fixed for multi-genome experiments, wherein the species annotation may have been incorrect for cell-associated barcodes identified by the second step of the cell-calling algorithm, as described in the algorithms page. Changes in metrics are expected to be minor, unless the the proportion of such cells is large.
-
The
--qc
option has been deprecated fromcellranger mkfastq
. -
A bug has been fixed for multi-genome experiments, wherein the species annotation may have been incorrect for cell-associated barcodes identified by the second step of the cell-calling algorithm, as described in the algorithms page. Changes in metrics are expected to be minor, unless the the proportion of such cells is large.
Changes that apply to 5' Immune Profiling analysis
In Cell Ranger 6.0, the following changes apply to joint analysis of Immune Profiling, Gene Expression, and Feature Barcode data with the multi
sub-command:
-
The structure of the
outs/
folder has been updated, as described in runningcellranger multi
. -
When running the
cellranger aggr
subcommand on samples that have Immune Profiling, Gene Expression, and/or Feature Barcode data analyzed with multi, thesample_outs
field now contains the path to the outputs for that sample (e.g.outs/per_sample_outs/sample_x
). Further details are described in running aggr.
Cell Ranger 6.0 also introduces some improvements and bug fixes related to the clonotype inference algorithms:
-
There are subtle changes to clonotyping heuristics that have little effect on overall behavior, but recover a small number of joins that were previously missed and might be critical for a particular experiment. These changes are described in terms of technical parameters to the algorithm, specifically raising the default for
MAX_DIFFS
from 50 to 55 and raising the default forMAX_CDR3_DIFFS
from 10 to 15. There were also compensatory changes to prevent the rate of false positive joins from increasing: the default forMAX_DEGRADATION
was lowered from 3 to 2, and the default forMAX_SCORE
was lowered from 1,000,000 to 500,000. For more details, visit enclone help. -
Single-chain clonotypes are now more likely to be merged with two-chain and three-chain clonotypes. This causes significantly more clonotypes to have single-chain exact subclonotypes.
-
Fixed a bug that caused failures on some very short (defective) V gene reference sequences.
-
The algorithm for deciding to use a donor reference allele now checks all donor reference alleles for all V genes having the same name as the one originally assigned to a contig. For more details, visit enclone help.
-
A doublet test has been added. This removes some exact subclonotypes that appear to represent doublets. Details are documented on the enclone pages. The typical effect is to remove some three-chain and four-chain clonotypes, with the fraction removed depending on the emperical doublet rate. In some cases, large, complex clonotypes are accurately split into multiple smaller clonotypes by this change.
-
There is no longer a restriction on the length of CDR3 sequences (previously maximum 27).
-
The Immune Profiling output file
all_contig_annotations.csv
contains new fieldsfwr1, ..., fwr4
andcdr1
,cdr2
, providing the amino acid sequences of framework and complementarity-determining regions (in addition tocdr3
, which was already present). The definitions used to define these regions are provided in the enclone features page. The corresponding nucleotide sequences are provided (e.g.fwr1_nt
). These fields are also provided in the fileconsensus_annotations.csv
, as are nucleotide start and end positions (e.g.fwr1_start
). -
The Immune Profiling output file
all_contig_annotations.csv
contains new fieldexact_subclonotype_id
providing the exact subclonotype ID to which the cell barcode was assigned. Details about exact subclonotypes can be found on the clonotype grouping page. -
The
--qc
option has been deprecated fromcellranger mkfastq
.
Bug fixes
-
Fixes an issue in aggr where files would fail to be copied on NFSv4 File Systems.
-
Fixes an issue in multi where r1-length and r2-length settings were ignored for
vdj
.
Changes that apply to Gene Expression and Feature Barcode analysis
-
Cell Ranger v5.0 introduces a
--no-bam
option that disables the generation of aligned BAMs for gene expression and feature barcode datasets. If you have no need for these files, then disabling their generation can significantly speed up the pipeline. -
Cell Ranger v5.0 introduces upgraded protein aggregation detection and filtering algorithm. By directly using the protein counts, more aggregate GEMs are detected and filtered out before proceeding with cell calling.
-
Cell Ranger v5.0 introduces an
--include-introns
option for counting intronic reads using 3’ Gene Expression and 5’ Gene Expression products. The usage of pre-mRNA references for counting intronic reads is now deprecated.- The
--include-introns
option, introduced in Cell Ranger 5.0, works by aligning reads to a normal reference transcriptome with STAR. After alignment, the reads mapping to introns are annotated and counted similarly to reads that are aligned to exons. Previously, the pre-mRNA reference strategy implemented with Cell Ranger 4.0 and earlier involves alignment to a modified reference transcriptome that categorizes intronic regions as exonic. There are slight differences in read alignments produced by the STAR aligner when a pre-mRNA reference is used compared to a normal reference using--include-introns
. These differences result in small overall differences in counted UMIs for intron-mode compared to pre-mRNA-reference.
- The
-
Ported a fix from upstream
IRLBA
that fixes incorrect behavior in rare circumstances. -
On some Linux distributions, NFS implementations would surface an improper error during file copy. We have implemented a workaround for our affected native code.
Changes that apply to Gene Expression, Feature Barcode, and V(D)J analysis
-
Cell Ranger 5.0 introduces the multi pipeline that can simultaneously process any combination of 5' Gene Expression, Feature Barcode (cell surface protein or antigen) and V(D)J libraries from a single GEM well. The multi pipeline uses the cell calls provided by the gene expression data to improve the cell calls inferred by the V(D)J library.
-
A new metric, “Number of Short Reads Skipped”, is added to the web summary, indicating the total number of read pairs that were ignored by the pipeline because they do not satisfy the minimum length requirements.
Changes that apply to V(D)J analysis
-
Cell Ranger v5.0 introduces a new clonotype grouping algorithm that computationally approximates groups of cells which are descendants of a single, fully rearranged common ancestor and infers the germline sequence of the V genes from each individual in the dataset. In previous versions (4.0 and earlier), the algorithm grouped cells based only on the set of productive CDR3 nucleotide sequences. As a consequence, whenever a true clonotype had a CDR3 mutation, the true exact subclonotypes were presented by the algorithm as multiple separate clonotypes. The previous approach to clonotyping in Cell Ranger 4.0 and earlier led to inaccuracies in the B cell clonotypes due to the grouping by unique CDR3 sequence. Additionally, single-chain clonotypes were reported as separate clonotypes, which could lead to both over- and under-estimation of the size of a given clonotype. The new clonotyping algorithm is improved in specificity, sensitivity, and overall accuracy because it accounts for mutations found in the V(D)J transcript and in the V(D)J junction. It also merges single chain clonotypes into the correct fully-paired clonotypes for both T cells and B cells. Additional cell filters are also imposed during clonotyping to improve data quality.
-
Changes to V(D)J outputs:
-
The following output files are removed in 5.0:
consensus.fastq
andconsensus_annotations.json
-
The following output files are added in 5.0: - Contig info binary file, which would be used as an input to aggregate V(D)J samples - Donor reference fasta
-
Two new columns are added to the clonotypes.csv file that displays the iNKT/MAIT evidence.
-
The files
filtered_contig_annotations.csv
,filtered_contig.fasta
,filtered_contig.fastq
now only contain data from the contigs in cell barcodes that are productive. -
A number of new fields are added to
consensus_annotations.csv
:v_start
,v_end
,v_end_ref
,j_start
,j_start_ref
,j_end
,cdr3_start
,cdr3_end
-
-
The recommended V(D)J reference packages for human and mouse have been updated from v4.0-5.0. The changes to the V(D)J reference sequences are listed below:
HUMAN:
- Replace IGKV2D-40, whose leader sequence appears to be truncated.
- Delete IGKV2-18, which is probably a pseudogene.
- Delete IGLV5-48, which is truncated on the right.
- Delete TRBV21-1, which has multiple frameshifts.
- Add IGHV4-30-4, which was missing.
- Add IGKV1-NL1, which was missing.
- Add IGHV4-38-2, which was missing.
MOUSE:
- Delete TRAV23, which is frame-shifted.
- Delete the first base of the constant region gene IGHG2B.
- Make a six-base insertion in IGKV12-89, based on empirical data.
- Correct IGHV8-9, whose amino acid sequence showed the canonical C at the end of FWR3 as S. This is consistent with 10x data.
- Add an allele of IGKV2-109, which was missing.
- Add IGKV4-56, which was missing.
- Add IGHV1-2, which was missing.
-
cellranger aggr
now aggregates V(D)J data, allowing users to recompute V(D)J clonotype groupings across the combined data. -
Soft deprecation of
--force-cells
incellranger vdj
:-
Since Cell Ranger 3.1, due to filters in the VDJ assembler,
--force-cells
in VDJ pipelines did not behave as users would expect it to behave. Users can only apply--force-cells
to the number of barcodes passing the combined filters in the assembler. -
This makes it effectively impossible for users to increase the number of recovered cells. Rather, it is only possible to reduce the number of recovered cells using
--force-cells
in this context, unlike the behavior in thecellranger count
pipeline. -
Because this specific flag is likely to be misunderstood by users, and is also not highly requested, we are starting to deprecate it. In Cell Ranger 5.0,
--force-cells
will be available only as an undocumented silent option. This will also allow users who are using this routinely in their workflows to anticipate eventual deprecation.
-
Changes that apply to Gene Expression and Feature Barcode analysis
-
Targeted Gene Expression analysis is available in Cell Ranger 4.0 and is invoked by specifying the
--target-panel
option when running the cellranger count command. -
Cell Ranger 4.0 introduces the new
targeted-compare
pipeline for direct comparative analysis of matched parent Whole Transcriptome Amplification (WTA) and Targeted Gene Expression datasets. -
Cell Ranger 4.0 includes the new
targeted-depth
subcommand to estimate sequencing depths appropriate for Targeted Gene Expression experiments based on input WTA results and an associated target panel file. -
Recommended reference packages for human and mouse have been updated from version 3.0.0 to 2020-A:
-
Transcriptome annotations updated from Ensembl 93 to GENCODE v32 (human) and vM23 (mouse), which are equivalent to Ensembl 98.
-
GRCh38 and mm10 sequences are not changed; chromosome names now follow the GENCODE/UCSC convention (e.g.,
chr1
andchrM
) rather than the Ensembl convention (1
andMT
). -
Additional filtering removes genes with unreliable annotations that often overlap more legitimate genes (see build scripts for details), resulting in improved overall sensitivity. 2020-A reference packages are backwards compatible with Cell Ranger v3.1.0 and prior.
-
Mapping rates and gene/UMI sensitivity are increased due to more comprehensive annotations and improved manual curation of genes:
- When analyzing 3’ Gene Expression data, Cell Ranger 4.0 trims the template switch oligo (TSO) sequence from the 5’ end of Read-2 and the poly-A sequence from the 3’ end before aligning reads to the reference transcriptome. This behavior is different from Cell Ranger 3.1, which does not perform any trimming.
A full length cDNA molecule is normally flanked by the 30-bp TSO sequence, AAGCAGTGGTATCAACGCAGAGTACATGGG
, at the 5' end and the poly-A sequence at the 3' end. Some fraction of sequencing reads are expected to contain either or both of these sequences, depending on the fragment size distribution of the library. Reads derived from short RNA molecules are more likely to contain either or both TSO and poly-A sequence than longer RNA molecules.
Trimming results in better alignment, with the fraction of reads mapped to a gene increasing by up to 1.5%, because the presence of non-template sequence in the form of either TSO or poly-A confounds read mapping. Trimming improves the sensitivity of the assay as well as the computational efficiency of the pipeline. Tags ts:i
and pa:i
in the output BAM files indicate the number of TSO nucleotides trimmed from the 5' end of Read-2 and the number of poly-A nucleotides trimmed from the 3' end. The trimmed bases are present in the sequence of the BAM record and are soft clipped in the CIGAR string.
Below, we illustrate how the fraction of reads mapped confidently to the transcriptome varies for both trimmed and untrimmed alignment as a function of read-length for a variety of sample types .
-
Cell Ranger 4.0 adds support for an “un-tethered” Feature Barcode pattern, (BC) without an anchor, specified in the Feature Reference CSV. This option allows the user to specify the sequence of the Feature Barcode without specifying a particular location on the read where the sequence is expected to be found.
-
cellranger reanalyze
now outputs the count matrix used in the analysis, so as to reflect any subsetting of barcodes used. -
Bug fixes for GTF files output by
mkref
. These changes do not affect the pipeline results.- GTF attributes with duplicate keys (e.g., tag
"value1"
;tag "value2"
;) are handled correctly. Previously, only the last such attribute was kept. - GTF attributes with unquoted integer values (e.g.,
exon_number 1
;) are kept. Previously, they were removed. - GTF lines end with semicolons.
- Unix line endings are used rather than DOS line endings, consistent with other Cell Ranger outputs.
- GTF attributes with duplicate keys (e.g., tag
-
Bug fixes for the BAM file
- The duplicate flag (
0x400
) is set correctly in the secondary alignments (flag0x100
) of PCR duplicate reads and low-support UMI reads (xf:i:2
) - Low-support UMI reads (
xf:i:2
) have the corrected barcode in UB:Z. Previously, it contained the raw barcode.
- The duplicate flag (
-
BAM file changes
- Cell Ranger v4.0 will not output the
li:i
tag. TheRG:Z
tag contains this information. - Cell Ranger v4.0 will not output the
BC:Z
andQT:Z
tags.
- Cell Ranger v4.0 will not output the
-
Cell Ranger v4.0 now relies on Orbit to perform transcriptome alignment, which leverages a modified STAR v2.7.2a. These modifications provide compatibility with “versionGenome 20201” references, such as those generated by STAR v2.5.1b. In Cell Ranger 4.0 we still provide and use STAR v2.5.1b for other purposes such as
cellranger mkref
. In our testing we did not note any differences in transcriptome alignments between the STAR shipped in Cell Ranger 3.1 (STAR v2.5.1b), STAR v2.7.2a, or Orbit. -
mkfastq
now accepts file names without lane number, e.g.,sample1_S1_R1_001.fastq.gz
. -
Cell Ranger's
aggr
pipeline no longer supports the aggregation of v1mol_info.h5
files.
Changes that apply to Gene Expression, Feature Barcode, and V(D)J analysis
-
mkfastq
supports dual-indexed libraries for gene expression, both WTA and Targeted, V(D)J, and Feature Barcode datasets. -
mkfastq
supports a new sequencing configuration for Novaseq where the I2 index may need to be reverse-complemented before demultiplexing dual-indexed libraries. -
mkfastq
now accepts file names without lane number, e.g., sample1_S1_R1_001.fastq.gz. -
count
andvdj
run approximately two to four times faster than in Cell Ranger v3.1, depending on the sequencing data, and reduces disk I/O by half. -
A new command-line interface with improved error-handling has been engineered into Cell Ranger v4.0.
-
The Martian pipeline framework has been upgraded to v4.0.
mrp
andmrjob
will shut down if they detect that their log files were deleted or renamed. See the Martian release notes for more details. -
The following features present in Cell Ranger v3.1 are no longer present in Cell Ranger 4.0:
mkfastq
no longer supports data from the Single Cell 3′ v1 chemistry.- The
cellranger demux
subcommand has been removed. - The command-line interface does not accept FASTQs created by the deprecated cellranger demux pipeline. If you need to process FASTQs in this layout, contact [email protected] for assistance.
cellranger count
andcellranger vdj
are no longer able to process data from multiple gem-wells through manual editing of MRO files. The Single Cell 3′ v1 and Single Cell 5′-R1 assay configurations will no longer be autodetected in Cell Ranger 4.0. Users who want to analyze data from those chemistries must explicitly specify the chemistry (SC3Pv1
orSC5P-R1
respectively) using the--chemistry
argument.- The
--id
argument used by the pipelines has a 64 character limit in Cell Ranger 4.0.
-
The
--id
argument used by the pipelines has a 64 character limit in Cell Ranger 4.0.
Changes that apply to V(D)J analysis
-
Recommended VDJ reference packages for human and mouse have been updated from version 3.1.0 to 4.0.0. The changes to the VDJ reference sequences are listed below:
- Remove the first base of the C region in certain cases. In these cases we observe that in most transcripts, the J region and C region overlap by exactly one base.
- Add an allele of the gene IGHJ6 to the human VDJ reference.
-
Bug fix in contig annotation: If a reference D region matches a contig perfectly, annotate the contig with that D region.
-
The command line argument
--chain
is added back in 4.0 for rare cases when the automatic chain detection fails. -
A new output
airr_rearrangement.tsv
is added, which contains annotated contigs of VDJ rearrangements in the AIRR TSV format. -
The VDJ reference is copied to the outputs folder starting with Cell Ranger v4.0.
-
Feature Barcoding Only Analysis - It is now possible to run
cellranger count
using Cell Surface Protein (antibody captured) libraries without a GEX library. The previous version of Cell Ranger required a Gene Expression library along with a library generated by Feature Barcoding technology. However, the new version of Cell Ranger provides customers with flexibility to sequence either one of the libraries, or both. In particular, cell calling now works with antibody counts only, and all secondary analyses (PCA, t-SNE, UMAP, clusterings) work with antibody-only count matrix as well. More details are available on the Feature Barcoding Only Analysis page. -
UMAP based lower dimensionality projections of datasets analyzed by
cellranger count
are now produced in addition to the previously produced t-SNE projections. The projections are made available both as CSV files and as data that can be directly viewed in Loupe Browser. The parameters for the projection can also be modified and experimented with using cellranger reanalyze. This alternate visualization method has become increasingly popular for visualizing single cell data since the earliest report that used it. For more details, see the description in the algorithms overview section. -
New Web Summary Look - The Cell Ranger
web_summary.html
file has been updated to match the styles and formats of other 10x products. Compared to the old version users will notice new fonts and some aesthetic changes in the new version. -
Bug Fix: If equal numbers of reads with given Barcode / UMI combination map to two genes, the assignment of the Barcode / UMI are now considered ambiguous and not reported in moleculeinfo.h5 or the count matrix. Previously they were reported _twice, once for each gene.
-
Other minor bug fixes
Release Notes for Martian 3.2.3: Job Scheduling
-
Fix a crash in cases where the
mrp
binary becomes unavailable on disk during a pipestance run. -
In addition to logging the type of filesystem for the pipestance directory, mrp will also log the type of filesystem for the martian bin directory (which is often different from the pipestance directory), and also the mount options for both directories.
-
Regardless of
--jobinterval
setting,mrp
will now never attempt to submit more than one job at a time to the queue in cluster mode. -
mrp
will now shut down if the pipestance log file has been deleted, even if a new one has been created in its place. This prevents problems in the case where the pipestance directory (including the log and lock files) have been deleted. -
Memory cgroups limits are now detected, reported, and used as default limits where applicable. This should be especially helpful for users submitting
mrp
to a cluster such as SLURM which uses memory cgroups to prevent jobs from using too much memory, by preventingmrp
from trying to use more than the job's allowance. -
Other small bug fixes and performance improvements.
V(D)J Release Notes
Major algorithm changes and effects on performance
-
The assembly, annotation and cell calling algorithms have all been replaced, as have the reference sequences. However with noted exceptions, the interface is unchanged.
-
Many changes were made to the assembly algorithm that allow it to achieve the same sensitivity using less data. After these changes, the recommended sequencing configuration was changed to 26 x 91 (from 150 x 150), while leaving the number of read pairs per cell fixed at 5000. This enables V(D)J, Gene Expression and Feature Barcoding libraries to be sequenced in a single run, thereby simplifying the workflow.
-
The effect of the new changes varies considerably from sample to sample and we have added a discussion on Experimental Design that explains some of this. In some instances the number of productive pairs increases markedly if the same dataset is rerun with the new code.
-
The old read configuration 150 x 150 is still supported and may be preferable for some users, because of pricing or availability, particularly for users who are running only V(D)J data. For 150 x 150, the recommended depth is proportionally lower, 2000 read pairs per cell.
-
Many corrections were made to the Prebuilt reference sequences.
-
Contig annotation has been improved in several ways. This includes more accurate detection of CDR3 regions, a more stringent full-length requirement, and a requirement that V segments begin with a start codon (coupled to reference sequence corrections). This could affect annotation for species other than human or mouse, having incomplete reference sequences.
-
A productive pair is no longer declared in cases where there are three or more contigs having the same chain type (e.g. TRB, TRB, TRB). In such cases the GEM may contain two or more cells.
-
Some new large clones are now reported, that were missed previously for a variety of reasons, including failure to align J segments having high somatic hypermutation.
-
A productive pair is no longer declared in cases where three or more contigs share the same chain type (e.g. TRB, TRB, TRB). In such cases the GEM may contain two or more cells. In addition, certain clonal expansions of plasma cells are now contracted because the expansion represents mRNA leakage during processing, rather than a true biological expansion. Finally, requirements for small clones sharing a chain with a large clone have been tightened to reduce the likelihood of false clones arising from ambient mRNA or doublets. All of these changes correctly reduce the number of reported productive pairs (usually by a small fraction).
-
Because of these changes, we recommend that customers rerun existing datasets using Cell Ranger 3.1 if possible.
-
Because cell calling is changed, the denominator used for computing the Cells With Productive V-J Spanning Pair metric may have changed. For this reason, differences in performance between Cell Ranger 3.0 and 3.1 are better assessed using the Number of Cells With Productive V-J Spanning Pair metric.
-
Cell Ranger 3.1 is significantly faster. There are five fewer stages in the pipeline.
Interface Changes:
-
Cell Count Confidence is no longer reported because we found that in some cases incorrect counts were reported with high confidence. Cell counting from V(D)J data alone is limited in accuracy because targeted cells having sufficiently low expression cannot be detected.
-
Contigs Unannotated is no longer reported because all contigs are now annotated. The justification for this is that since enrichment uses primers binding to constant regions, bona fide contigs would be expected to have at least a C annotation.
-
For species other than human or mouse, for which custom primers are needed, the sequences of the inner enrichment primers must now be supplied as a command-line argument.
Job Scheduling Changes
- Add support for SGE and LSF clusters that track virtual memory use.
Enable Analysis of CITE-seq Experiments
-
Cell Ranger can now process data from experiments where the antibodies were conjugated to oligonucleotides that were captured by oligo-dT primers. Previously, only experiments which used the Chromium Single Cell 3' Feature Barcode Library Kit, which utilizes a different capture sequence for Gene Expression and Feature Barcoding data, could be analyzed.
-
Please note that while Cell Ranger is now compatible with CITE-seq data, CITE-seq is not a supported application. To ensure full support for your 10x data analysis please visit the Feature Barcode Analysis page to see the supported Feature Barcoding technology.
Bug fixes
-
Fix an issue where STAR would crash on CPUs without AVX support.
-
Fix a determinism issue when aggregating 3' v2 and v3 data.
-
Increase the memory reservation for the SORT_BY_POS stage.
General
- Cell Ranger has been overhauled to support user-defined Feature Barcoding reagents, and to quantify these features alongside standard gene-expression reads. See Feature Barcoding for details. For users who have already run their data through earlier versions, there is no need to rerun it again using this new version.
Cell Calling Changes
-
Cell Ranger 3.0 implements a version of the EmptyDrops cell calling algorithm that will call more low RNA content cells, especially when they are mixed with a population of high RNA content cells. See Cell Calling Algorithms for details.
-
The cell calling 'knee-plot' in the web summary now indicates what fraction of barcodes in each segment of the curve were called as cells, since the new cell calling algorithm no longer makes a hard threshold on UMI counts.
Output File Format Changes
-
The file formats of the gene-barcode matrix (now called the feature-barcode matrix) have changed to accommodate Feature Barcoding results.
-
The mtx and barcodes.tsv files are now gzipped to save disk space The genes.tsv file has been renamed features.tsv.gz, and contains extra columns indicating the
feature_type
of each gene / feature. -
See Feature-Barcode Matrices for details.
-
As part of this change,
cellranger-rkit
is deprecated. We recommend Seurat for analysis in R. -
The Molecule info file format has been substantially changed to enable output from the new Feature Barcoding technology and remove rarely used mapping metrics.
Cell Ranger 2.2.0 will require CentOS/RedHat 6 or Ubuntu 12 or later. See the 10x OS Support page for further information.
-
Fix Martian UI display in FireFox
-
Fix non-integral resource requests (memory/threads)
-
Fix SUBSAMPLE_READS producing wrong metric names. Newer version of Martian no longer casts zero-fractional floats to ints, which this code was relying on to produce metric names with integral subsampling rates in them.
-
Fix failure to detect whitelist with demux when a single Sample Index is bad
-
Fix always-on multi-chromosome transcript warning in
mkref
-
Fix stall in ALIGN_READS on filesystems that don't support named pipes
-
Fix python error when autodetect of chemistry fails with multiple FASTQ paths
-
Fix handling of sample names with multiple underscores in
mkfastq
pipeline -
Fix suppression of process limit errors in the
mkfastq
QC stage
Changes to mkfastq
-
Barcode-aware QC stage is now opt-in via the
--qc
flag. -
Limit total CPU usage across stages to 12 cores unless
--localcores
is specified. This should improve reliability on machines with high numbers of cores.
Cell Ranger 2.1.1 Gene Expression
Note: This is expected to be the last version of Cell Ranger to support CentOS/RedHat 5 and Ubuntu 10. If you are using one of those operating systems, Cell Ranger will now warn you. Future versions of Cell Ranger will require CentOS/RedHat 6 or Ubuntu 12 or later. See the 10x OS Support page for further information.
Bug Fixes
-
Fix library ID labels being out of order in the matrix HDF5 file produced by
cellranger aggr
when 10 or more libraries are aggregated. This manifests as Loupe Cell Browser showing the library ID labels out of order after runningcellranger reanalyze
. -
Fix an
out-of-memory
error occurring when generating the kmer index on a reference with very long transcripts, e.g. on a pre-mRNA reference used when analyzing nuclei samples. -
Fix crash when analyzing FASTQs produced by SRA's
fastq-dump
. -
Fix the Differential Expression table in the web summary disappearing when gene IDs are equal to gene names in the reference GTF.
-
Fix a few web summary metrics becoming negative when more than 2.1 billion reads are analyzed at once.
-
Fix incorrect parsing of the
--localcores
argument, causing--localmem
to be ignored when specified immediately after--localcores
. -
Fix crash in
mkfastq
on NovaSeq when RunParameters.xml is namedrunParameters.xml
. -
Fix hang when running
sitecheck
on some systems. -
Fix error reporting in python stage code imports.
-
Fix estimation of stage virtual memory usage.
Improvements
-
Truncate large metadata files when generating a tarball for upload to 10x, rather than omitting them. Remove the requirement that the reference FASTA file modification time precede the STAR index file modification times.
-
The default
--localmem
in cluster mode will no longer ever be more than the free memory available when thecellranger
starts.
New Features
-
Add support for and autodetection of Single Cell 5' gene expression libraries, with support for both paired-end alignment (150x150) and R2-only alignment (26x98).
-
Add
--r1-length
and--r2-length
options tocellranger count
which enable hard trimming of input FASTQs. -
Add
--exclude-genes
option to cellranger reanalyze which, analogously to--genes
, allows for the exclusion of some genes from the secondary analysis (PCA, clustering, etc.). -
Add
--chemistry
tocellranger count
to override the automatic chemistry detection.
Performance Improvements
-
Reduce the run time by 30%.
-
Reduce the disk storage high-water-mark by 60%.
Algorithm Improvements
- Change the Antisense Reads Metric to only count a read as antisense if it has no sense alignments, effectively prioritizing sense alignments over antisense for this computation.
Output File Changes
-
Stop generating the TR and TQ BAM tags because these tags were retaining trimmed sequences that Cell Ranger would ignore anyway after converting the BAM back to FASTQ.
-
Add more mapping metrics (Reads Mapped to Genome, Reads Mapped Confidently to Genome), and reorder the mapping metrics to be consistent with their order of computation.
Bug Fixes
-
Fix mkfastq allowing max
bcl2fastq
threads to exceed--localcores
. -
Fix mkfastq crashing when reading NovaSeq quality data from RTA 3.3 and later.
-
Fix excessive memory requests in
SC_RNA_ANALYZER
. -
Fix nondetection of louvain binary failure in
RUN_GRAPH_CLUSTERING
. -
Fix crash in
RUN_GRAPH_CLUSTERING
when/dev/stdin
doesn't exist. -
Fix the barcode rank plot concatenating instead of unioning barcodes in multi-genome datasets.
System Requirements Changes
- Cell Ranger no longer supports Ubuntu 8 or CentOS 5.2 Linux distributions. Ubuntu 10.04 LTS or CentOS 5.5 or greater are now required.
Job Scheduling
-
The pipeline management system, mrp, is now open source on GitHub.
-
The monitoring port for the user interface is now always on by default, with an OS-selected port if none is specified.
- This behavior can be disabled with
--disable-ui
. - Access to the user interface port, if no port was specified explicitly, now requires a randomly-generated authentication token. This token is visible in the pipeline standard output and in the
_uiport
file.
- This behavior can be disabled with
-
A new tool,
mrstat
is now available.- Given the path to the directory with a running pipeline, mrstat will return basic information about the progress of the pipeline.
- With the
--stop
flag, it will cause the pipeline to fail and exit.
-
Two new variables are available for use in cluster-mode templates:
__MRO_JOB_WORKDIR__
can be used to specify the absolute path to the directory where the job should execute. This should alleviate issues on clusters such as PBS which sometimes do not set the working directory correctly.__MRO_ACCOUNT__
passes theMRO_ACCOUNT
environment variable frommrp
's environment. This is intended for cluster managers which support charging resources to specific accounts.
-
The pipeline standard output and log will now periodically provide progress updates for in-progress stages.
-
mrp
will now provide more clear and useful error reporting when the pipeline directory runs out of disk space. -
Several enhancements to the reliability of pipeline restart.
-
Fixes for several cases where a pipeline could "hang" indefinitely without making further progress.
-
Pipelines should now do a better job of staying within their CPU usage allocation.
Bug fixes
- Properly ignore SIGHUP when a pipeline is run using nohup.
Pipeline Argument Changes
-
Add
--override
option to all pipelines, allowing for stage-level overrides for cores and memory. -
Reanalyze no longer requires
--agg
to persist library ID; it is only required for persisting user-defined fields.
Bug fixes
-
Fix CHUNK_READS using more cores and using them less efficiently than intended.
-
Fix
aggr
using incorrect downsampling rates when more than 10 libraries are aggregated. -
Fix mkfastq proceeding even after
bcl2fastq
is killed. -
Fix lack of robustness to rare events where NFS latency induces double file deletion or double directory creation events.
-
Fix ALIGN_READS proceeding after the STAR subprocess fails, causing crashes in
ATTACH_BCS_AND_UMIS
. -
Improve error messages when STAR or samtools fail in ALIGN_READS.
-
Fix spaces in transcript IDs causing
ATTACH_BCS_AND_UMIS
to crash. mkref no longer allows spaces in transcript IDs. -
Fix crash when reads are adapter-trimmed by
bcl2fastq
and some reads end up empty. -
Fix out-of-memory condition in
ATTACH_BCS_AND_UMIS
for some libraries with >800M reads. -
Fix question marks replacing axis titles of barcode rank plot in web summary.
-
Fix excessive memory consumption and runtime of
mkfastq
on large sample sheets.
Job Scheduling
-
Fix several cases where, after
mrp
(which is invoked bycellranger
) gets killed, it was not able to restart correctly. -
On SGE clusters,
cellranger
/mrp
now periodically runs qstat to verify that the jobs it queued have not been killed or canceled. -
If the run fails, instead of just displaying a message pointing the user to the relevant
_errors
file, the contents of that file is printed.
-On automatic retry of failed stages, the reason for the original failure is logged.
mrp
is now more resilient against certain kinds of filesystem errors.
-
In the event of certain types of filesystem problems (such as permissions errors or disk quota),
mrp/
cellranger should now sometimes be able to provide more useful and immediate error messages. -
Additional information about the environment cellranger runs in is now logged and included in
mri.tgz
. -
Additional information about the environment the analysis runs in is now logged and included in
mri.tgz
. -
mrp
now correctly handles the signals sent by SGE and LSF when a soft time limit is reached (e.g. for SGE,-l s_rt 23:00:00
). -
Now supports
--overrides
method to dynamically change additional CPU and memory per stage.
Pipeline argument changes
-
Add
--barcodes
and--genes
options to reanalyze, which allow selection of a specific subset of barcodes and/or genes to use in the secondary analysis. -
Add
--force-cells
option tocount
andreanalyze
to explicitly set the cell count. If specified, Cell Ranger will take the top N barcodes (by UMI count) as cells instead of doing dynamic cell count estimation. -
Rename the estimated cells option from
--cells
to--expect-cells
for clarity. -
Add
--nosecondary
flag to count, which skips the secondary analysis. Disallow slashes in the--genome
argument inmkref
.
Add --id
option to mkfastq which allows you to name the output directory.
New subcommands
- Add
cellranger mat2csv
command, which converts a Cell Ranger sparse gene-barcode matrix to a dense CSV format. Note that the resulting file will be very large, even for a few hundred cells.
Web summary changes
-
Add "Reads Mapped Antisense to Gene" metric, which quantifies reads that are mapped to the non-coding strand of a gene. High values can indicate the use of an unsupported chemistry type, e.g. passing a Single Cell V(D)J library to
cellranger count
. -
Add "Fraction GEMs with >1 Cell (Lower / Upper Bound)" metrics, which define a confidence interval for the multiplet rate estimate in multi-genome samples.
-
Add more details to various metric descriptions.
Algorithm improvements
-
Add the requirement that reads overlap annotated exons by at least 50% in order to be considered exonic. As a result, "Reads Mapped Confidently to Exonic Regions" may differ slightly from previous versions.
-
Reduce
EXTRACT_READS
per-read runtime by 50% by avoiding OrderedDict and caching metric calculations. -
Reduce
SUBSAMPLE_READS
runtime by reducing the number of fixed target values for subsampling (to just 25k and 50k reads per cell).
File format improvements
-
Due to a format change (removal of the IntervalTree object), references produced with
cellranger mkref
using Cell Ranger v2.0 are not compatible with pipelines from Cell Ranger v1.x. -
Modify the
TX
,GX
, andGN
tags to have more granular transcript/gene annotations. Each BAM record is only annotated with transcripts/genes specific to that alignment, instead of combining annotations from all alignments of the corresponding read. -
Add
RE
tag, which indicates whether an alignment is exonic, intronic or intergenic.
Bug fixes
-
Fix rare bug in interval arithmetic, leading to exonic reads being falsely annotated as intronic or intergenic. As a result of this bugfix, "Reads Mapped Confidently to Exonic Regions" may differ slightly from previous versions.
-
Fix excessive
EXTRACT_READS
runtime (10+ hours) on very large FASTQs such as those produced by mkfastq. -
Fix a crash in
RUN_GRAPH_CLUSTERING
on filesystems that do not support named pipes. -
Fix
SUBSAMPLE_READS
using more VMEM than expected, causing it to be killed by SGE when exceeding the h_vmem limit on certain clusters. -
Fix
mkfastq
not merging output files properly due to sample numbering issues. -
Fix
mkfastq
crash due to-d
(demultiplexing-threads) argument being deprecated inbcl2fastq
2.19. -
Fix the components.csv file produced by PCA, which did not contain the correct matrix.
-
Fix a crash in RUN_PCA when the number of nonzero genes is smaller than the number of principal components.
-
Fix a crash in mkref with very large genomes; use the
limitGenomeGenerateRAM
option in STAR to overcome its default reference size limit. -
Fix certain special characters (like dashes) in reference names breaking the subsampled genes detected plot.
-
Fix
mkloupe
displaying an unhelpful error message when run on mixed-species runs and those from Cell Ranger v1.1 or earlier. -
Fix the
open-file-handle-limit
check using the submit host rather than the execution machine. -
Fix
cellranger aggr
allowing duplicatelibrary_ids
. -
Fix
CLOUPE_PREPROCESS
taking the full matrix even afterreanalyze
subselects barcodes. -
Fix a crash in mkfastq on
RunInfo.xml
files produced by the NovaSeq. -
Fix a crash in mkfastq when
bcl2fastq
2.19 is used in cluster mode or with the--demultiplexing-threads
argument. -
Fix
mkfastq
sometimes not properly merging samples inbcl2fastq
2.18 and 2.19 due to a change in the order in which lanes are processed bybcl2fastq
.
Martian Runtime Changes
- Add caching for deserialized JSON metadata. This improves performance for stages with many chunks.
Miscellaneous
-
Update samtools from 0.1.19 to 1.4.
-
Rename
RUN_PREPROCESS
toPREPROCESS_MATRIX
in theSC_RNA_ANALYZER
pipeline. -
Add
alerts.json
as an output of theSUMMARIZE_REPORTS
stage. This file is a machine-readable list of any abnormal metric values that raised alarms in the web summary. -
For multi-genome samples, display the full reference name rather than a comma delimited list of genomes in the web summary ("hg19, mm10" becomes "hg19_and_mm10").
- Fixes issue preventing mkfastq from demultiplexing data from recent sequencer software versions.
Analysis Improvements
-
Confidently align more reads to the transcriptome, greatly improving alignment rates with shorter reads. - Reads Confidently Mapped to Transcriptome increases from 55% to 62% with 98bp reads and from 34% to 54% with 32bp reads (Human PBMCs vs GRCh38).
-
Add a graph-based clustering algorithm: Louvain Modularity Optimization, which, unlike K-Means, does not require pre-specifying K.
Visualization
-
Automatically produce Loupe Cell Browser (.cloupe) files in the
count
,aggregate
, andreanalyze
pipelines. -
Output a web summary HTML file in the
reanalyze
pipeline. -
Be explicit about pre- and post- depth normalization metric values in the
aggr
web summary. -
When the web summary subselects 10e3 cells for display, show the original cluster sizes and not the subselected sizes.
-
Make the web summary HTML slightly smaller by rounding t-SNE coordinates.
-
Update plotly to enable scrollable legends.
File format improvements
- Add Read Group (RG) headers and tags to the output BAM file for better data provenance.
Bug fixes
-
Preserve trimmed bases via the TR/TQ BAM tags for much longer read lengths without crashing.
-
Fix crash when copying files on certain types of network shares that do not support file permissions.
-
Omit no-call bases from Q30 metrics to be consistent with Illumina's Q30 calculation.
-
Allow generation of 3-d (alongside 2-d) t-SNE projections without crashing.
-
Do a better job of hiding dynamic elements while the web summary HTML is loading.
General
-
Make the
--params
argument to reanalyze optional to enable re-runs with the default parameters. -
Check for mismatches between the library IDs given in the
aggr
CSV and those in the matrix file. -
Limit
max_clusters
for K-Means to 50 to ensure sane memory consumption.
-
Fix incorrect results being produced when
aggr
processes acount
output that contains multiple libraries (gem groups). -
Exclude untested genes from p-value adjustment.
-
Don't crash when extra commas are present in an IEM samplesheet for
mkfastq
. -
Don't crash when no project folders are present for
mkfastq
. -
Correctly handle the second index when
mkfastq
receives a dual-indexed IEM samplesheet. -
Allow matrices to have more than 2^31-1 nonzero entries in the matrix HDF5 format.
-
Don't display alerts until the web summary page fully loads.
General
-
Rename main pipeline to cellranger count, which produces a gene-barcode matrix for one library sequenced one or more times.
-
Add support for and autodetection of Chromium Single Cell 3' v2 chemistry; still compatible with v1 chemistry.
-
Fix incorrect default cell count being used when "expected recovered cells" not specified.
New aggr aggregation pipeline
-
New pipeline
cellranger aggr
which aggregates data from multiple libraries into one dataset. -
Supports combining libraries totalling up to 1,000,000 cells and secondary analysis of the combined data.
-
Automatically performs sequencing depth-normalization for all combined libraries.
New reanalyze custom reanalysis pipeline
- Reruns secondary analysis (dimensionality reduction, clustering, and differential expression) with fully customizable parameters.
New mkfastq demultiplexing pipeline
-
Easier to integrate with existing
bcl2fastq
-based workflows. -
Now the preferred demultiplexing method; demux still available but deprecated.
-
mkfastq
is a thin wrapper aroundbcl2fastq
with same basic interface. -
Accepts Illumina Experiment Manager-compatible sample sheets with support for 10x sample index sets.
-
Produces FASTQ files and folders in the same structure as
bcl2fastq
. -
Generates InterOp output for SAV.
-
Also generates 10x-specific run QC metrics in JSON format.
Scalability enhancements
-
Support combined secondary analysis (dimensionality reduction, clustering, differential expression, and visualization) of up to 1,000,000 cells in under 12 hours with 64 GB of RAM.
-
Change PCA implementation to the Netflix-scale memory-efficient method IRLBA.
-
Decrease runtime of t-SNE implementation.
Analysis Improvements
-
Change differential expression algorithm to the negative-binomial based method sSeq.
-
Report log2 fold-change and p-value for all genes in all clusters.
Sample and genome support
- Add pre-built GRCh38 reference package
Web summary enhancements
-
Add plots that show Sequencing Saturation and Median Genes Detected as a function of downsampled reads per cell.
-
Add Total Genes Detected.
-
Rename "cDNA PCR Duplication" to "Sequencing Saturation."
-
Add chemistry field.
-
Order clusters by size.
-
Add help bubbles to charts.
File format improvements
-
Generate BAM index files with the same basename as the main file.
-
Change cell-barcode and UMI quality tags to CY and UY for better compatibility with the SAM specification.
-
Add TR, TQ tags to BAM to enable lossless BAM to FASTQ conversion.
-
Output HDF5-based sparse matrices in addition to the Matrix Exchange format files for better scalability to high cell counts.
-
Report proportion of variance explained for each principal component.
Martian runtime
-
Pipestance output files (outs) are no longer symlinks.
-
Partial stage restart.
-
Add output filename override, supports two output files having same basename.
-
Add
--onfinish
handler support. -
Add support for units of KB and B for memory reservation in cluster job templates.
-
Pipestances now generate a UUID in _uuid.
-
Add auto-retry mechanism when pipeline stages fail due to causes that appear to be transient.
-
--maxjob
s now defaults to 64 in local jobmode. -
--jobinterval
now defaults to 100ms in local jobmode. -
Fix for rare race condition in some Python components
-
Enabled STAR multithreading
-
Added more detailed reference metadata
-
Fixed chromosome name mismatches in 10x reference data
-
Fixed t-SNE algorithm not converging for samples with high cell counts
-
Fixed cell-barcode correction not correcting as many sequences as it should
-
Fixed out-of-memory crash in COUNT_GENES for high-depth samples
-
Fixed occasional loss of the last few reads per chunk in ATTACH_BCS_AND_UMI
-
Added "Reads Mapped Confidently to Exonic Regions" metric to the summary.
-
Changed alert for "Reads Mapped Confidently to Transcriptome" to reflect shorter read lengths and non-human references.
-
Fixed problem where differential expression table sorts incorrectly on click.
-
Fixed problem where very high depth samples would cause an out-of-memory error.
-
Fixed problem where mkgtf would produce incorrectly formatted GTF files.
-
Fixed problem where debug tar.gz file would be very large if the pipestance halted mid-stage.
-
Fixed problem with copying files on certain CIFS volumes.
- Initial release.