-
Feature Barcoding Only Analysis - It is now possible to run cellranger count using Cell Surface Protein (antibody captured) libraries without a Gene Expression (GEX) library. Previous versions of Cell Ranger required a Gene Expression library alongside any library generated with Feature Barcoding technology; the new version lets customers sequence either library, or both. In particular, cell calling now works with antibody counts only, and all secondary analyses (PCA, t-SNE, UMAP, clustering) work with an antibody-only count matrix as well. More details are available on the [Feature Barcoding analysis](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/3.1/using/feature-bc-analysis) page.
-
UMAP-based low-dimensional projections of datasets analyzed by cellranger count are now produced in addition to the existing t-SNE projections. The projections are available both as CSV files and as data that can be viewed directly in Loupe Browser. The projection parameters can also be modified and experimented with using cellranger reanalyze. UMAP has become increasingly popular for visualizing single cell data since it was first reported. For more details, see the description in the algorithms overview section.
-
New Web Summary Look - The Cell Ranger web_summary.html file has been updated to match the styles and formats of other 10x products. Compared to the old version, users will notice new fonts and some aesthetic changes.
-
Bug Fix: If equal numbers of reads with a given Barcode / UMI combination map to two genes, the assignment of the Barcode / UMI is now considered ambiguous and is not reported in molecule_info.h5 or the count matrix. Previously such combinations were reported twice, once for each gene.
-
Other minor bug fixes
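
The tie rule from the Barcode / UMI bug fix above can be illustrated with a small sketch. It assumes, for illustration only, that each read of a Barcode / UMI combination already carries a gene name and that the gene with the most reads otherwise wins. The function name and input format are hypothetical, and this is not Cell Ranger's actual code.

```python
from collections import Counter


def assign_gene(read_genes):
    """Pick the gene for one Barcode/UMI, or None when the top two genes tie.

    read_genes: list of gene names, one per read of this Barcode/UMI
    (hypothetical input format).
    """
    ranked = Counter(read_genes).most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        # Equal read support for two genes: ambiguous, so report nothing.
        return None
    return ranked[0][0]


print(assign_gene(["GENE_A", "GENE_A", "GENE_B"]))  # clear majority
print(assign_gene(["GENE_A", "GENE_B"]))            # tie, treated as ambiguous
```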
Release Notes for Martian 3.2.3: Job Scheduling
-
Fix a crash in cases where the mrp binary becomes unavailable on disk during a pipestance run.
-
In addition to logging the type of filesystem for the pipestance directory, mrp will also log the type of filesystem for the martian bin directory (which is often different from the pipestance directory), and also the mount options for both directories.
-
Regardless of the --jobinterval setting, mrp will now never attempt to submit more than one job at a time to the queue in cluster mode.
-
mrp will now shut down if the pipestance log file has been deleted, even if a new one has been created in its place. This prevents problems in the case where the pipestance directory (including the log and lock files) has been deleted.
-
Memory cgroup limits are now detected, reported, and used as default limits where applicable. This should be especially helpful for users submitting mrp to a cluster such as SLURM, which uses memory cgroups to prevent jobs from using too much memory, by preventing mrp from trying to use more than the job's allowance.
-
Other small bug fixes and performance improvements.
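
As a rough illustration of the memory cgroup detection described above, the sketch below reads the standard Linux limit files for cgroup v2 and v1 and caps a default memory value with whatever it finds. This is an assumption about how such detection could work, not mrp's implementation; the function names are hypothetical, and a fuller version would first resolve the process's own cgroup from /proc/self/cgroup.

```python
from pathlib import Path

# Standard Linux locations of the memory limit for cgroup v2 and v1.
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),
)


def parse_cgroup_limit(text):
    """Return the limit in bytes, or None for "max" (unlimited) or bad input."""
    value = text.strip()
    if not value.isdigit():
        return None
    return int(value)


def default_memory_gb(system_gb):
    """Cap the default memory with the first cgroup limit found, if any."""
    for path in CGROUP_LIMIT_FILES:
        try:
            limit = parse_cgroup_limit(path.read_text())
        except OSError:
            continue  # file absent or unreadable on this system
        if limit is not None:
            # cgroup v1 reports a huge number when unlimited; min() handles it.
            return min(system_gb, limit / 1024**3)
    return system_gb


print(default_memory_gb(64.0))
```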
V(D)J Release Notes
Major algorithm changes and effects on performance
-
The assembly, annotation and cell calling algorithms have all been replaced, as have the reference sequences. However, with noted exceptions, the interface is unchanged.
-
Many changes were made to the assembly algorithm that allow it to achieve the same sensitivity using less data. After these changes, the recommended sequencing configuration was changed to 26 x 91 (from 150 x 150), while leaving the number of read pairs per cell fixed at 5000. This enables V(D)J, Gene Expression and Feature Barcoding libraries to be sequenced in a single run, thereby simplifying the workflow.
-
The effect of the new changes varies considerably from sample to sample and we have added a discussion on Experimental Design that explains some of this. In some instances the number of productive pairs increases markedly if the same dataset is rerun with the new code.
-
The old read configuration 150 x 150 is still supported and may be preferable for some users, because of pricing or availability, particularly for users who are running only V(D)J data. For 150 x 150, the recommended depth is proportionally lower, 2000 read pairs per cell.
-
Many corrections were made to the Prebuilt reference sequences.
-
Contig annotation has been improved in several ways. This includes more accurate detection of CDR3 regions, a more stringent full-length requirement, and a requirement that V segments begin with a start codon (coupled to reference sequence corrections). This could affect annotation for species other than human or mouse that have incomplete reference sequences.
-
Some new large clones are now reported that were missed previously for a variety of reasons, including failure to align J segments with high somatic hypermutation.
-
A productive pair is no longer declared in cases where three or more contigs share the same chain type (e.g. TRB, TRB, TRB). In such cases the GEM may contain two or more cells. In addition, certain clonal expansions of plasma cells are now contracted because the expansion represents mRNA leakage during processing, rather than a true biological expansion. Finally, requirements for small clones sharing a chain with a large clone have been tightened to reduce the likelihood of false clones arising from ambient mRNA or doublets. All of these changes correctly reduce the number of reported productive pairs (usually by a small fraction).
-
Because of these changes, we recommend that customers rerun existing datasets using Cell Ranger 3.1 if possible.
-
Because cell calling is changed, the denominator used for computing the Cells With Productive V-J Spanning Pair metric may have changed. For this reason, differences in performance between Cell Ranger 3.0 and 3.1 are better assessed using the Number of Cells With Productive V-J Spanning Pair metric.
-
Cell Ranger 3.1 is significantly faster. There are five fewer stages in the pipeline.
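
The rule above that withholds a productive pair when three or more contigs share a chain type can be sketched as follows. The pairing table (TRA with TRB, IGH with IGK or IGL) and the function name are assumptions made for illustration; the real pipeline applies further criteria that are not described in these notes.

```python
from collections import Counter

# Assumed pairings, for illustration: one chain from each side forms a pair.
PAIRINGS = (
    ({"TRA"}, {"TRB"}),
    ({"IGH"}, {"IGK", "IGL"}),
)


def may_declare_productive_pair(chains):
    """chains: chain type of each productive contig in one barcode."""
    counts = Counter(chains)
    if any(n >= 3 for n in counts.values()):
        # e.g. TRB, TRB, TRB: the GEM may contain two or more cells.
        return False
    present = set(counts)
    return any(left & present and right & present for left, right in PAIRINGS)


print(may_declare_productive_pair(["TRA", "TRB"]))
print(may_declare_productive_pair(["TRA", "TRB", "TRB", "TRB"]))
```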
Interface Changes:
-
Cell Count Confidence is no longer reported because we found that in some cases incorrect counts were reported with high confidence. Cell counting from V(D)J data alone is limited in accuracy because targeted cells having sufficiently low expression cannot be detected.
-
Contigs Unannotated is no longer reported because all contigs are now annotated. The justification for this is that since enrichment uses primers binding to constant regions, bona fide contigs would be expected to have at least a C annotation.
-
For species other than human or mouse, for which custom primers are needed, the sequences of the inner enrichment primers must now be supplied as a command-line argument.
Job Scheduling Changes
- Add support for SGE and LSF clusters that track virtual memory use.
Enable Analysis of CITE-seq Experiments
-
Cell Ranger can now process data from experiments where the antibodies were conjugated to oligonucleotides that were captured by oligo-dT primers. Previously, only experiments which used the Chromium Single Cell 3' Feature Barcode Library Kit, which utilizes a different capture sequence for Gene Expression and Feature Barcoding data, could be analyzed.
-
Please note that while Cell Ranger is now compatible with CITE-seq data, CITE-seq is not a supported application. To ensure full support for your 10x data analysis please visit the Feature Barcode Analysis page to see the supported Feature Barcoding technology.
Bug fixes
-
Fix an issue where STAR would crash on CPUs without AVX support.
-
Fix a determinism issue when aggregating 3' v2 and v3 data.
-
Increase the memory reservation for the SORT_BY_POS stage.
General
- Cell Ranger has been overhauled to support user-defined Feature Barcoding reagents, and to quantify these features alongside standard gene-expression reads. See Feature Barcoding for details. Users who have already run their data through earlier versions do not need to rerun it with this new version.
Cell Calling Changes
-
Cell Ranger 3.0 implements a version of the EmptyDrops cell calling algorithm that will call more low RNA content cells, especially when they are mixed with a population of high RNA content cells. See Cell Calling Algorithms for details.
-
The cell calling 'knee-plot' in the web summary now indicates what fraction of barcodes in each segment of the curve were called as cells, since the new cell calling algorithm no longer makes a hard threshold on UMI counts.
Output File Format Changes
-
The file formats of the gene-barcode matrix (now called the feature-barcode matrix) have changed to accommodate Feature Barcoding results.
-
The mtx and barcodes.tsv files are now gzipped to save disk space.
-
The genes.tsv file has been renamed features.tsv.gz, and contains extra columns indicating the feature_type of each gene / feature.
-
See Feature-Barcode Matrices for details.
-
As part of this change, cellranger-rkit is deprecated. We recommend Seurat for analysis in R.
-
The Molecule info file format has been substantially changed to enable output from the new Feature Barcoding technology and remove rarely used mapping metrics.
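
As a hedged illustration of the renamed features.tsv.gz file described above, the sketch below writes and re-reads a tiny gzipped, tab-separated table. The three-column layout (ID, name, feature_type) and the example rows are assumptions for illustration; see Feature-Barcode Matrices for the authoritative format.

```python
import csv
import gzip
import io


def write_example_features():
    """Build a tiny gzip-compressed features.tsv in memory (illustrative rows)."""
    rows = [
        ("ENSG00000141510", "TP53", "Gene Expression"),
        ("CD3", "CD3_TotalSeqB", "Antibody Capture"),
    ]
    buffer = io.BytesIO()
    with gzip.open(buffer, "wt", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def read_features(gz_bytes):
    """Parse gzip-compressed features.tsv bytes into one dict per feature."""
    with gzip.open(io.BytesIO(gz_bytes), "rt", newline="") as handle:
        return [
            {"id": row[0], "name": row[1], "feature_type": row[2]}
            for row in csv.reader(handle, delimiter="\t")
        ]


for feature in read_features(write_example_features()):
    print(feature["name"], feature["feature_type"])
```

Reading a real file would use gzip.open on its path in the same way; the in-memory buffer only keeps the example self-contained.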
Cell Ranger 2.2.0 will require CentOS/RedHat 6 or Ubuntu 12 or later. See the 10x OS Support page for further information.
-
Fix Martian UI display in Firefox
-
Fix non-integral resource requests (memory/threads)
-
Fix SUBSAMPLE_READS producing wrong metric names. Newer versions of Martian no longer cast zero-fractional floats to ints, which this code was relying on to produce metric names with integral subsampling rates in them.
-
Fix failure to detect whitelist with demux when a single Sample Index is bad
-
Fix always-on multi-chromosome transcript warning in mkref
-
Fix stall in ALIGN_READS on filesystems that don't support named pipes
-
Fix python error when autodetect of chemistry fails with multiple FASTQ paths
-
Fix handling of sample names with multiple underscores in the mkfastq pipeline
-
Fix suppression of process limit errors in the mkfastq QC stage
Changes to mkfastq
-
Barcode-aware QC stage is now opt-in via the --qc flag.
-
Limit total CPU usage across stages to 12 cores unless --localcores is specified. This should improve reliability on machines with high numbers of cores.
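
The core limit described above amounts to a simple rule, sketched here under the assumption that an explicit --localcores value always takes precedence. The function and constant names are hypothetical.

```python
import os

DEFAULT_CORE_CAP = 12  # cap stated in the note above


def effective_cores(localcores=None, detected=None):
    """Return the number of cores to use for a local run."""
    if localcores is not None:
        return localcores  # explicit --localcores lifts the cap
    if detected is None:
        detected = os.cpu_count() or 1
    return min(detected, DEFAULT_CORE_CAP)


print(effective_cores(detected=64))                 # capped
print(effective_cores(localcores=32, detected=64))  # explicit request honored
```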
Cell Ranger 2.1.1 Gene Expression
Note: This is expected to be the last version of Cell Ranger to support CentOS/RedHat 5 and Ubuntu 10. If you are using one of those operating systems, Cell Ranger will now warn you. Future versions of Cell Ranger will require CentOS/RedHat 6 or Ubuntu 12 or later. See the 10x OS Support page for further information.
Bug Fixes
-
Fix library ID labels being out of order in the matrix HDF5 file produced by cellranger aggr when 10 or more libraries are aggregated. This manifests as Loupe Cell Browser showing the library ID labels out of order after running cellranger reanalyze.
-
Fix an out-of-memory error occurring when generating the kmer index on a reference with very long transcripts, e.g. on a pre-mRNA reference used when analyzing nuclei samples.
-
Fix crash when analyzing FASTQs produced by SRA's fastq-dump.
-
Fix the Differential Expression table in the web summary disappearing when gene IDs are equal to gene names in the reference GTF.
-
Fix a few web summary metrics becoming negative when more than 2.1 billion reads are analyzed at once.
-
Fix incorrect parsing of the --localcores argument, causing --localmem to be ignored when specified immediately after --localcores.
-
Fix crash in mkfastq on NovaSeq when RunParameters.xml is named runParameters.xml.
-
Fix hang when running sitecheck on some systems.
-
Fix error reporting in python stage code imports.
-
Fix estimation of stage virtual memory usage.
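
The 2.1 billion read threshold in the web summary fix above is close to 2^31 - 1, the largest value of a signed 32-bit integer. The notes do not state the cause, so the sketch below is only an inference: it shows how a count stored in 32 signed bits would wrap to a negative number past that point.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647


def wrap_int32(n):
    """Simulate storing n in a signed 32-bit integer (two's complement)."""
    return (n + 2**31) % 2**32 - 2**31


print(wrap_int32(2_000_000_000))  # still fits
print(wrap_int32(2_200_000_000))  # wraps to a negative value
```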
Improvements
-
Truncate large metadata files when generating a tarball for upload to 10x, rather than omitting them. Remove the requirement that the reference FASTA file modification time precede the STAR index file modification times.
-
The default --localmem in cluster mode will no longer ever be more than the free memory available when cellranger starts.
New Features
-
Add support for and autodetection of Single Cell 5' gene expression libraries, with support for both paired-end alignment (150x150) and R2-only alignment (26x98).
-
Add --r1-length and --r2-length options to cellranger count which enable hard trimming of input FASTQs.
-
Add --exclude-genes option to cellranger reanalyze which, analogously to --genes, allows for the exclusion of some genes from the secondary analysis (PCA, clustering, etc.).
-
Add --chemistry to cellranger count to override the automatic chemistry detection.
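
Hard trimming, as enabled by --r1-length and --r2-length above, means cutting each read to a fixed length before analysis. A minimal sketch, assuming a FASTQ record is held as a tuple of its four lines (a hypothetical representation, not Cell Ranger's internal one):

```python
def hard_trim(record, length=None):
    """Keep only the first `length` bases and quality values of a record."""
    header, sequence, separator, quality = record
    if length is None:
        return record  # no trimming requested
    return (header, sequence[:length], separator, quality[:length])


read = ("@read1", "ACGTACGTAC", "+", "IIIIIIIIII")
print(hard_trim(read, 6))
```

Sequence and quality strings are trimmed together so that they stay the same length, which the FASTQ format requires.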
Performance Improvements
-
Reduce the run time by 30%.
-
Reduce the disk storage high-water-mark by 60%.
Algorithm Improvements
- Change the Antisense Reads Metric to only count a read as antisense if it has no sense alignments, effectively prioritizing sense alignments over antisense for this computation.
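
The revised antisense rule can be expressed in a few lines. The sketch assumes each alignment of a read has already been labeled "sense" or "antisense" relative to the gene it overlaps; the labels and function name are hypothetical.

```python
def is_antisense_read(alignment_strands):
    """A read counts as antisense only if none of its alignments is sense."""
    strands = set(alignment_strands)
    return "antisense" in strands and "sense" not in strands


print(is_antisense_read(["antisense", "antisense"]))
print(is_antisense_read(["sense", "antisense"]))  # sense takes priority
```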
Output File Changes
-
Stop generating the TR and TQ BAM tags because these tags were retaining trimmed sequences that Cell Ranger would ignore anyway after converting the BAM back to FASTQ.
-
Add more mapping metrics (Reads Mapped to Genome, Reads Mapped Confidently to Genome), and reorder the mapping metrics to be consistent with their order of computation.
Bug Fixes
-
Fix mkfastq allowing max bcl2fastq threads to exceed --localcores.
-
Fix mkfastq crashing when reading NovaSeq quality data from RTA 3.3 and later.
-
Fix excessive memory requests in SC_RNA_ANALYZER.
-
Fix nondetection of louvain binary failure in RUN_GRAPH_CLUSTERING.
-
Fix crash in RUN_GRAPH_CLUSTERING when /dev/stdin doesn't exist.
-
Fix the barcode rank plot concatenating instead of unioning barcodes in multi-genome datasets.
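
The difference between concatenating and unioning barcodes in the barcode rank plot fix above can be shown with a toy example. How counts are combined for a barcode seen in both genomes is not stated in the notes; summing them is an assumption made here for illustration.

```python
def concat_barcodes(per_genome):
    """Old behavior: a barcode present in two genomes appears twice."""
    return [(bc, n) for counts in per_genome.values() for bc, n in counts.items()]


def union_barcodes(per_genome):
    """Fixed behavior: one entry per barcode, ranked by combined count."""
    merged = {}
    for counts in per_genome.values():
        for bc, n in counts.items():
            merged[bc] = merged.get(bc, 0) + n
    return sorted(merged.items(), key=lambda item: -item[1])


counts = {
    "hg19": {"AAAC": 100, "AAAG": 5},
    "mm10": {"AAAC": 3, "AAAT": 80},
}
print(len(concat_barcodes(counts)), len(union_barcodes(counts)))
```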
System Requirements Changes
- Cell Ranger no longer supports Ubuntu 8 or CentOS 5.2 Linux distributions. Ubuntu 10.04 LTS or CentOS 5.5 or greater are now required.
Job Scheduling
-
The pipeline management system, mrp, is now open source on GitHub.
-
The monitoring port for the user interface is now always on by default, with an OS-selected port if none is specified.
- This behavior can be disabled with --disable-ui.
- Access to the user interface port, if no port was specified explicitly, now requires a randomly-generated authentication token. This token is visible in the pipeline standard output and in the _uiport file.
- This behavior can be disabled with
-
A new tool, mrstat, is now available.
- Given the path to the directory with a running pipeline, mrstat will return basic information about the progress of the pipeline.
- With the --stop flag, it will cause the pipeline to fail and exit.
-
Two new variables are available for use in cluster-mode templates:
- __MRO_JOB_WORKDIR__ can be used to specify the absolute path to the directory where the job should execute. This should alleviate issues on clusters such as PBS which sometimes do not set the working directory correctly.
- __MRO_ACCOUNT__ passes the MRO_ACCOUNT environment variable from mrp's environment. This is intended for cluster managers which support charging resources to specific accounts.
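
To show how the two template variables above might be used, the sketch below fills them into a made-up PBS-style job template with plain string replacement. The template text, the account name, and the path are illustrative assumptions, and mrp's own substitution logic may differ.

```python
# Hypothetical cluster-mode template; only the two placeholders come from the notes.
TEMPLATE = """#!/bin/bash
#PBS -A __MRO_ACCOUNT__
cd __MRO_JOB_WORKDIR__
"""


def render(template, variables):
    """Replace each __MRO_<NAME>__ placeholder with its value."""
    for name, value in variables.items():
        template = template.replace(f"__MRO_{name}__", value)
    return template


print(render(TEMPLATE, {"ACCOUNT": "lab42", "JOB_WORKDIR": "/scratch/run1"}))
```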
-
The pipeline standard output and log will now periodically provide progress updates for in-progress stages.
-
mrp will now provide clearer and more useful error reporting when the pipeline directory runs out of disk space.
-
Several enhancements to the reliability of pipeline restart.
-
Fixes for several cases where a pipeline could "hang" indefinitely without making further progress.
-
Pipelines should now do a better job of staying within their CPU usage allocation.
Bug fixes
- Properly ignore SIGHUP when a pipeline is run using nohup.
Pipeline Argument Changes
-
Add --override option to all pipelines, allowing for stage-level overrides for cores and memory.
-
Reanalyze no longer requires --agg to persist library ID; it is only required for persisting user-defined fields.
Bug fixes
-
Fix CHUNK_READS using more cores and using them less efficiently than intended.
-
Fix aggr using incorrect downsampling rates when more than 10 libraries are aggregated.
-
Fix mkfastq proceeding even after bcl2fastq is killed.
-
Fix lack of robustness to rare events where NFS latency induces double file deletion or double directory creation events.
-
Fix ALIGN_READS proceeding after the STAR subprocess fails, causing crashes in ATTACH_BCS_AND_UMIS.
-
Improve error messages when STAR or samtools fail in ALIGN_READS.
-
Fix spaces in transcript IDs causing ATTACH_BCS_AND_UMIS to crash. mkref no longer allows spaces in transcript IDs.
-
Fix crash when reads are adapter-trimmed by bcl2fastq and some reads end up empty.
-
Fix out-of-memory condition in ATTACH_BCS_AND_UMIS for some libraries with >800M reads.
-
Fix question marks replacing axis titles of barcode rank plot in web summary.
-
Fix excessive memory consumption and runtime of mkfastq on large sample sheets.
Job Scheduling
-
Fix several cases where, after mrp (which is invoked by cellranger) gets killed, it was not able to restart correctly.
-
On SGE clusters, cellranger/mrp now periodically runs qstat to verify that the jobs it queued have not been killed or canceled.
-
If the run fails, instead of just displaying a message pointing the user to the relevant _errors file, the contents of that file are printed.
-
On automatic retry of failed stages, the reason for the original failure is logged.
-
mrp is now more resilient against certain kinds of filesystem errors.
-
In the event of certain types of filesystem problems (such as permissions errors or disk quota), mrp/cellranger should now sometimes be able to provide more useful and immediate error messages.
-
Additional information about the environment cellranger runs in is now logged and included in mri.tgz.
-
mrp now correctly handles the signals sent by SGE and LSF when a soft time limit is reached (e.g. for SGE, -l s_rt 23:00:00).
-
Now supports the --overrides method to dynamically change additional CPU and memory per stage.
Pipeline argument changes
-
Add --barcodes and --genes options to reanalyze, which allow selection of a specific subset of barcodes and/or genes to use in the secondary analysis.
-
Add --force-cells option to count and reanalyze to explicitly set the cell count. If specified, Cell Ranger will take the top N barcodes (by UMI count) as cells instead of doing dynamic cell count estimation.
-
Rename the estimated cells option from --cells to --expect-cells for clarity.
-
Add --nosecondary flag to count, which skips the secondary analysis.
-
Disallow slashes in the --genome argument in mkref.
-
Add --id option to mkfastq which allows you to name the output directory.
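
The --force-cells behavior described above, taking the top N barcodes by UMI count, can be sketched as below. The input format and the alphabetical tie-break are assumptions for illustration.

```python
def top_n_barcodes(umi_counts, n):
    """Return the n barcodes with the highest UMI counts.

    umi_counts: dict mapping barcode to total UMI count (hypothetical input).
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(umi_counts.items(), key=lambda item: (-item[1], item[0]))
    return [barcode for barcode, _ in ranked[:n]]


counts = {"AAAC": 1200, "AAAG": 15, "AAAT": 950, "AACA": 3}
print(top_n_barcodes(counts, 2))
```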
New subcommands
- Add cellranger mat2csv command, which converts a Cell Ranger sparse gene-barcode matrix to a dense CSV format. Note that the resulting file will be very large, even for a few hundred cells.
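
The size warning above follows from what a dense conversion does: every gene-by-barcode cell is written out, including the zeros a sparse matrix omits. The sketch below is not mat2csv itself; the row and column layout and the input format are assumptions for illustration.

```python
import csv
import io


def dense_csv(genes, barcodes, nonzero):
    """Write a dense CSV from sparse entries keyed by (gene_index, barcode_index)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["gene"] + barcodes)
    for g, gene in enumerate(genes):
        row = [nonzero.get((g, b), 0) for b in range(len(barcodes))]
        writer.writerow([gene] + row)
    return buffer.getvalue()


genes = ["G1", "G2"]
barcodes = ["BC1", "BC2", "BC3"]
sparse = {(0, 1): 5}
print(dense_csv(genes, barcodes, sparse))
print(len(sparse), "stored entries versus", len(genes) * len(barcodes), "dense cells")
```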
Web summary changes
-
Add "Reads Mapped Antisense to Gene" metric, which quantifies reads that are mapped to the non-coding strand of a gene. High values can indicate the use of an unsupported chemistry type, e.g. passing a Single Cell V(D)J library to cellranger count.
-
Add "Fraction GEMs with >1 Cell (Lower / Upper Bound)" metrics, which define a confidence interval for the multiplet rate estimate in multi-genome samples.
-
Add more details to various metric descriptions.
Algorithm improvements
-
Add the requirement that reads overlap annotated exons by at least 50% in order to be considered exonic. As a result, "Reads Mapped Confidently to Exonic Regions" may differ slightly from previous versions.
-
Reduce EXTRACT_READS per-read runtime by 50% by avoiding OrderedDict and caching metric calculations.
-
Reduce SUBSAMPLE_READS runtime by reducing the number of fixed target values for subsampling (to just 25k and 50k reads per cell).
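
The 50% exon-overlap requirement listed above can be illustrated with simple interval arithmetic. The sketch assumes half-open coordinates, a single contiguous read alignment, and non-overlapping exons; spliced alignments and the pipeline's actual data structures are not modeled.

```python
def exonic_fraction(read_start, read_end, exons):
    """Fraction of the read's bases that fall inside annotated exons.

    Coordinates are half-open [start, end); exons must not overlap each other.
    """
    covered = sum(
        max(0, min(read_end, exon_end) - max(read_start, exon_start))
        for exon_start, exon_end in exons
    )
    return covered / (read_end - read_start)


def is_exonic(read_start, read_end, exons, min_fraction=0.5):
    """Apply the at-least-50% overlap rule."""
    return exonic_fraction(read_start, read_end, exons) >= min_fraction


print(is_exonic(100, 200, [(150, 300)]))  # half of the read is exonic
print(is_exonic(100, 200, [(160, 300)]))  # less than half
```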
File format improvements
-
Due to a format change (removal of the IntervalTree object), references produced with cellranger mkref using Cell Ranger v2.0 are not compatible with pipelines from Cell Ranger v1.x.
-
Modify the TX, GX, and GN tags to have more granular transcript/gene annotations. Each BAM record is only annotated with transcripts/genes specific to that alignment, instead of combining annotations from all alignments of the corresponding read.
-
Add RE tag, which indicates whether an alignment is exonic, intronic or intergenic.
Bug fixes
-
Fix rare bug in interval arithmetic, leading to exonic reads being falsely annotated as intronic or intergenic. As a result of this bugfix, "Reads Mapped Confidently to Exonic Regions" may differ slightly from previous versions.
-
Fix excessive EXTRACT_READS runtime (10+ hours) on very large FASTQs such as those produced by mkfastq.
-
Fix a crash in RUN_GRAPH_CLUSTERING on filesystems that do not support named pipes.
-
Fix SUBSAMPLE_READS using more VMEM than expected, causing it to be killed by SGE when exceeding the h_vmem limit on certain clusters.
-
Fix mkfastq not merging output files properly due to sample numbering issues.
-
Fix mkfastq crash due to -d (demultiplexing-threads) argument being deprecated in bcl2fastq 2.19.
-
Fix the components.csv file produced by PCA, which did not contain the correct matrix.
-
Fix a crash in RUN_PCA when the number of nonzero genes is smaller than the number of principal components.
-
Fix a crash in mkref with very large genomes; use the limitGenomeGenerateRAM option in STAR to overcome its default reference size limit.
-
Fix certain special characters (like dashes) in reference names breaking the subsampled genes detected plot.
-
Fix mkloupe displaying an unhelpful error message when run on mixed-species runs and those from Cell Ranger v1.1 or earlier.
-
Fix the open-file-handle-limit check using the submit host rather than the execution machine.
-
Fix cellranger aggr allowing duplicate library_ids.
-
Fix CLOUPE_PREPROCESS taking the full matrix even after reanalyze subselects barcodes.
-
Fix a crash in mkfastq on RunInfo.xml files produced by the NovaSeq.
-
Fix a crash in mkfastq when bcl2fastq 2.19 is used in cluster mode or with the --demultiplexing-threads argument.
-
Fix mkfastq sometimes not properly merging samples in bcl2fastq 2.18 and 2.19 due to a change in the order in which lanes are processed by bcl2fastq.
Martian Runtime Changes
- Add caching for deserialized JSON metadata. This improves performance for stages with many chunks.
Miscellaneous
-
Update samtools from 0.1.19 to 1.4.
-
Rename RUN_PREPROCESS to PREPROCESS_MATRIX in the SC_RNA_ANALYZER pipeline.
-
Add alerts.json as an output of the SUMMARIZE_REPORTS stage. This file is a machine-readable list of any abnormal metric values that raised alarms in the web summary.
-
For multi-genome samples, display the full reference name rather than a comma delimited list of genomes in the web summary ("hg19, mm10" becomes "hg19_and_mm10").
- Fixes issue preventing mkfastq from demultiplexing data from recent sequencer software versions.
Analysis Improvements
-
Confidently align more reads to the transcriptome, greatly improving alignment rates with shorter reads.
- Reads Confidently Mapped to Transcriptome increases from 55% to 62% with 98bp reads and from 34% to 54% with 32bp reads (Human PBMCs vs GRCh38).
-
Add a graph-based clustering algorithm: Louvain Modularity Optimization, which, unlike K-Means, does not require pre-specifying K.
Visualization
-
Automatically produce Loupe Cell Browser (.cloupe) files in the count, aggregate, and reanalyze pipelines.
-
Output a web summary HTML file in the reanalyze pipeline.
-
Be explicit about pre- and post- depth normalization metric values in the aggr web summary.
-
When the web summary subselects 10e3 cells for display, show the original cluster sizes and not the subselected sizes.
-
Make the web summary HTML slightly smaller by rounding t-SNE coordinates.
-
Update plotly to enable scrollable legends.
File format improvements
- Add Read Group (RG) headers and tags to the output BAM file for better data provenance.
Bug fixes
-
Preserve trimmed bases via the TR/TQ BAM tags for much longer read lengths without crashing.
-
Fix crash when copying files on certain types of network shares that do not support file permissions.
-
Omit no-call bases from Q30 metrics to be consistent with Illumina's Q30 calculation.
-
Allow generation of 3-d (alongside 2-d) t-SNE projections without crashing.
-
Do a better job of hiding dynamic elements while the web summary HTML is loading.
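
The Q30 fix above, omitting no-call bases, can be sketched as follows. It assumes the common Phred+33 quality encoding and treats "N" as the no-call base; the function name is hypothetical.

```python
def q30_fraction(sequence, quality, offset=33):
    """Fraction of called bases with Phred quality >= 30, ignoring 'N' bases."""
    scores = [
        ord(q) - offset
        for base, q in zip(sequence, quality)
        if base != "N"  # no-calls are left out of numerator and denominator
    ]
    if not scores:
        return 0.0
    return sum(score >= 30 for score in scores) / len(scores)


# 'I' encodes Q40, '5' encodes Q20, '#' encodes Q2 in Phred+33.
print(q30_fraction("ACGN", "II5#"))
```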
General
-
Make the --params argument to reanalyze optional to enable re-runs with the default parameters.
-
Check for mismatches between the library IDs given in the aggr CSV and those in the matrix file.
-
Limit max_clusters for K-Means to 50 to ensure sane memory consumption.
-
Fix incorrect results being produced when aggr processes a count output that contains multiple libraries (gem groups).
-
Exclude untested genes from p-value adjustment.
-
Don't crash when extra commas are present in an IEM samplesheet for mkfastq.
-
Don't crash when no project folders are present for mkfastq.
-
Correctly handle the second index when mkfastq receives a dual-indexed IEM samplesheet.
-
Allow matrices to have more than 2^31-1 nonzero entries in the matrix HDF5 format.
-
Don't display alerts until the web summary page fully loads.
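
Excluding untested genes from p-value adjustment matters because the number of tests enters the correction. The notes do not name the adjustment method, so the sketch below assumes a Benjamini-Hochberg correction purely for illustration, with None marking an untested gene.

```python
def adjust_pvalues(pvalues):
    """Benjamini-Hochberg adjustment over tested genes only.

    pvalues: list of p-values, with None for genes that were not tested.
    Untested genes stay None and do not count toward the number of tests.
    """
    tested = sorted((p, i) for i, p in enumerate(pvalues) if p is not None)
    m = len(tested)
    adjusted = [None] * len(pvalues)
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        p, index = tested[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[index] = running_min
    return adjusted


print(adjust_pvalues([0.01, None, 0.04, 0.03]))
```

With the untested gene counted as a fourth test, every adjusted value would be larger, which is the effect the fix avoids.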
General
-
Rename main pipeline to cellranger count, which produces a gene-barcode matrix for one library sequenced one or more times.
-
Add support for and autodetection of Chromium Single Cell 3' v2 chemistry; still compatible with v1 chemistry.
-
Fix incorrect default cell count being used when "expected recovered cells" not specified.
New aggr aggregation pipeline
-
New pipeline cellranger aggr which aggregates data from multiple libraries into one dataset.
-
Supports combining libraries totalling up to 1,000,000 cells and secondary analysis of the combined data.
-
Automatically performs sequencing depth-normalization for all combined libraries.
New reanalyze custom reanalysis pipeline
- Reruns secondary analysis (dimensionality reduction, clustering, and differential expression) with fully customizable parameters.
New mkfastq demultiplexing pipeline
-
Easier to integrate with existing bcl2fastq-based workflows.
-
Now the preferred demultiplexing method; demux still available but deprecated.
-
mkfastq is a thin wrapper around bcl2fastq with the same basic interface.
-
Accepts Illumina Experiment Manager-compatible sample sheets with support for 10x sample index sets.
-
Produces FASTQ files and folders in the same structure as bcl2fastq.
-
Generates InterOp output for SAV.
-
Also generates 10x-specific run QC metrics in JSON format.
Scalability enhancements
-
Support combined secondary analysis (dimensionality reduction, clustering, differential expression, and visualization) of up to 1,000,000 cells in under 12 hours with 64 GB of RAM.
-
Change PCA implementation to the Netflix-scale memory-efficient method IRLBA.
-
Decrease runtime of t-SNE implementation.
Analysis Improvements
-
Change differential expression algorithm to the negative-binomial based method sSeq.
-
Report log2 fold-change and p-value for all genes in all clusters.
Sample and genome support
- Add pre-built GRCh38 reference package
Web summary enhancements
-
Add plots that show Sequencing Saturation and Median Genes Detected as a function of downsampled reads per cell.
-
Add Total Genes Detected.
-
Rename "cDNA PCR Duplication" to "Sequencing Saturation."
-
Add chemistry field.
-
Order clusters by size.
-
Add help bubbles to charts.
File format improvements
-
Generate BAM index files with the same basename as the main file.
-
Change cell-barcode and UMI quality tags to CY and UY for better compatibility with the SAM specification.
-
Add TR, TQ tags to BAM to enable lossless BAM to FASTQ conversion.
-
Output HDF5-based sparse matrices in addition to the Matrix Exchange format files for better scalability to high cell counts.
-
Report proportion of variance explained for each principal component.
Martian runtime
-
Pipestance output files (outs) are no longer symlinks.
-
Partial stage restart.
-
Add output filename override, supports two output files having same basename.
-
Add --onfinish handler support.
-
Add support for units of KB and B for memory reservation in cluster job templates.
-
Pipestances now generate a UUID in _uuid.
-
Add auto-retry mechanism when pipeline stages fail due to causes that appear to be transient.
-
--maxjobs now defaults to 64 in local jobmode.
-
--jobinterval now defaults to 100ms in local jobmode.
-
Fix for rare race condition in some Python components
-
Enabled STAR multithreading
-
Added more detailed reference metadata
-
Fixed chromosome name mismatches in 10x reference data
-
Fixed t-SNE algorithm not converging for samples with high cell counts
-
Fixed cell-barcode correction not correcting as many sequences as it should
-
Fixed out-of-memory crash in COUNT_GENES for high-depth samples
-
Fixed occasional loss of the last few reads per chunk in ATTACH_BCS_AND_UMI
-
Added "Reads Mapped Confidently to Exonic Regions" metric to the summary.
-
Changed alert for "Reads Mapped Confidently to Transcriptome" to reflect shorter read lengths and non-human references.
-
Fixed problem where differential expression table sorts incorrectly on click.
-
Fixed problem where very high depth samples would cause an out-of-memory error.
-
Fixed problem where mkgtf would produce incorrectly formatted GTF files.
-
Fixed problem where debug tar.gz file would be very large if the pipestance halted mid-stage.
-
Fixed problem with copying files on certain CIFS volumes.
- Initial release.