Single-Library Analysis with Cell Ranger ARC

Cell Ranger ARC's pipelines analyze sequencing data produced from Chromium Single Cell Multiome ATAC + Gene Expression.

The Cell Ranger ARC pipeline can only analyze Gene Expression and ATAC data together. It must not be used to analyze Gene Expression or ATAC alone.

You can also run 10x Genomics single cell pipelines with 10x Genomics Cloud Analysis, our recommended method for most new customers to process FASTQ files into Cell Ranger ARC output files.

The analysis involves the following steps:

  1. Run cellranger-arc mkfastq on the Illumina BCL output folder of each flow cell to generate FASTQ files: ATAC flow cells yield ATAC FASTQ data and GEX flow cells yield GEX FASTQ data. A separate run of mkfastq is required for each ATAC flow cell and each GEX flow cell.
  2. Run a separate instance of cellranger-arc count for each GEM well demultiplexed by cellranger-arc mkfastq in the previous step.

For the following example, assume that one sample is processed using Single Cell Multiome ATAC + Gene Expression to generate a Multiome ATAC library and a Multiome Gene Expression (GEX) library. The Multiome ATAC library is sequenced on flow cell HNATACSQXX and the Illumina BCL output is located in /sequencing/Sample_ATAC_HNATACSQXX; similarly, the Multiome GEX library is sequenced on flow cell HNGEXSQXXX and the Illumina BCL output is located in /sequencing/Sample_GEX_HNGEXSQXXX.

Follow the instructions on running cellranger-arc mkfastq to generate FASTQ files for both the ATAC and GEX flow cells. cellranger-arc mkfastq will create output ATAC FASTQ files in HNATACSQXX/outs/fastq_path and GEX FASTQ files in HNGEXSQXXX/outs/fastq_path.

Reference packages for human (GRCh38) and mouse (mm10) compatible with Cell Ranger ARC are available for download. You can also create a reference package using cellranger-arc mkref starting with a genome assembly FASTA file, a GTF file of gene annotations, and optionally a file of transcription factor motifs in JASPAR format.

Construct a 3-column libraries CSV file that specifies the location of the ATAC and GEX FASTQ files associated with the sample.

| Column name | Description |
|---|---|
| fastqs | A fully qualified path to the directory containing the demultiplexed FASTQ files for this sample. This field does not accept comma-delimited paths. If you have multiple sets of FASTQs for this library, add an additional row for each set and use the same library_type value. |
| sample | Sample name assigned as the Sample_ID in the demultiplexing sample sheet. |
| library_type | This field is case-sensitive and must exactly match `Chromatin Accessibility` for a Multiome ATAC library and `Gene Expression` for a Multiome GEX library. |

For our example, the file would look as follows:

fastqs,sample,library_type
/home/jdoe/runs/HNGEXSQXXX/outs/fastq_path,example,Gene Expression
/home/jdoe/runs/HNATACSQXX/outs/fastq_path,example,Chromatin Accessibility

The CSV contains two data rows, one for each library. The GEX and ATAC libraries need different library_type values, and their sequence data typically come from different flow cells. The library_type must be either Gene Expression or Chromatin Accessibility.
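
If you generate libraries CSV files from scripts, the rules in the table above can be checked before the pipeline runs. The following is a minimal sketch that assumes Python 3 with only the standard library. `build_libraries_csv` is a hypothetical helper and is not part of cellranger-arc. It rejects unknown library_type values, comma-delimited or relative fastqs paths, and inputs that lack either library type, since GEX and ATAC must be analyzed together.

```python
import csv
import io
import os

# Accepted library_type values (case-sensitive), per the table above.
VALID_LIBRARY_TYPES = {"Gene Expression", "Chromatin Accessibility"}


def build_libraries_csv(rows):
    """Return libraries CSV text for (fastqs, sample, library_type) tuples.

    Hypothetical helper for illustration; it is not part of cellranger-arc.
    """
    for fastqs, sample, library_type in rows:
        if library_type not in VALID_LIBRARY_TYPES:
            raise ValueError(f"unsupported library_type: {library_type!r}")
        if "," in fastqs or not os.path.isabs(fastqs):
            # One fully qualified directory per row; add rows for extra FASTQ sets.
            raise ValueError(f"fastqs must be a single absolute path: {fastqs!r}")
    # cellranger-arc analyzes GEX and ATAC together, so require both types.
    if {row[2] for row in rows} != VALID_LIBRARY_TYPES:
        raise ValueError("need Gene Expression and Chromatin Accessibility rows")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["fastqs", "sample", "library_type"])
    writer.writerows(rows)
    return buf.getvalue()


csv_text = build_libraries_csv([
    ("/home/jdoe/runs/HNGEXSQXXX/outs/fastq_path", "example", "Gene Expression"),
    ("/home/jdoe/runs/HNATACSQXX/outs/fastq_path", "example", "Chromatin Accessibility"),
])
print(csv_text, end="")
```

The printed text matches the example file shown above. You could write it to /home/jdoe/runs/libraries.csv and pass that path to --libraries.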

To generate single cell feature counts and secondary analyses for a single library, run cellranger-arc count with the following arguments. For a complete listing of the arguments accepted, see the Command Line Argument Reference below, or run cellranger-arc count --help.

For help on which arguments to use to target a particular set of FASTQs, consult Specifying Input FASTQ Files for 10x Genomics Pipelines.

After determining these input arguments, run cellranger-arc:

$ cd /home/jdoe/runs
$ cellranger-arc count --id=sample345 \
                       --reference=/opt/refdata-cellranger-arc-GRCh38-2020-A-2.0.0 \
                       --libraries=/home/jdoe/runs/libraries.csv \
                       --localcores=16 \
                       --localmem=64

Following a series of checks to validate input arguments, cellranger-arc count pipeline stages will begin to run:

Martian Runtime - v4.0.5
Running preflight checks (please wait)...
Checking FASTQ folder...
Checking reference...
Checking reference_path (/opt/refdata-cellranger-arc-GRCh38-2020-A-2.0.0) on compute-server32...
Checking chemistry...
Checking optional arguments...
...

By default, cellranger-arc will use all the cores available on your system to execute pipeline stages. You can specify a different number of cores to use with the --localcores option; for example, --localcores=16 will limit cellranger-arc to using up to sixteen cores at once. Similarly, --localmem will restrict the amount of memory (in GB) used by cellranger-arc.

The pipeline will create a new folder named with the sample ID you specified (e.g. /home/jdoe/runs/sample345) for its output. If this folder already exists, cellranger-arc will assume it is an existing pipestance and attempt to resume running it.

A successful cellranger-arc count run should conclude with a message similar to this:

Outputs:
- Secondary analysis outputs:
    clustering:
      atac: { ... }
      gex: { ... }
    dimensionality_reduction:
      atac: { ... }
      gex: { ... }
    feature_linkage: ...
    tf_analysis: ...
- Run summary HTML: /home/jdoe/runs/sample345/outs/web_summary.html
- Run summary metrics CSV: /home/jdoe/runs/sample345/outs/summary.csv
- Per barcode summary metrics: /home/jdoe/runs/sample345/outs/per_barcode_metrics.csv
- Filtered feature barcode matrix MEX: /home/jdoe/runs/sample345/outs/filtered_feature_bc_matrix
- Filtered feature barcode matrix HDF5: /home/jdoe/runs/sample345/outs/filtered_feature_bc_matrix.h5
- Raw feature barcode matrix MEX: /home/jdoe/runs/sample345/outs/raw_feature_bc_matrix
- Raw feature barcode matrix HDF5: /home/jdoe/runs/sample345/outs/raw_feature_bc_matrix.h5
- Loupe browser visualization file: /home/jdoe/runs/sample345/outs/cloupe.cloupe
- GEX Position-sorted alignments BAM: /home/jdoe/runs/sample345/outs/gex_possorted_bam.bam
- GEX Position-sorted alignments BAM index: /home/jdoe/runs/sample345/outs/gex_possorted_bam.bam.bai
- GEX Per molecule information file: /home/jdoe/runs/sample345/outs/gex_molecule_info.h5
- ATAC Position-sorted alignments BAM: /home/jdoe/runs/sample345/outs/atac_possorted_bam.bam
- ATAC Position-sorted alignments BAM index: /home/jdoe/runs/sample345/outs/atac_possorted_bam.bam.bai
- ATAC Per fragment information file: /home/jdoe/runs/sample345/outs/atac_fragments.tsv.gz
- ATAC Per fragment information index: /home/jdoe/runs/sample345/outs/atac_fragments.tsv.gz.tbi
- ATAC peak locations: /home/jdoe/runs/sample345/outs/atac_peaks.bed
- ATAC smoothed transposition site track: /home/jdoe/runs/sample345/outs/atac_cut_sites.bigwig
- ATAC peak annotations based on proximal genes: /home/jdoe/runs/sample345/outs/atac_peak_annotation.tsv

Waiting 6 seconds for UI to do final refresh.
Pipestance completed successfully!

yyyy-mm-dd hh:mm:ss Shutting down.
Saving pipestance info to "sample345/sample345.mri.tgz"

The output of the pipeline will be contained in a folder named with the sample ID you specified (e.g. sample345). The subfolder named outs will contain the main pipeline output files:

| File name | Description |
|---|---|
| web_summary.html | Run summary metrics and charts in HTML format. |
| summary.csv | Run summary metrics in CSV format. |
| raw_feature_bc_matrix.h5 | Raw feature barcode matrix stored as a CSC sparse matrix in HDF5 format. The rows consist of all the gene and peak features concatenated together, and the columns consist of all observed barcodes with non-zero signal for either ATAC or gene expression. |
| raw_feature_bc_matrix | Raw feature barcode matrix stored as a CSC sparse matrix in MEX format. The rows consist of all the gene and peak features concatenated together, and the columns consist of all observed barcodes with non-zero signal for either ATAC or gene expression. |
| per_barcode_metrics.csv | ATAC and GEX read count summaries generated for every barcode observed in the experiment. For more details, see Per-barcode metrics. |
| gex_possorted_bam.bam | GEX reads aligned to the genome and transcriptome, annotated with barcode information, in BAM format. |
| gex_possorted_bam.bam.bai | Index for gex_possorted_bam.bam. |
| gex_molecule_info.h5 | Count and barcode information for every GEX molecule observed in the experiment, in HDF5 format. |
| filtered_feature_bc_matrix.h5 | Filtered feature barcode matrix stored as a CSC sparse matrix in HDF5 format. The rows consist of all the gene and peak features concatenated together (identical to the raw feature barcode matrix), and the columns are restricted to barcodes identified as cells. |
| filtered_feature_bc_matrix | Filtered feature barcode matrix stored as a CSC sparse matrix in MEX format. The rows consist of all the gene and peak features concatenated together (identical to the raw feature barcode matrix), and the columns are restricted to barcodes identified as cells. |
| cloupe.cloupe | Loupe Browser visualization file with all the analysis outputs. |
| atac_possorted_bam.bam | ATAC reads aligned to the genome, annotated with barcode information, in BAM format. |
| atac_possorted_bam.bam.bai | Index for atac_possorted_bam.bam. |
| atac_peaks.bed | Locations of open-chromatin regions identified in this sample. These regions are referred to as "peaks". |
| atac_peak_annotation.tsv | Annotations of peaks based on genomic proximity alone. These are not functional annotations, and they do not use linkage with GEX data. |
| atac_fragments.tsv.gz | Count and barcode information for every ATAC fragment observed in the experiment, in TSV format. |
| atac_fragments.tsv.gz.tbi | Index for atac_fragments.tsv.gz. |
| atac_cut_sites.bigwig | Genome track of transposition sites observed in the experiment, smoothed at a resolution of 400 bases, in bigWig format. |
| analysis | Folder of secondary analyses that use the ATAC data, the GEX data, and their linkage. It contains dimensionality reduction and clustering results for the ATAC and GEX data, along with differential expression and differential accessibility for all of these clustering results. It also contains the linkage between ATAC and GEX data. See Analysis Overview for more information. |
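
Because the matrix rows mix gene and peak features, downstream code usually separates them by feature type first. The following minimal sketch assumes the standard 10x MEX layout. In that layout, the MEX folder holds a features.tsv.gz file whose third tab-separated column is the feature type, such as "Gene Expression" for genes or "Peaks" for ATAC peaks. The file name, column position and type labels are assumptions to check against your own output. The demo rows are made up.

```python
import gzip
from collections import Counter


def count_feature_types(lines):
    """Count features per type from the lines of a MEX features.tsv file.

    Assumes the third tab-separated column holds the feature type
    (e.g. "Gene Expression" or "Peaks").
    """
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        counts[fields[2]] += 1
    return counts


def count_feature_types_in_file(path):
    # e.g. "/home/jdoe/runs/sample345/outs/filtered_feature_bc_matrix/features.tsv.gz"
    with gzip.open(path, "rt") as handle:
        return count_feature_types(handle)


# Made-up rows standing in for a real features.tsv.gz.
demo_rows = [
    "GENE_A\tGeneA\tGene Expression\n",
    "GENE_B\tGeneB\tGene Expression\n",
    "chr1:100-600\tchr1:100-600\tPeaks\n",
]
print(count_feature_types(demo_rows))
```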

Once cellranger-arc count has successfully completed, you can browse the resulting summary HTML file in any supported web browser, open the .cloupe file in Loupe Browser, or refer to the Understanding Output section to explore the data manually.
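
To explore the data manually, a simple starting point is to tally ATAC fragments per barcode from atac_fragments.tsv.gz. This minimal sketch assumes the file is tab-separated with columns chrom, start, end, barcode and read count, and that header lines begin with `#`. Verify both assumptions against Understanding Output. The demo lines are made up.

```python
import gzip
from collections import Counter


def fragments_per_barcode(lines):
    """Count ATAC fragments per barcode from atac_fragments.tsv lines.

    Assumes tab-separated columns chrom, start, end, barcode, read_count
    and header lines that start with '#'.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        counts[fields[3]] += 1  # fourth column holds the barcode
    return counts


def fragments_per_barcode_in_file(path):
    # e.g. "/home/jdoe/runs/sample345/outs/atac_fragments.tsv.gz"
    with gzip.open(path, "rt") as handle:
        return fragments_per_barcode(handle)


# Made-up lines standing in for a real fragments file.
demo_lines = [
    "# header comment\n",
    "chr1\t100\t250\tBC_A\t2\n",
    "chr1\t300\t420\tBC_A\t1\n",
    "chr1\t500\t700\tBC_B\t1\n",
]
print(fragments_per_barcode(demo_lines).most_common())
```

The file-based wrapper streams the gzip file line by line, so memory use grows with the number of distinct barcodes rather than with the number of fragments.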

These are the required command line arguments (also available through cellranger-arc count --help):

| Argument | Description |
|---|---|
| --id | Required. A unique run ID string (e.g., sample345). The name is arbitrary and will be used to name the directory containing all pipeline-generated files and outputs. Only letters, numbers, underscores, and hyphens are allowed (maximum of 64 characters). |
| --libraries | Path to a 3-column CSV file declaring the FASTQ paths, sample names, and library types of the input ATAC and GEX FASTQs. The libraries CSV format is described above. |
| --reference | Path to the cellranger-arc-compatible reference package. References for human and mouse are available for download. Custom references can be constructed with cellranger-arc mkref. |
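
If run IDs are generated automatically, for example from sample sheet names, you can check them against the --id rule before launching. This minimal sketch is a local pre-check only and not the pipeline's own validation. It assumes "letters" means ASCII letters and that an empty ID is invalid.

```python
import re

# Documented --id rule: letters, digits, underscores and hyphens only,
# at most 64 characters. ASCII letters and a minimum length of 1 are
# assumptions made for this sketch.
RUN_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_run_id(run_id: str) -> bool:
    """Return True if run_id satisfies the --id naming rule."""
    return RUN_ID_PATTERN.fullmatch(run_id) is not None


for candidate in ["sample345", "GEX-ATAC_run2", "sample 345", "a" * 65]:
    print(f"{candidate[:16]!r}: {is_valid_run_id(candidate)}")
```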

Additional optional parameters are available:

| Argument | Description |
|---|---|
| --description | Sample description to embed into output files. |
| --gex-exclude-introns | Disable counting of intronic reads. In this mode, only reads that are exonic and compatible with annotated splice junctions in the reference are counted. Note: using this mode will reduce the UMI counts in the count matrix. |
| --min-atac-count | Cell caller override: define the minimum number of ATAC transposition events in peaks (ATAC counts) for a cell barcode. Note: this option must be specified in conjunction with `--min-gex-count`. With `--min-atac-count=X` and `--min-gex-count=Y`, a barcode is defined as a cell if it contains at least X ATAC counts AND at least Y GEX UMI counts. We recommend using these parameters only after reviewing the web summary generated with default parameters. |
| --min-gex-count | Cell caller override: define the minimum number of GEX UMI counts for a cell barcode. Note: this option must be specified in conjunction with `--min-atac-count`. With `--min-atac-count=X` and `--min-gex-count=Y`, a barcode is defined as a cell if it contains at least X ATAC counts AND at least Y GEX UMI counts. We recommend using these parameters only after reviewing the web summary generated with default parameters. |
| --no-bam | Skip BAM file generation. This reduces the total computation time for the pipestance and the size of the output directory. If unsure, do not use this option, as BAM files can be useful for troubleshooting and downstream analysis. Default: false. |
| --peaks | Peak-caller override: specify peaks to use in downstream analyses from a supplied BED file. The file must contain only three columns, specifying the contig, start, and end of each peak. The peaks must not overlap each other. The file must be sorted by position, with the same chromosome order as the reference package. The file may contain comment lines beginning with `#`. |
| --localcores | Restricts cellranger-arc to the specified number of cores when executing pipeline stages. By default, cellranger-arc uses all of the cores available on your system. |
| --localmem | Restricts cellranger-arc to the specified amount of memory (in GB) when executing pipeline stages. By default, cellranger-arc uses 90% of the memory available on your system. |
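
The cell caller override combines its two thresholds with AND, and each threshold is inclusive ("at least"). The following minimal sketch shows that rule. The `call_cells` function and its input mapping from barcode to (ATAC counts, GEX UMI counts) are hypothetical. cellranger-arc applies the rule internally when both options are given. The barcodes and counts are made up.

```python
def call_cells(barcode_counts, min_atac_count, min_gex_count):
    """Apply the --min-atac-count / --min-gex-count rule.

    barcode_counts maps barcode -> (atac_counts, gex_umi_counts). This input
    structure is hypothetical; cellranger-arc applies the rule internally.
    """
    return {
        barcode
        for barcode, (atac_counts, gex_umis) in barcode_counts.items()
        # Both thresholds must be met (AND), each inclusive ("at least").
        if atac_counts >= min_atac_count and gex_umis >= min_gex_count
    }


# Made-up barcodes and counts for illustration only.
example_counts = {
    "BC_A": (5000, 1200),  # passes both thresholds
    "BC_B": (5000, 300),   # too few GEX UMIs
    "BC_C": (800, 4000),   # too few ATAC counts
}
print(sorted(call_cells(example_counts, min_atac_count=1000, min_gex_count=500)))
```

BC_B and BC_C are each rejected by a single threshold, which is why both options must be supplied together: one threshold alone would not define the rule.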
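
A BED file supplied to --peaks can be checked against the stated requirements before a long run. The following minimal sketch assumes several things that the requirements do not specify: tab-separated columns, half-open BED coordinates (so peaks that merely touch do not overlap) and peaks of positive length. `contig_order` is a hypothetical input listing the reference package's contigs in order. How you obtain it, for example from the reference FASTA index, is up to you.

```python
def validate_peaks_bed(lines, contig_order):
    """Return a list of problems in a --peaks BED file (empty list if none found).

    contig_order lists contig names in the reference package's order.
    Coordinates are treated as half-open BED intervals.
    """
    rank = {name: i for i, name in enumerate(contig_order)}
    problems = []
    prev = None  # (contig rank, start, end) of the previous peak line
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # comment lines are allowed
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            problems.append(f"line {lineno}: expected 3 columns, found {len(fields)}")
            continue
        contig, start_text, end_text = fields
        if contig not in rank:
            problems.append(f"line {lineno}: contig {contig!r} is not in the reference")
            continue
        try:
            start, end = int(start_text), int(end_text)
        except ValueError:
            problems.append(f"line {lineno}: start and end must be integers")
            continue
        if start >= end:  # assumption: peaks have positive length
            problems.append(f"line {lineno}: start must be less than end")
        current = (rank[contig], start, end)
        if prev is not None:
            if current[:2] < prev[:2]:
                problems.append(f"line {lineno}: not sorted by reference contig order and start")
            elif current[0] == prev[0] and start < prev[2]:
                problems.append(f"line {lineno}: overlaps the previous peak")
        prev = current
    return problems


reference_contigs = ["chr1", "chr2"]  # hypothetical reference order
bed_lines = [
    "# custom peaks\n",
    "chr1\t100\t200\n",
    "chr1\t200\t300\n",
    "chr2\t50\t80\n",
]
print(validate_peaks_bed(bed_lines, reference_contigs))
```

Comparing only adjacent peaks is enough here. Once the file is sorted by start within each contig, any overlap between two peaks also implies an overlap between some pair of neighbors.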