The cellranger-atac workflow starts by demultiplexing the Illumina sequencer's base call files (BCLs) for each flow cell directory into FASTQ files. 10x Genomics recommends using cellranger-atac mkfastq, a pipeline that wraps Illumina's bcl2fastq software and adds several convenient features to those of bcl2fastq:
- Translates 10x Genomics sample index set names into the corresponding list of four sample index oligonucleotides. For example, well A1 can be specified in the sample sheet as SI-NA-A1, and cellranger-atac mkfastq will recognize the four oligos (AAACGGCG, CCTACCAT, GGCGTTTC, and TTGTAAGA) and merge the resulting FASTQ files.
- Supports a simplified CSV sample sheet format to handle 10x Genomics use cases.
- Supports most bcl2fastq arguments, such as --use-bases-mask.
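To make the index-set translation concrete, here is a minimal bash sketch of the kind of lookup that mkfastq performs for you. The function name si_oligos is hypothetical, and the table holds only the two index sets whose oligos are quoted on this page (SI-NA-A1 above and SI-NA-C1 in the IEM example below). It is not the full 10x Genomics index list and not how mkfastq is implemented.

```bash
#!/usr/bin/env bash
# Hypothetical lookup: 10x Genomics sample index set name -> its four oligos.
# Only the two sets quoted on this page are included.
si_oligos() {
  case "$1" in
    SI-NA-A1) echo "AAACGGCG CCTACCAT GGCGTTTC TTGTAAGA" ;;
    SI-NA-C1) echo "ATCTGATC CGTGCTAA GAGAAGGG TCACTCCT" ;;
    *) echo "unknown index set: $1" >&2; return 1 ;;
  esac
}

# Print one oligo per line for the set used in the tiny-bcl example.
for oligo in $(si_oligos SI-NA-C1); do
  echo "$oligo"
done
```

Reads matching any of the four oligos belong to the same sample, which is why mkfastq merges their FASTQ files.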
The compute workflow begins by running one instance of cellranger-atac mkfastq for each flow cell of data being analyzed. Once the data are successfully demultiplexed, run one instance of cellranger-atac count for each Epi ATAC library, regardless of how many times that library was sequenced.
In the first example, two 10x Genomics libraries (each processed through a separate Chromium chip channel) are multiplexed on a single flow cell. After running cellranger-atac mkfastq, run a separate instance of the cellranger-atac count pipeline on each library:
data:image/s3,"s3://crabby-images/08c46/08c4690c57551df35701bd49494a4d35619b7178" alt=""
In the second example, one 10x Genomics library was sequenced on two flow cells. After running cellranger-atac mkfastq on each flow cell, run a single instance of the cellranger-atac count pipeline on all the FASTQ files generated:
data:image/s3,"s3://crabby-images/aae94/aae94b057124eba617520824d35d82b66bdbe77d" alt=""
Because cellranger-atac mkfastq is a wrapper around bcl2fastq, it accepts additional options beyond those shown in the table below. Consult Illumina's bcl2fastq User Guide for more information.
Parameter | Function |
---|---|
--run | Required. Path of the Illumina BCL run folder. |
--id | Optional. Name of the folder created by mkfastq. Defaults to the name of the flow cell referred to by --run. |
--samplesheet | Optional. Path to an Illumina Experiment Manager-compatible sample sheet that contains 10x Genomics sample index names (e.g., SI-NA-A1) in the sample index column. All other information, such as sample names and lanes, should also be in the sample sheet. |
--sample-sheet | Optional. Equivalent to --samplesheet above. |
--csv | Optional. Path to a simple CSV with lane, sample, and index columns that describe how to demultiplex the flow cell. The index column should contain a 10x Genomics sample index set name (e.g., SI-NA-A12). This is an alternative to the Illumina IEM sample sheet and is ignored if --samplesheet is specified. |
--simple-csv | Optional. Equivalent to --csv above. |
--lanes | bcl2fastq option. Comma-delimited series of lanes to demultiplex (e.g., 1,3). Use this if you have a sample sheet for an entire flow cell but only want to generate a few lanes for further 10x Genomics analysis. |
--use-bases-mask | bcl2fastq option. Same meaning as for bcl2fastq. Use to clip extra bases off a read if you ran extra cycles for QC. |
--delete-undetermined | bcl2fastq option. Delete the Undetermined FASTQs generated by bcl2fastq. Useful if you are demultiplexing a small number of samples from a large flow cell. |
--barcode-mismatches | bcl2fastq option. Same meaning as for bcl2fastq. Sets the number of allowed mismatches per index adapter (0, 1, or 2). Default: 1. |
--output-dir | bcl2fastq option. Write FASTQ output to a path of your choosing instead of flow_cell_id/outs/fastq_path. |
--project | bcl2fastq option. Custom project name, used to override the sample sheet or in conjunction with the --csv argument. |
--jobmode | Martian option. Job manager to use. Valid options: local (default), sge, lsf, slurm, or a .template file. |
--localcores | Martian option. Maximum number of cores the pipeline may request at one time. Only applies when --jobmode=local. |
--localmem | Martian option. Maximum memory, in GB, the pipeline may request at one time. Only applies when --jobmode=local. |
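As an illustration of combining several options from the table, the sketch below builds the argument list as a bash array, which keeps each option a separate word even if a path contains spaces. The lane, core, and memory values are placeholders chosen for illustration, and the command is only printed.

```bash
#!/usr/bin/env bash
# Sketch: assemble a cellranger-atac mkfastq call from options in the table.
# Lane, core and memory values are illustrative placeholders.
set -euo pipefail

# --localcores and --localmem only take effect with --jobmode=local.
args=(
  --id=tiny-bcl
  --run=/path/to/tiny_bcl
  --csv=cellranger-atac-tiny-bcl-simple-1.0.0.csv
  --lanes=1
  --delete-undetermined
  --jobmode=local
  --localcores=8
  --localmem=64
)

# Print the full command line; drop the leading "echo" to execute it.
echo cellranger-atac mkfastq "${args[@]}"
```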
The cellranger-atac mkfastq pipeline recognizes two file formats for describing samples: a simple three-column CSV format, or the Illumina Experiment Manager (IEM) sample sheet format used by bcl2fastq. Both formats are illustrated below.
The tiny-bcl-atac dataset is intended only to demonstrate the cellranger-atac mkfastq pipeline. It cannot be used to run downstream pipelines (e.g., cellranger-atac count).
To follow along:
- Download the tiny-bcl-atac tar file.
- Untar the file in a convenient location. This creates a new tiny-bcl/ subdirectory.
- Download the simple CSV layout file: cellranger-atac-tiny-bcl-simple-1.0.0.csv.
- Download the Illumina Experiment Manager sample sheet: cellranger-atac-tiny-bcl-samplesheet-1.0.0.csv.
A simple CSV sample sheet is recommended for most sequencing experiments. The simple CSV format has only three columns (Lane, Sample, Index) and is thus less prone to formatting errors. You can see an example in cellranger-atac-tiny-bcl-simple-1.0.0.csv:
```
Lane,Sample,Index
1,test_sample_atac,SI-NA-C1
```
Each column is described below:
Column | Description |
---|---|
Lane | Which lane(s) of the flow cell to process. Can be a single lane, a range (e.g., 2-4), or '*' for all lanes in the flow cell. |
Sample | The name of the sample. This name is the prefix to all the generated FASTQs and corresponds to the --sample argument in all downstream 10x Genomics pipelines. Sample names must conform to the Illumina bcl2fastq naming requirements: only letters, numbers, underscores, and hyphens are allowed; no other symbols, including dots ("."), are allowed. |
Index | The 10x Genomics sample index that was used in library construction, e.g., SI-NA-A12. |
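The three accepted forms of the Lane value can be illustrated with a small bash function. expand_lanes is a hypothetical helper, not part of cellranger-atac, and the caller must supply the number of lanes on the flow cell, an assumption needed here only to expand '*'.

```bash
#!/usr/bin/env bash
# Hypothetical helper: expand a simple-CSV Lane value into lane numbers.
#   "1"   -> a single lane
#   "2-4" -> an inclusive range
#   "*"   -> every lane; the lane count is supplied by the caller
expand_lanes() {
  local spec=$1 nlanes=$2
  case "$spec" in
    '*') seq 1 "$nlanes" ;;
    *-*) seq "${spec%-*}" "${spec#*-}" ;;
    *)   echo "$spec" ;;
  esac
}

expand_lanes 2-4 8   # prints 2, 3 and 4 on separate lines
```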
To run cellranger-atac mkfastq with a simple layout CSV, use the --csv argument. Here's how to run cellranger-atac mkfastq on the tiny-bcl-atac sequencing run with the simple layout (replace /path/to/tiny_bcl with the path to tiny-bcl on your system):
```
$ cellranger-atac mkfastq --id=tiny-bcl \
    --run=/path/to/tiny_bcl \
    --csv=cellranger-atac-tiny-bcl-simple-1.0.0.csv
```
The cellranger-atac mkfastq pipeline can also be run with a sample sheet in the Illumina Experiment Manager (IEM) format. An IEM sample sheet has several fields specific to running on Illumina platforms, including a [Data] section where sample and index information is specified. In this section, cellranger-atac mkfastq supports listing either index set names or the oligo sequences.
For example, "SI-NA-C1" refers to a 10x Genomics single-index sample index set consisting of four oligo sequences. In this example, only reads from lane 1 will be used. To demultiplex the given sample index across all lanes, omit the Lane column entirely.
```
[Data]
Lane,Sample_ID,index
1,test_sample,SI-NA-C1
```
In this example, the four index sequences for "SI-NA-C1" are specified in separate rows under the index column.
```
[Data]
Lane,Sample_ID,index
1,sample1,ATCTGATC
1,sample1,CGTGCTAA
1,sample1,GAGAAGGG
1,sample1,TCACTCCT
```
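If you prefer the oligo form, the [Data] rows can be generated rather than typed. The following bash sketch prints a minimal sheet containing only a [Data] section with the Lane, Sample_ID, and index columns used above, one row per oligo. write_data_section is a hypothetical helper, and the oligos are passed in by the caller (here, the four listed above for SI-NA-C1).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a minimal IEM [Data] section with one row per
# oligo. Arguments: lane, sample ID, then the index oligos.
write_data_section() {
  local lane=$1 sample=$2 oligo
  shift 2
  echo "[Data]"
  echo "Lane,Sample_ID,index"
  for oligo in "$@"; do
    echo "$lane,$sample,$oligo"
  done
}

write_data_section 1 sample1 ATCTGATC CGTGCTAA GAGAAGGG TCACTCCT
```

Redirecting the output to a file (for example, > samplesheet.csv) gives a sheet that can be passed to --samplesheet.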
Sample names must conform to the Illumina bcl2fastq naming requirements. Only letters, numbers, underscores, and hyphens are allowed. No other symbols, including dots ("."), are allowed.
Although a complete IEM sample sheet contains other sections above the [Data] section, only the [Data] section is required to demultiplex an existing run with cellranger-atac mkfastq. To avoid data loss from trimming, we do not recommend including adapter sequences in the [Settings] section of the sample sheet.
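A quick pre-check for the naming rule can save a failed run. This bash and awk sketch scans the [Data] section of a sheet, finds the Sample_ID column from the header row, and reports any value containing characters other than letters, digits, underscores, and hyphens. check_sample_ids is hypothetical; it assumes a plain comma-separated file without quoted fields and does not replace mkfastq's own preflight checks.

```bash
#!/usr/bin/env bash
# Hypothetical pre-check: flag Sample_ID values in the [Data] section that
# break the naming rule. Assumes no quoted or comma-containing fields.
check_sample_ids() {
  awk -F, '
    { sub(/\r$/, "") }                      # tolerate Windows line endings
    NF == 0 { next }                        # skip blank lines
    /^\[/ { in_data = ($0 ~ /^\[Data\]/); header = 0; col = 0; next }
    in_data && !header {                    # first row after [Data] is the header
      for (i = 1; i <= NF; i++) if ($i == "Sample_ID") col = i
      header = 1; next
    }
    in_data && col && $col !~ /^[A-Za-z0-9_-]+$/ {
      print "invalid Sample_ID: " $col; bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

For example, check_sample_ids cellranger-atac-tiny-bcl-samplesheet-1.0.0.csv exits with status 0 when every Sample_ID passes, and prints each offending value otherwise.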
Next, run the cellranger-atac mkfastq pipeline using the --samplesheet argument (replace /path/to/tiny_bcl with the path to tiny-bcl on your system):
```
$ cellranger-atac mkfastq --id=tiny-bcl \
    --run=/path/to/tiny_bcl \
    --samplesheet=cellranger-atac-tiny-bcl-samplesheet-1.0.0.csv
```
If you encounter any preflight errors, refer to the Troubleshooting page.
Once the cellranger-atac mkfastq pipeline has completed successfully, the output can be found in a new folder named with the value provided to the --id option (if not specified, this defaults to the name of the flow cell):
```
$ ls -l
drwxrwxr-x 4 jdoe jdoe 4096 Aug 29 15:29 tiny-bcl
```
The key output files can be found in outs/fastq_path and are organized in the same manner as a conventional bcl2fastq run:
```
$ ls -l tiny-bcl/outs/fastq_path/
drwxr-xr-x 3 jdoe jdoe 3 Aug 9 12:26 Reports
drwxr-xr-x 2 jdoe jdoe 8 Aug 9 12:26 Stats
drwxr-xr-x 3 jdoe jdoe 3 Aug 9 12:26 tiny-bcl
-rw-r--r-- 1 jdoe jdoe 20615106 Aug 9 12:26 Undetermined_S0_L001_I1_001.fastq.gz
-rw-r--r-- 1 jdoe jdoe 151499694 Aug 9 12:26 Undetermined_S0_L001_R1_001.fastq.gz
-rw-r--r-- 1 jdoe jdoe 52692701 Aug 9 12:26 Undetermined_S0_L001_R2_001.fastq.gz
-rw-r--r-- 1 jdoe jdoe 151499694 Aug 9 12:26 Undetermined_S0_L001_R3_001.fastq.gz
$ tree tiny-bcl/outs/fastq_path/tiny-bcl/
tiny-bcl/outs/fastq_path/tiny-bcl/
└── Sample1
    ├── Sample1_S1_L001_I1_001.fastq.gz
    ├── Sample1_S1_L001_R1_001.fastq.gz
    ├── Sample1_S1_L001_R2_001.fastq.gz
    └── Sample1_S1_L001_R3_001.fastq.gz
```
This example was produced with a sample sheet that listed tiny-bcl as the Sample_Project, so the directory containing the sample folders is named tiny-bcl. If no Sample_Project was specified, or if a simple layout CSV file was used (it has no Sample_Project column), the directory containing the sample folders would instead be named after the flow cell ID.
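Because the name of the directory holding the sample folders depends on Sample_Project (or falls back to the flow cell ID), it can be convenient to search outs/fastq_path for a sample's files rather than hard-code that directory. This is a minimal sketch: list_sample_fastqs is hypothetical, and its file-name pattern assumes the <sample>_S<n>_L<lane>_<read>_001.fastq.gz form shown in the listing above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list one sample's FASTQs anywhere under an mkfastq
# output folder, whatever the project/flow-cell subdirectory is called.
# Arguments: output folder (the --id value), sample name.
list_sample_fastqs() {
  local outdir=$1 sample=$2
  find "$outdir/outs/fastq_path" -type f \
    -name "${sample}_S*_L*_001.fastq.gz" | sort
}
```

For the run above, list_sample_fastqs tiny-bcl Sample1 would print the four Sample1 FASTQs and skip the Undetermined files.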
To remove the Undetermined FASTQs from the output, run mkfastq with the --delete-undetermined flag. To see all cellranger-atac mkfastq options, run cellranger-atac mkfastq --help.
If the pipeline crashes while running cellranger-atac mkfastq, upload the tarball (with the extension .mri.tgz) found in your output directory, replacing your@email.edu with your email address:
```
cellranger-atac upload your@email.edu jobid.mri.tgz
```
where jobid is the value you passed to the --id option of mkfastq (if not specified, it defaults to the ID of the flow cell). This tarball contains diagnostic logs that we can use for debugging.
You will receive an automated email from 10x Genomics. If you do not, email support@10xgenomics.com. For the fastest service, reply with the following:
- The exact cellranger-atac command line you used.
- The sample sheet that you used.
- The RunInfo.xml and runParameters.xml files from your BCL directory.
- The kind of libraries you are demultiplexing (including chemistry).
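Collecting the requested files can be scripted. This bash sketch copies the sample sheet, RunInfo.xml, and the run parameters file into one folder you can attach to your reply; gather_support_files and its arguments are hypothetical. It assumes the run parameters file may be capitalized runParameters.xml or RunParameters.xml depending on the instrument, so it copies whichever exists. The command line and library details still have to be written into the reply by hand.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy the files requested by support into one folder.
# Arguments: BCL run folder, sample sheet path, destination folder.
gather_support_files() {
  local bcl_dir=$1 sheet=$2 dest=$3 f
  mkdir -p "$dest"
  cp "$sheet" "$dest/"
  # Capitalization of the run parameters file may vary, so copy any match.
  for f in RunInfo.xml runParameters.xml RunParameters.xml; do
    if [ -f "$bcl_dir/$f" ]; then
      cp "$bcl_dir/$f" "$dest/"
    fi
  done
}
```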
- Run cellranger-atac count.
- Learn how to specify FASTQs: input FASTQ files must follow the expected naming conventions for the pipeline to complete successfully.