The cellranger count pipeline aligns sequencing reads in FASTQ files to a reference transcriptome and generates a .cloupe file for visualization and analysis in Loupe Browser, along with a number of other outputs compatible with publicly available tools for further analysis.
We will call our working directory the yard. Start by making a directory to run the analysis in.
mkdir ~/yard/run_cellranger_count
cd ~/yard/run_cellranger_count
Next, download FASTQ files from one of the publicly available data sets on the 10x Genomics support site. This example uses the 1,000 PBMC data set from human peripheral blood mononuclear cells (PBMCs), consisting of lymphocytes (T cells, B cells, and NK cells) and monocytes.
wget https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_v3/pbmc_1k_v3_fastqs.tar
This dataset is 5.17 GB and takes a few minutes to download.
Since this is a tar file and not a tar.gz file, you don't need the -z argument used in previous tutorials to extract it.
tar -xvf pbmc_1k_v3_fastqs.tar
The output is similar to the following:
pbmc_1k_v3_fastqs/
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_I1_001.fastq.gz
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_I1_001.fastq.gz
Now you have a directory containing two sets of FASTQ files, named according to the bcl2fastq2 naming convention: Sample_S1_L00X_R1_001.fastq.gz. The file names indicate that all of the files come from the same sample, pbmc_1k_v3, and that the library was run on two lanes: Lane 1 (L001) and Lane 2 (L002).
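To make the naming convention concrete, here is a minimal sketch that splits a FASTQ file name into its sample, sample number, lane, and read fields. The function name parse_fastq_name is hypothetical (it is not part of Cell Ranger or bcl2fastq2), and the sketch assumes every file name ends in _001.fastq.gz as in the listing above.

```shell
# Hypothetical helper (not a Cell Ranger command): split a bcl2fastq2-style
# FASTQ name of the form <Sample>_S<n>_L<lane>_<read>_001.fastq.gz.
parse_fastq_name() {
    name=$(basename "$1")
    base=${name%_001.fastq.gz}   # drop the fixed suffix
    read_type=${base##*_}        # last field: R1, R2, or I1
    base=${base%_*}
    lane=${base##*_}             # next field: L001, L002, ...
    base=${base%_*}
    sample_num=${base##*_}       # next field: S1, S2, ...
    sample=${base%_*}            # whatever remains is the sample name
    printf '%s\t%s\t%s\t%s\n' "$sample" "$sample_num" "$lane" "$read_type"
}

# Prints the four fields separated by tabs.
parse_fastq_name pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz
```

Because the fields are peeled off from the right, a sample name that itself contains underscores (such as pbmc_1k_v3) is still recovered intact.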
Next, you need a reference transcriptome. The download page for the FASTQ files shows that these are human cells. There are several prebuilt human reference transcriptome packages on the 10x Genomics support site. Download the latest package and decompress it.
wget https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCh38-2020-A.tar.gz
tar -zxvf refdata-gex-GRCh38-2020-A.tar.gz
The reference package is 10.6 GB and takes about five minutes to download.
Saving data and reference files
Once you have downloaded and extracted the reference transcriptome files, you can keep them for future runs. However, if you need to delete them to save space on your server between runs, the pre-compiled reference files are publicly available and can be re-downloaded if needed.
Your raw FASTQ files, however, cannot be replaced. We strongly recommend backing them up and archiving them in case something happens to the disk.
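One way to make a backup verifiable is to record a checksum for each FASTQ file before archiving it. The sketch below is only an illustration: the function names and the manifest file name are hypothetical, and it assumes the GNU coreutils sha256sum command is available on your server.

```shell
# Hypothetical helpers; assume GNU coreutils sha256sum is installed.

# Record a SHA-256 checksum for every FASTQ file in a directory.
write_fastq_manifest() {
    fastq_dir=$1
    manifest=$2
    ( cd "$fastq_dir" && sha256sum -- *.fastq.gz ) > "$manifest"
}

# Re-check the files against a previously written manifest.
# Returns non-zero if any file is missing or has changed.
verify_fastq_manifest() {
    fastq_dir=$1
    manifest=$2
    ( cd "$fastq_dir" && sha256sum --check --quiet ) < "$manifest"
}

# Example use, guarded so nothing happens if the directory is absent.
if [ -d pbmc_1k_v3_fastqs ]; then
    write_fastq_manifest pbmc_1k_v3_fastqs pbmc_1k_v3_fastqs.sha256
    verify_fastq_manifest pbmc_1k_v3_fastqs pbmc_1k_v3_fastqs.sha256
fi
```

Keeping the manifest alongside the archived copy lets you run the same check after restoring the files.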
Once you have FASTQ files and a reference transcriptome, you are ready to run cellranger count.
Print the usage statement to see what is needed to build the command.
cellranger count --help
The output is similar to the following:
cellranger-count
Count gene expression (targeted or whole-transcriptome) and/or feature barcode reads from a single sample and GEM well
USAGE:
cellranger count [FLAGS] [OPTIONS] --id <ID> --transcriptome <PATH>
FLAGS:
--no-bam Do not generate a bam file
--nosecondary Disable secondary analysis, e.g. clustering. Optional
--include-introns Include intronic reads in count
--no-libraries Proceed with processing using a --feature-ref but no Feature Barcode libraries
specified with the 'libraries' flag
--no-target-umi-filter Turn off the target UMI filtering subpipeline. Only applies when --target-panel is
used
--dry Do not execute the pipeline. Generate a pipeline invocation (.mro) file and stop
--disable-ui Do not serve the web UI
--noexit Keep web UI running after pipestance completes or fails
--nopreflight Skip preflight checks
-h, --help Prints help information
...
To run cellranger count, you need to specify an --id. This can be any string of fewer than 64 characters made up of alphanumeric characters, underscores, or dashes, with no spaces. Cell Ranger creates an output directory named after this id. This directory is called a "pipeline instance", or pipestance for short.
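The id rule can be checked before launching a long run. The following is a minimal sketch of that rule as described in this tutorial; valid_run_id is a hypothetical helper, not a Cell Ranger command, and Cell Ranger performs its own validation regardless.

```shell
# Hypothetical helper: succeed only if the argument is non-empty, shorter
# than 64 characters, and made up of letters, digits, underscores, or dashes.
valid_run_id() {
    id=$1
    [ -n "$id" ] || return 1
    [ "${#id}" -lt 64 ] || return 1
    case $id in
        *[!A-Za-z0-9_-]*) return 1 ;;   # contains a disallowed character
    esac
    return 0
}

for candidate in run_count_1kpbmcs "run count"; do
    if valid_run_id "$candidate"; then
        echo "ok: $candidate"
    else
        echo "rejected: $candidate"
    fi
done
```

The second candidate is rejected because it contains a space.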
The --fastqs argument should be a path to the directory containing the FASTQ files. If you demultiplexed your data using cellranger mkfastq, you can use the path to the fastq_path directory in the outs directory from that pipeline. If there is more than one sample in the FASTQ directory, use the --sample argument to specify which samples to use. The --sample argument matches the sample id at the beginning of the FASTQ file name. It is unnecessary for this tutorial run because all of the FASTQ files are from the same sample, but it is included as an example. The last argument needed is --transcriptome, the path to the reference package. Be sure to edit the file paths in the command below.
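Before committing to a run that takes hours, it can help to confirm that the directory you plan to pass to --fastqs really contains read files for the sample named in --sample. This is a rough sketch with a hypothetical helper, count_sample_fastqs; it only mirrors the file name prefix matching described above and is not how Cell Ranger itself locates files.

```shell
# Hypothetical helper: count the R1/R2 FASTQ files in a directory whose
# names begin with the given sample id.
count_sample_fastqs() {
    fastq_dir=$1
    sample=$2
    count=0
    for f in "$fastq_dir"/"$sample"_S*_L*_R[12]_001.fastq.gz; do
        # An unmatched glob stays literal, so test that the file exists.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Adjust the directory to match your own paths.
count_sample_fastqs pbmc_1k_v3_fastqs pbmc_1k_v3
```

For the tutorial data set, which has R1 and R2 files on two lanes, the count should be 4 when run from the directory that contains pbmc_1k_v3_fastqs. A count of 0 usually points to a wrong path or a mistyped sample id. With the inputs confirmed, the run itself looks like this: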
cellranger count --id=run_count_1kpbmcs \
--fastqs=/mnt/home/user.name/yard/run_cellranger_count/pbmc_1k_v3_fastqs \
--sample=pbmc_1k_v3 \
--transcriptome=/mnt/home/user.name/yard/run_cellranger_count/refdata-gex-GRCh38-2020-A
Since this is a full-sized dataset, it can take several hours to complete.
The output is similar to the following:
/mnt/yard/user.name/yard/apps/cellranger-7.2.0/bin
cellranger count (7.2.0)
Copyright (c) 2021 10x Genomics, Inc. All rights reserved.
-------------------------------------------------------------------------------
Martian Runtime - v4.0.6
...
2021-10-15 17:12:42 [perform] Serializing pipestance performance data.
Waiting 6 seconds for UI to do final refresh.
Pipestance completed successfully!
When the output of the cellranger count command says, “Pipestance completed successfully!”, this means the job is done.
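If you run the pipeline unattended, you can test for that message afterwards instead of watching the terminal. The sketch below assumes you saved the terminal output to a log file yourself, for example by piping the command through tee; the log file name and the helper name pipestance_succeeded are hypothetical.

```shell
# Hypothetical helper: succeed if a saved copy of the terminal output
# contains the success message printed at the end of a run.
pipestance_succeeded() {
    grep -qF 'Pipestance completed successfully!' "$1"
}

# Assumes the output was captured when the run was launched, e.g. by
# appending "2>&1 | tee run_count_1kpbmcs.log" to the cellranger count command.
if [ -f run_count_1kpbmcs.log ]; then
    if pipestance_succeeded run_count_1kpbmcs.log; then
        echo "run finished"
    else
        echo "success message not found; check the log"
    fi
fi
```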
The cellranger count pipeline outputs are in the outs folder of the pipestance directory. List the contents of this directory with ls -1.
ls -1 run_count_1kpbmcs/outs
The output is similar to the following:
analysis
cloupe.cloupe
filtered_feature_bc_matrix
filtered_feature_bc_matrix.h5
metrics_summary.csv
molecule_info.h5
possorted_genome_bam.bam
possorted_genome_bam.bam.bai
raw_feature_bc_matrix
raw_feature_bc_matrix.h5
web_summary.html
Open web_summary.html to see the results of the experiment. You can also load the cloupe.cloupe file into Loupe Browser and start an analysis. The outs/ directory also contains a number of outputs that can be used as input for software tools developed outside of 10x Genomics, such as the Seurat R package.
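Before moving on to downstream tools, you may want to confirm that the main output files are present. The sketch below uses a hypothetical helper, missing_outputs, and checks only the file names shown in the listing above; a run started with --no-bam would be expected to report the BAM file as missing.

```shell
# Hypothetical helper: print the name of each expected output file that is
# not present in the given outs directory. Prints nothing if all are found.
missing_outputs() {
    outs_dir=$1
    for name in web_summary.html metrics_summary.csv cloupe.cloupe \
                filtered_feature_bc_matrix.h5 raw_feature_bc_matrix.h5 \
                molecule_info.h5 possorted_genome_bam.bam; do
        [ -e "$outs_dir/$name" ] || echo "$name"
    done
}

# Run from the directory that contains the pipestance.
missing_outputs run_count_1kpbmcs/outs
```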