In this tutorial you will:
To complete this tutorial successfully, you must:
- Be comfortable in the Linux environment
- Be familiar with running command-line tools
- Choose a compute platform
- Have access to a system that meets the minimum system requirements
You can download and install spaceranger in any location. For this tutorial, we will create a working directory, spaceranger_tutorial, and carry out all the remaining steps in it.
# Create working directory
mkdir spaceranger_tutorial
# Change directory
cd spaceranger_tutorial
To install the latest version of spaceranger:
- Go to the Downloads page
- Fill out the 10x Genomics End User Software License Agreement information
- Copy and paste the download command for either of the command line utilities (curl or wget). You should see a download progress status similar to the output below.
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
10 1220M 10 125M 0 0 30.9M 0 0:00:39 0:00:04 0:00:35 30.8M
This downloads the spaceranger tarball, spaceranger-2.0.0.tar.gz, to your working directory. Next, we extract the contents.
# Extract spaceranger tarball
tar -zxvf spaceranger-2.0.0.tar.gz
# Expected output
spaceranger-2.0.0/
spaceranger-2.0.0/.env.json
spaceranger-2.0.0/.version
spaceranger-2.0.0/LICENSE
spaceranger-2.0.0/builtwith.json
spaceranger-2.0.0/sourceme.bash
spaceranger-2.0.0/sourceme.csh
spaceranger-2.0.0/bin/
spaceranger-2.0.0/bin/_spaceranger_internal
spaceranger-2.0.0/bin/spaceranger
spaceranger-2.0.0/bin/rna/
spaceranger-2.0.0/bin/rna/_includes
...
When the extraction finishes, the command prompt returns and the folder spaceranger-2.0.0 is present in the working directory.
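If the download was interrupted, the extraction step can fail partway through. As a minimal sketch, the helper below (tarball_ok is a hypothetical name, not part of spaceranger) uses tar's list mode to read the whole archive without extracting anything, so a truncated or corrupt file is caught before any folder is created.

```shell
# Return success if the gzip-compressed tarball can be read end to end.
# tar -t lists the archive members without extracting anything.
# tarball_ok is a hypothetical helper name used only for this sketch.
tarball_ok() {
    tar -tzf "$1" > /dev/null 2>&1
}

# Usage with the tutorial's file name:
#   tarball_ok spaceranger-2.0.0.tar.gz && tar -zxvf spaceranger-2.0.0.tar.gz
```

If the check fails, downloading the tarball again is usually the simplest fix.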
The spaceranger-2.0.0 folder contains the executable and all of the required dependencies. The key folders that you would use are described after the directory tree.
spaceranger-2.0.0
├── bin
├── external
│   ├── anaconda
│   ├── martian
│   │   └── jobmanagers
│   ├── spaceranger_tiny_inputs
│   └── spaceranger_tiny_ref
├── lib
│   ├── bin
│   │   ├── bamtofastq
│   │   ├── redstone
│   │   └── ...
│   └── python
│       └── cellranger
│           └── barcodes
│               ├── visium-v1_coordinates.txt
│               ├── visium-v2_coordinates.txt
│               ├── visium-v4_coordinates.txt
│               ├── visium-v5_coordinates.txt
│               └── ...
├── mro
├── probe_sets
│   ├── Visium_Human_Transcriptome_Probe_Set_v1.0_GRCh38-2020-A.csv
│   ├── Visium_Human_Transcriptome_Probe_Set_v2.0_GRCh38-2020-A.csv
│   └── Visium_Mouse_Transcriptome_Probe_Set_v1.0_mm10-2020-A.csv
├── target_panels
│   ├── gene_signature_v1.0_GRCh38-2020-A.target_panel.csv
│   ├── immunology_v1.0_GRCh38-2020-A.target_panel.csv
│   ├── neuroscience_v1.0_GRCh38-2020-A.target_panel.csv
│   └── pan_cancer_v1.0_GRCh38-2020-A.target_panel.csv
└── THIRD-PARTY-LICENSES.spaceranger.txt
- The antibody_refs folder contains the validated antibody panel used in combined GEX + PEX analysis
- The probe_sets folder contains the probe set reference CSV files used in analysis for FFPE samples
- The lib/python/cellranger/barcodes folder contains the Visium barcode whitelists and their coordinates on the slide
- The lib/bin folder contains tools such as bamtofastq, which converts 10x Genomics BAM files to FASTQ, and redstone, which enables data transfer to 10x Genomics
- external/spaceranger_tiny_ref and external/spaceranger_tiny_inputs are used by spaceranger testrun
- The external/martian/jobmanagers folder contains sample templates for commonly used job schedulers
- The THIRD-PARTY-LICENSES.spaceranger.txt file contains all the licenses for dependencies used in Space Ranger
spaceranger is now installed. There are two ways to specify spaceranger in the commands.
- Use the full path to the spaceranger-2.0.0 folder
# Method 1
## Change directory to spaceranger-2.0.0
cd spaceranger-2.0.0
## Get the full path
pwd
## Change working directory back to spaceranger_tutorial
cd ..
# Method 2
## Get the full path
readlink -f spaceranger-2.0.0
# Expected output
## The path will change depending on the compute setup you are using.
/PATH/TO/WORKING_DIRECTORY/spaceranger_tutorial/spaceranger-2.0.0
- Add spaceranger-2.0.0 to your $PATH variable
# Method 1
## Get the full path
readlink -f spaceranger-2.0.0
## Export PATH by providing the full path
export PATH=/PATH/TO/WORKING_DIRECTORY/spaceranger_tutorial/spaceranger-2.0.0:$PATH
## Confirm installation
which spaceranger
# Method 2
## Change directory to spaceranger-2.0.0
cd spaceranger-2.0.0
## Export PATH by specifying a shell variable
export PATH=$PWD:$PATH
## Confirm installation
which spaceranger
## Change working directory back to spaceranger_tutorial
cd ..
# Expected output
~/spaceranger_tutorial/spaceranger-2.0.0/spaceranger
The tilde (~) stands for your home directory, which in this example is the same as the /PATH/TO/WORKING_DIRECTORY used earlier.
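Running the export line more than once adds the same folder to $PATH repeatedly. The sketch below shows one way to avoid that; path_prepend is a hypothetical helper name, not something shipped with spaceranger, and it only prepends a directory that is not already listed.

```shell
# Prepend a directory to PATH unless it is already listed.
# path_prepend is a hypothetical helper name used only for this sketch.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present: leave PATH unchanged
        *) PATH="$1:$PATH" ;;        # otherwise put the directory first
    esac
    export PATH
}

# Example, run from inside spaceranger_tutorial:
#   path_prepend "$(readlink -f spaceranger-2.0.0)"
#   which spaceranger
```

An exported PATH lasts only for the current shell session. To keep the setting across sessions, the same line is commonly added to a shell startup file such as ~/.bashrc.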
You can now invoke spaceranger at the command prompt to see the usage statement.
# Input
## When using full path to the spaceranger folder
/PATH/TO/WORKING_DIRECTORY/spaceranger_tutorial/spaceranger-2.0.0/spaceranger
## When adding spaceranger folder to the $PATH variable
spaceranger
# Expected output
USAGE:
spaceranger <SUBCOMMAND>
FLAGS:
-h, --help Prints help information
-V, --version Prints version information
SUBCOMMANDS:
count Count gene expression and feature barcoding reads
from a single capture area
aggr Aggregate data from multiple 'spaceranger count' runs
...
testrun Execute the 'count' pipeline on a small test dataset
upload Upload analysis logs to 10x Genomics support
sitecheck Collect linux system configuration information
help Prints this message or the help of the given subcommand(s)
For the rest of the tutorial, we will invoke spaceranger assuming that the spaceranger-2.0.0 folder has been added to the $PATH variable.
spaceranger sitecheck enables you to check your system configuration to ensure it meets the minimum recommended requirements. Run the command and use > to redirect the output to a text file.
spaceranger sitecheck > sitecheck.txt
Open the file with less and use / (e.g. /CPU Cores) to search for specific sections within the file. Press 'q' to quit.
less sitecheck.txt
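As a non-interactive alternative to searching in less, a section can be printed directly. This sketch assumes the layout seen in the excerpts in this tutorial: a title line, the command that was run, a dashed rule, the output, and a closing rule of = characters. sitecheck_section is a hypothetical helper name.

```shell
# Print one section of a sitecheck report, from its title line through
# the closing rule of "=" characters.
# Assumes the section layout shown in this tutorial's excerpts.
sitecheck_section() {
    # $1 = section title (must match the whole line), $2 = report file
    awk -v title="$1" '
        $0 == title      { show = 1 }   # start printing at the title line
        show             { print }
        show && /^=====/ { exit }       # stop after the closing rule
    ' "$2"
}

# Example:
#   sitecheck_section "CPU Cores" sitecheck.txt
```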
We will examine some key configuration metrics and compare against the recommended system requirements.
- CPU Cores
CPU Cores
grep -c processor /proc/cpuinfo
---------------------------------------------------------------------
96
=====================================================================
...
This system has 96 CPUs and is capable of running spaceranger, which requires at least 8 CPUs, preferably 32.
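The comparison above can be scripted. The sketch below hard-codes the two thresholds quoted in this tutorial (8 and 32 cores); check_cores is a hypothetical helper name and the wording of its messages is illustrative only.

```shell
# Classify a core count against the thresholds quoted in this tutorial:
# at least 8 cores, preferably 32.
check_cores() {
    if [ "$1" -ge 32 ]; then
        echo "ok: $1 cores (meets the preferred 32)"
    elif [ "$1" -ge 8 ]; then
        echo "ok: $1 cores (meets the minimum of 8, below the preferred 32)"
    else
        echo "insufficient: $1 cores (minimum is 8)"
    fi
}

# The tutorial system reports 96 cores.
check_cores 96

# On a live Linux system, the same command sitecheck uses can supply the count:
#   check_cores "$(grep -c processor /proc/cpuinfo)"
```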
- Memory Total
Memory Total
grep MemTotal /proc/meminfo | cut -d ':' -f 2 | sed 's/^[ \t]*//'
---------------------------------------------------------------------
289287896 kB
=====================================================================
...
For direct comparison, let's convert kB to GB: 289287896 kB ≈ 289 GB, which satisfies the system requirement of at least 64 GB RAM, preferably 128 GB.
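The unit conversion can be done with shell integer arithmetic. This sketch assumes 1 GB is taken as 1,000,000 kB, which matches the 289 GB figure used in this tutorial; kb_to_gb and check_memory_gb are hypothetical helper names.

```shell
# Convert a MemTotal value in kB to whole GB (taking 1 GB as 1,000,000 kB).
kb_to_gb() {
    echo $(( $1 / 1000000 ))
}

# Classify a memory size in GB against the tutorial's thresholds:
# at least 64 GB, preferably 128 GB.
check_memory_gb() {
    if [ "$1" -ge 128 ]; then
        echo "ok: $1 GB (meets the preferred 128 GB)"
    elif [ "$1" -ge 64 ]; then
        echo "ok: $1 GB (meets the minimum of 64 GB)"
    else
        echo "insufficient: $1 GB (minimum is 64 GB)"
    fi
}

# Value reported by sitecheck on the tutorial system.
check_memory_gb "$(kb_to_gb 289287896)"
```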
- User Limits
User Limits
bash -c 'ulimit -a'
---------------------------------------------------------------------
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1520514
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 10240
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 131072
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
=====================================================================
The two metrics to consider are max user processes and open files.
a. For max user processes, the recommended limit is 64 per core. Assuming we use all 96 cores, the recommendation is 64 × 96 = 6144, which the system limit of 131072 exceeds.
b. For max open files, the system limit of 10240 is below the recommendation. While the pipelines may run with a lower open file limit, caution is urged. The value needed depends on the system, the sample type, and the number of samples being run. If the pipeline errors, it is advisable to increase the user limit with ulimit and try again.
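The max user processes check in item (a) is simple arithmetic and can be sketched as below. The factor of 64 per core comes from this tutorial; recommended_nproc and nproc_limit_ok are hypothetical helper names.

```shell
# Recommended max user processes per this tutorial: 64 per core.
recommended_nproc() {
    echo $(( 64 * $1 ))
}

# Succeed when the given limit covers the recommendation.
# A limit reported as "unlimited" always passes.
nproc_limit_ok() {
    # $1 = limit (a number or "unlimited"), $2 = number of cores
    [ "$1" = "unlimited" ] && return 0
    [ "$1" -ge "$(recommended_nproc "$2")" ]
}

# Values from the tutorial system: 96 cores, limit of 131072.
recommended_nproc 96
nproc_limit_ok 131072 96 && echo "max user processes: ok"

# In bash, the live limit can be read with: ulimit -u
```

With 96 cores the first call prints 6144, well under the system limit of 131072.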
- Global File Limit
Global File Limit
cat /proc/sys/fs/file-{max,nr}
---------------------------------------------------------------------
2921445
68736 0 262144
=====================================================================
The value 2921445 satisfies the minimum requirement of 10k per GB RAM (10,000 × 289 = 2,890,000), where 289 GB is the total memory of the system.
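The same comparison can be sketched in the shell. The example assumes that "10k" means 10,000 and uses the file-max value and memory size from the tutorial system; min_file_max is a hypothetical helper name.

```shell
# Minimum global file limit per this tutorial: 10k per GB of RAM.
# Assumption: "10k" is read as 10,000.
min_file_max() {
    echo $(( 10000 * $1 ))
}

# Tutorial system: 289 GB of RAM, file-max of 2921445.
min_file_max 289
[ 2921445 -ge "$(min_file_max 289)" ] && echo "file-max: ok"

# On a live Linux system, the current value is in /proc/sys/fs/file-max.
```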
The software support team can review your sitecheck results. There are two ways to send them:
- If the compute platform has internet access, use the upload pipeline, replacing the email address with your own:
spaceranger upload [email protected] sitecheck.txt
- If the compute platform is not connected to the internet, you can send sitecheck.txt as an attachment to [email protected].
We can verify the installation using spaceranger testrun. This pipeline can be run in two configurations depending on the internet connectivity of the compute platform.
# With internet access
spaceranger testrun --id=verify_install
# Without internet access
spaceranger testrun --no-internet --id=verify_install
# Expected output
Martian Runtime - v4.0.5
Running preflight checks (please wait)...
Checking sample info...
Checking FASTQ folder...
Checking reference...
Checking reference_path
Checking optional arguments...
...
Pipestance completed successfully!
Successful completion of the testrun by extension implies successful installation of spaceranger.
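When the test run is started from a script or a scheduler job, its outcome can be checked by searching the captured output for the success message shown above. This is a sketch only: pipestance_ok is a hypothetical helper, and testrun.log is a hypothetical file name chosen for the example.

```shell
# Succeed if the captured pipeline output contains the success message.
pipestance_ok() {
    grep -q 'Pipestance completed successfully!' "$1"
}

# Usage (testrun.log is a hypothetical file name):
#   spaceranger testrun --id=verify_install 2>&1 | tee testrun.log
#   pipestance_ok testrun.log && echo "installation verified"
```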
Q: How can I use multiple versions of spaceranger?
Sometimes it is useful to have access to older as well as newer versions of spaceranger. There are two suggested ways to achieve this:
- Update $PATH to point to the latest version
Since the spaceranger tarball comes annotated with the version number, you can download and install the latest version alongside older ones and subsequently update the $PATH variable to point to the version you wish to use.
- Use virtual environments
You can install and set up conda, which functions as both a package manager and an environment manager. Running software in virtual environments provides useful benefits such as reproducibility, compatibility, and versioning, and it lets you install software without admin permissions on shared compute environments such as High-Performance Computing (HPC) clusters.
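For the first approach (updating $PATH), switching versions by hand can leave several spaceranger folders on $PATH at once. The sketch below assumes that all versions are extracted side by side under one directory as spaceranger-VERSION folders; use_spaceranger is a hypothetical helper name, and the directory passed to it is whatever location you chose.

```shell
# Point PATH at one Space Ranger version, dropping any other
# spaceranger-* directories that earlier exports added.
# Assumes versions are extracted side by side as spaceranger-<version>.
use_spaceranger() {
    # $1 = directory holding the extracted folders, $2 = version, e.g. 2.0.0
    _new="$1/spaceranger-$2"
    # Split PATH into lines, drop entries ending in spaceranger-*, rejoin.
    _rest=$(printf '%s' "$PATH" | tr ':' '\n' \
        | grep -v '/spaceranger-[^/]*$' | paste -sd ':' -)
    PATH="$_new${_rest:+:$_rest}"
    export PATH
}

# Example, with the tutorial's working directory:
#   use_spaceranger /PATH/TO/WORKING_DIRECTORY/spaceranger_tutorial 2.0.0
#   which spaceranger
```

Keeping a single spaceranger entry on $PATH makes it unambiguous which version a command will run.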