Cell Ranger's CRISPR Guide Capture Algorithm

Feature Barcode technology may be used to perform pooled CRISPR screens in an efficient and scalable fashion. For an introduction to pooled CRISPR screens, see: Perturb-Seq (Adamson et al., 2016, and Dixit et al., 2016), CRISP-seq (Jaitin et al., 2016), CROP-seq (Datlinger et al., 2017), or CRISPR-QTL (Gasperini et al., 2019). See the Glossary for definitions of terms.

The goal of a pooled CRISPR screen is to use CRISPR to perturb the expression of a list of pre-identified genes and quantitatively measure the effects of those perturbations on the transcriptome of the cells of interest. The cells are typically transfected in a pooled fashion with a number of plasmids that code for guide RNAs that target the pre-selected genes of interest. Since the assay captures both transcripts and transfected guide RNAs from each cell, one can correlate the changes in the transcriptome with the perturbations received by each sub-group of cells.

A wide variety of experimental designs are used in pooled CRISPR screens, depending on the nature of the biological questions being investigated and the scope of the experiment. We emphasize three general principles commonly employed in such experiments.

  1. Multiple guide RNAs per target gene. In general, it is hard to predict the functional efficacy of a guide RNA construct purely from its in silico design. In order to mitigate the risk of non-functional guide RNA molecules that do not perturb the expression of their target genes significantly, pooled CRISPR screens typically employ 2-5 guide RNA constructs per target gene.

  2. Non-targeting guide RNAs that function as negative controls. To measure the effectiveness of a particular guide RNA construct in perturbing the expression of its target gene, or the effects of such a perturbation on the rest of the transcriptome, one needs to perform a differential expression analysis in which the cells expressing the relevant guide RNA(s) are compared with control cells. The experimental design typically includes control guide RNA constructs that are explicitly designed not to target any annotated genes in the reference transcriptome; these guide RNAs are called "non-targeting" guides. The control cells used in the differential expression analyses are typically cells identified as containing only (some combination of) non-targeting guides. To account for possible error in the design or transfection of these non-targeting guide RNA constructs, more than one such construct (usually 2-5) is typically used in the experiment.

  3. Carefully designed and validated transfection protocol. Depending on the particular transfection protocol used in the assay, the distribution of guide RNA constructs among cells can vary widely, from a median of as few as 1 guide per cell to as many as 15 per cell. The transfection protocol is usually carefully designed based on the requirements imposed by the biological questions of interest, such as the median number of guide RNA constructs per cell or the number of cells required per perturbation of interest. In addition, the transfection protocol is typically validated by some combination of PCR-based techniques and next-generation sequencing (see Methods sections of the References).

In pooled CRISPR screens, the presence of low levels of ambient guide RNA in solution typically leads to a small number of "background" UMI counts even in cells that do not express any guide RNA constructs. Protospacer calling is used to identify the CRISPR sgRNAs associated with each cell barcode and to separate signal from background UMIs.

During the Protospacer Calling step, Cell Ranger identifies, for each guide RNA construct specified in the Feature Reference CSV File, the sub-population of cells that express that particular guide RNA significantly above background.

Cell Ranger models two cell populations for each guide RNA: those expressing the guide and those with UMI counts from ambient guide RNA only. It then fits a Gaussian Mixture Model to the log-transformed distribution of Molecules/Cell for each guide RNA to differentiate these populations. This model assesses the likelihood that a cell is part of the guide-expressing population versus the background. Cells are classified as expressing the guide RNA if they have a high probability of belonging to that guide-expressing population and possess at least three guide RNA UMIs. This process is applied separately for each guide RNA in the Feature Reference CSV File. Refer to the Experimental Planning Guide for details.
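The classification rule described above can be sketched as follows. This is a minimal illustration, not Cell Ranger's implementation: the function names, the log10(1 + count) transform, the EM initialization, the variance floor, and the 0.5 posterior threshold are all assumptions made for the example. Only the two-population model and the minimum of three UMIs come from the description above.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log density of a 1-D normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def posterior_high(x, weights, means, variances):
    """Posterior probability that x belongs to component 1 (the higher mean)."""
    l0 = math.log(weights[0]) + log_normal_pdf(x, means[0], variances[0])
    l1 = math.log(weights[1]) + log_normal_pdf(x, means[1], variances[1])
    top = max(l0, l1)  # subtract the max before exponentiating to avoid underflow
    e0, e1 = math.exp(l0 - top), math.exp(l1 - top)
    return e1 / (e0 + e1)


def fit_two_component_gmm(values, n_iter=100, min_var=1e-3):
    """Fit a 2-component 1-D Gaussian mixture by expectation-maximization."""
    n = len(values)
    mean_all = sum(values) / n
    var_all = max(sum((v - mean_all) ** 2 for v in values) / n, min_var)
    weights = [0.5, 0.5]
    means = [min(values), max(values)]  # assumed initialization: the extremes
    variances = [var_all, var_all]
    for _ in range(n_iter):
        # E-step: responsibility of the guide-expressing component for each cell.
        resp = [posterior_high(v, weights, means, variances) for v in values]
        n1 = sum(resp)
        n0 = n - n1
        if n0 < 1e-9 or n1 < 1e-9:
            break  # one component is empty; keep the previous estimates
        # M-step: re-estimate means, variances, and weights.
        means = [
            sum((1 - r) * v for r, v in zip(resp, values)) / n0,
            sum(r * v for r, v in zip(resp, values)) / n1,
        ]
        variances = [
            max(sum((1 - r) * (v - means[0]) ** 2 for r, v in zip(resp, values)) / n0, min_var),
            max(sum(r * (v - means[1]) ** 2 for r, v in zip(resp, values)) / n1, min_var),
        ]
        weights = [n0 / n, n1 / n]
    return weights, means, variances


def call_guide_cells(umi_counts, prob_threshold=0.5, min_umis=3):
    """Return one boolean per cell: True if the guide is called in that cell."""
    logged = [math.log10(1 + c) for c in umi_counts]
    weights, means, variances = fit_two_component_gmm(logged)
    return [
        c >= min_umis and posterior_high(x, weights, means, variances) > prob_threshold
        for c, x in zip(umi_counts, logged)
    ]


# Hypothetical UMI counts for a single guide across 26 cells:
# 20 cells with ambient-level counts and 6 cells with clearly elevated counts.
counts = [0] * 8 + [1] * 8 + [2] * 4 + [50, 80, 120, 60, 200, 95]
calls = call_guide_cells(counts)
```

In this sketch the function would be run once per guide RNA, mirroring the per-guide procedure described above.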

Cell Ranger v7.0 and later allow CRISPR Guide Capture datasets to be aggregated. Protospacer calling is performed again on the aggregated data.

In pooled CRISPR screens, two central questions arise. First, to what extent did the expression of the target genes change amongst those cells expressing the guide RNAs that targeted those genes ("Perturbation Efficiency")? Second, what effects did these perturbations have on the transcriptome of those cells ("Perturbation Effects")?

Both questions rely on differential expression analyses. As with Gene Expression, Cell Ranger uses the quick and simple method sSeq (Yu, Huber, & Vitek, 2013) to find differentially expressed genes between the perturbed cells and the control cells (cells that contain only guide RNAs designed specifically to be non-targeting). For details on the implementation of sSeq within Cell Ranger, see Gene Expression.
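As an illustration of the control-cell definition above, the following sketch selects cells whose called guides are all non-targeting. The data layout (a dictionary from cell barcode to the set of guide IDs called in that cell), the function name, and the barcodes and guide IDs are hypothetical and are not Cell Ranger's internal representation.

```python
def find_control_cells(guide_calls, non_targeting_guides):
    """Return barcodes of cells that contain only non-targeting guides.

    guide_calls: dict mapping cell barcode -> set of guide IDs called in that cell.
    Cells with no called guide are excluded (an assumption of this sketch).
    """
    non_targeting = set(non_targeting_guides)
    return sorted(
        barcode
        for barcode, guides in guide_calls.items()
        if guides and set(guides) <= non_targeting
    )


# Hypothetical protospacer calls for five cells.
guide_calls = {
    "AAAC-1": {"NT_1"},
    "AAAG-1": {"GENE_A_sg1"},
    "AAAT-1": {"NT_1", "NT_2"},
    "AACA-1": {"NT_1", "GENE_A_sg1"},
    "AACC-1": set(),
}
controls = find_control_cells(guide_calls, ["NT_1", "NT_2"])
```

A cell carrying both a non-targeting and a targeting guide is not treated as a control here, because it does not contain only non-targeting guides.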

To quantify Perturbation Efficiency, we report the log2-fold-change in the expression of each target gene. To address transcriptome-wide Perturbation Effects, we provide a list of top perturbed genes for each perturbation, in addition to a list of how every gene in the reference transcriptome changed under each perturbation.
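To make the log2-fold-change concrete, here is a deliberately simplified sketch that compares the mean count of one target gene in perturbed cells with its mean count in control cells. It is not the sSeq procedure: it ignores library-size normalization and produces no p-values, and the pseudocount of 1 and the function name are assumptions made for the example.

```python
import math


def log2_fold_change(perturbed_counts, control_counts, pseudocount=1.0):
    """log2 ratio of mean expression in perturbed cells vs. control cells.

    The pseudocount (an assumption of this sketch) keeps the ratio finite
    when the gene is not detected in one of the groups.
    """
    mean_perturbed = sum(perturbed_counts) / len(perturbed_counts)
    mean_control = sum(control_counts) / len(control_counts)
    return math.log2((mean_perturbed + pseudocount) / (mean_control + pseudocount))


# Hypothetical UMI counts of one target gene in each group of cells.
lfc = log2_fold_change([1, 1, 1, 1], [7, 7, 7, 7])
```

With these hypothetical counts the means are 1 and 7, so the ratio is (1 + 1) / (7 + 1) = 0.25 and the log2-fold-change is -2, indicating reduced expression of the target gene in the perturbed cells.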

Each of the above results is calculated "by feature," where the cells are grouped based on the combinations of guide RNAs they contain, or "by target," where they are grouped based on the combinations of genes targeted by those guide RNAs. (The latter can lead to increased statistical power in cases where each gene is targeted by multiple guides, since cells where the same combinations of genes are perturbed may be grouped together.)
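The difference between the two groupings can be sketched as follows. The data layout, the function name, and the guide and gene identifiers are hypothetical; the sketch only illustrates how the grouping key changes between the two modes.

```python
from collections import defaultdict


def group_cells(guide_calls, guide_to_target, by="feature"):
    """Group cell barcodes by guide combination or by target-gene combination.

    guide_calls: dict mapping cell barcode -> set of guide IDs called in that cell.
    guide_to_target: dict mapping guide ID -> target gene name.
    Cells with no called guide are skipped (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for barcode, guides in guide_calls.items():
        if not guides:
            continue
        if by == "feature":
            key = frozenset(guides)
        elif by == "target":
            key = frozenset(guide_to_target[g] for g in guides)
        else:
            raise ValueError("by must be 'feature' or 'target'")
        groups[key].append(barcode)
    return dict(groups)


# Two hypothetical guides that target the same gene.
guide_to_target = {"GENE_A_sg1": "GENE_A", "GENE_A_sg2": "GENE_A"}
guide_calls = {
    "AAAC-1": {"GENE_A_sg1"},
    "AAAG-1": {"GENE_A_sg2"},
    "AAAT-1": {"GENE_A_sg1"},
}
by_feature = group_cells(guide_calls, guide_to_target, by="feature")
by_target = group_cells(guide_calls, guide_to_target, by="target")
```

In this example, grouping by feature yields two groups (one per guide), while grouping by target pools all three cells into a single GENE_A group, which is the source of the increased statistical power mentioned above.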

For a detailed description of the CRISPR output files, with examples, see CRISPR output files.