Jan 12, 2023 / Oncology / Neuroscience / Immunology / Developmental Biology

Single cell and spatial assays meet long-read sequencing

Olivia Habern

You’ve heard the saying, “less is more,” and frequently it holds true. But in the world of transcriptomics, there can be something important hidden in the more.

Traditional single cell gene expression library preparation methods fragment cDNA for short-read sequencing. While this powerful sequencing approach can provide a holistic view of the transcriptomic landscape of a cell, it may not reveal all the secrets contained in full-length transcripts. This includes possible transcript isoforms: sequences of RNA transcribed from the same gene but combined in different ways to produce different, but related, mRNA transcripts.

Now, with recent advances in long-read sequencing, single cell and spatial gene expression assays are compatible with long-read sequencing. This enables detection of transcript isoforms in both specific cell types and spatially resolved tissue sections. Researchers can examine how isoform expression may be influenced by cell type or specific regions of tissue, and how the same gene could have multiple regulatory functions depending on this biological context. That level of complexity cannot be captured with standard bulk sequencing.

Keep reading to explore the biological relevance of transcript isoforms, challenges to detecting them, and the advantages of integrating long-read sequencing into your single cell and spatial profiling workflows.

Exploring the significance of alternative transcript isoforms

The “central dogma” of molecular biology is that biological information flows from DNA, organized in genes, to RNA through a process called transcription, and from RNA to protein through a process called translation. But scientists have come to realize that the process isn’t exactly linear. Genes don’t always have a one-to-one relationship with their end product.

Through alternative transcription start/stop sites, polyadenylation sites, and/or alternative splicing sites, the same gene can give rise to diverse mRNA sequences, or isoforms, which have varying protein-coding capacities (1). Reyes and Huber write, “In mammalian genomes, at least 70% of genes have multiple polyadenylation sites, >50% of genes have alternative transcription start sites and nearly all genes undergo alternative splicing… These molecular processes have the potential to substantially increase the repertoire of transcripts, proteins and functions encoded by mammalian genomes” (2). To put numbers on it, there are only around 20,000 human protein-coding genes, but scientists estimate there are almost 150,000 transcript isoforms (3).

The question is, are those isoforms and the proteins they encode good, bad, or ugly?

In many biological systems, isoform diversity has not been well characterized, but there is evidence to suggest that isoforms can play a positive role in supporting a range of neural functions (1), as well as cellular development and differentiation, including processes like embryonic hematopoiesis (2).

Dysregulation of isoform expression, however, is also associated with human disease. Jiang and Chen write that “15% of human hereditary diseases and cancers are reported to be associated with alternative splicing” (3). Splicing factors are the most frequently mutated genes in myelodysplastic syndromes, a group of cancers in the bone marrow (3).

Changes in the ratio of isoforms, commonly a result of dysregulation in alternative splicing, can also drive disease. For example, tauopathies are a class of neurodegenerative disorders, Alzheimer’s disease among them, associated with an imbalanced ratio of tau protein isoforms, produced by errant alternative splicing of the gene MAPT (microtubule associated protein tau). The rare kidney disease Frasier syndrome is linked to a mutation that prevents the synthesis of a particular isoform of the Wilms’ tumor gene, WT1 (3).

As these examples demonstrate, scientists need to be able not only to identify transcript isoforms, but also to measure their abundance accurately, and at a resolution that can provide meaningful biological context. Such insights could shed light on the contributions of unique isoforms to physiologic states, provide a better understanding of disease mechanisms, and uncover possible therapeutic targets.

Searching for transcript isoforms in cells and tissue

Despite their significance, transcript isoforms can hide in the crowd. Bulk sequencing methods for isoform detection provide only an average of the transcripts expressed across a population of cells, leaving out the biological context needed to understand cell type–specific isoform activity. Averaging also risks masking rare or lowly expressed transcript isoforms in small subpopulations of cells. 3’- and 5’-biased short-read sequencing methods provide single cell resolution, but require de novo transcript assembly to identify possible transcript isoforms. With this approach, scientists piece together short sequences of RNA to infer which isoforms are expressed, but the results may not be definitive, and isoform expression can be difficult to quantify.
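To make the averaging problem concrete, here is a minimal Python sketch. It is a toy illustration, not an analysis method: the cell types, the isoform labels A and B, and every count are invented for the example. It compares the fraction of reads from isoform A when all cells are pooled (a bulk-like view) with the same fraction computed per cell type.

```python
from collections import defaultdict

# Hypothetical per-cell read counts for two isoforms (A and B) of one gene.
# All values are made up purely to illustrate the averaging effect.
cells = [
    {"cell_type": "type_1", "A": 10, "B": 0},
    {"cell_type": "type_1", "A": 8, "B": 0},
    {"cell_type": "type_2", "A": 0, "B": 9},
    {"cell_type": "type_2", "A": 0, "B": 9},
]


def isoform_a_fraction(records):
    """Return the fraction of reads assigned to isoform A across the given cells."""
    a = sum(r["A"] for r in records)
    b = sum(r["B"] for r in records)
    return a / (a + b) if (a + b) else float("nan")


# Bulk-like view: pool every cell before computing the fraction.
bulk_fraction = isoform_a_fraction(cells)

# Cell type-resolved view: group the cells first, then compute the fraction.
by_type = defaultdict(list)
for record in cells:
    by_type[record["cell_type"]].append(record)
per_type_fraction = {t: isoform_a_fraction(rs) for t, rs in by_type.items()}

print(f"bulk: {bulk_fraction:.2f}")
for cell_type, fraction in sorted(per_type_fraction.items()):
    print(f"{cell_type}: {fraction:.2f}")
```

In this toy data the pooled view suggests the gene expresses both isoforms in equal measure, while the per-type view shows that each cell type uses only one of them.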

To unambiguously identify which isoforms are expressed and gain a true understanding of their biology, scientists need the ability to sequence full-length, intact transcripts at single cell resolution. With suitable bioinformatics pipelines, researchers can then process the sequencing data to identify and quantify transcript isoforms cell by cell. We’re excited to share new long-read sequencing library preparation workflows from 10x Compatible Partners, Oxford Nanopore Technologies and Pacific Biosciences, that work with not only Chromium Single Cell, but also Visium Spatial assays, making it possible for scientists to identify alternative transcript isoforms in single cells and in spatially resolved tissue.
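As a rough idea of what the quantification step can look like, the sketch below counts unique molecules per isoform in each cell. It assumes that an upstream step has already assigned every long read a cell barcode, a UMI, and an isoform label. The record layout, the function name `count_isoforms_per_cell`, and the barcodes and isoform IDs are all hypothetical and do not represent any specific vendor pipeline. Real workflows also handle alignment, barcode error correction, and ambiguous reads, none of which is shown here.

```python
from collections import defaultdict

# Hypothetical long-read assignments: (cell_barcode, umi, isoform_id).
# Barcodes, UMIs, and isoform IDs are invented for illustration.
reads = [
    ("AAAC", "U1", "GENE1-201"),
    ("AAAC", "U1", "GENE1-201"),  # same barcode + UMI + isoform: a duplicate read
    ("AAAC", "U2", "GENE1-202"),
    ("TTTG", "U3", "GENE1-202"),
    ("TTTG", "U4", "GENE1-202"),
]


def count_isoforms_per_cell(records):
    """Count unique (barcode, UMI) molecules per isoform for each cell barcode."""
    seen = set()
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, umi, isoform in records:
        key = (barcode, umi, isoform)
        if key in seen:
            continue  # collapse duplicate reads of the same molecule
        seen.add(key)
        counts[barcode][isoform] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {barcode: dict(per_iso) for barcode, per_iso in counts.items()}


isoform_counts = count_isoforms_per_cell(reads)
for barcode, per_iso in sorted(isoform_counts.items()):
    print(barcode, per_iso)
```

Because each read spans the full transcript, the isoform label comes from the read itself rather than from an assembly of short fragments. That is what makes a simple per-cell tally like this possible in principle.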

Researchers from Weill Cornell Medicine, led by Principal Investigator Hagen Tilgner, PhD, have spearheaded the application of single cell and single nuclei long-read sequencing protocols, with two Nature Biotechnology papers studying RNA isoforms in diverse cerebellar cell types (4) and cell type–specific inclusion of exons associated with autism in adult human frontal cortex (5).

In collaboration with 10x Genomics scientists, Dr. Tilgner and team have validated isoform sequencing methods in tissue as well. Using Visium Spatial Gene Expression to analyze postnatal mouse brain tissue, they observed mutually exclusive isoform expression domains between two adjacent brain regions, suggesting the microenvironment dictated brain region–specific transcript splicing (6). This finding wouldn’t have been apparent with their single cell isoform sequencing data alone, which indicates that a fuller understanding of isoform dynamics requires knowing not just the specific composition of cell types, but also their spatial organization.

Increasing opportunities for discovery with long-read sequencing

Is there a place for long-read sequencing in the 10x Genomics technology ecosystem? In short: yes. Transcript isoforms represent an understudied but likely crucial domain of human health, development, and disease. Long-read sequencing opens the door to new discoveries within this domain, building on the insights into biological complexity enabled by 3’ and 5’ short-read sequencing approaches.

Want to learn how you can build long-read sequencing into your workflows, or explore modified long-read sequencing protocols and compatible assays? Check out the following articles and application note:

References:

  1. Ray T, et al. Comprehensive identification of mRNA isoforms reveals the diversity of neural cell-surface molecules with roles in retinal development and disease. Nat Commun 11: 3328 (2020). doi: 10.1038/s41467-020-17009-7
  2. Reyes A and Huber W. Alternative start and termination sites of transcription drive most transcript isoform differences across human tissues. Nucleic Acids Res 46: 582–592 (2018). doi: 10.1093/nar/gkx1165
  3. Jiang W and Chen L. Alternative splicing: Human disease and quantitative analysis from high-throughput sequencing. Comput Struct Biotechnol J 19: 183–195 (2021). doi: 10.1016/j.csbj.2020.12.009
  4. Gupta I, et al. Single-cell isoform RNA sequencing characterizes isoforms in thousands of cerebellar cells. Nat Biotechnol 36: 1197–1202 (2018). doi: 10.1038/nbt.4259
  5. Hardwick S, et al. Single-nuclei isoform RNA sequencing unlocks barcoded exon connectivity in frozen brain tissue. Nat Biotechnol 40: 1082–1092 (2022). doi: 10.1038/s41587-022-01231-3
  6. Joglekar A, et al. A spatially resolved brain region- and cell type-specific isoform atlas of the postnatal mouse brain. Nat Commun 12: 463 (2021). doi: 10.1038/s41467-020-20343-5