Understanding tumor ecosystems by single-cell sequencing: promises and limitations
Cellular heterogeneity within and across tumors has been a major obstacle in understanding and treating cancer, and the complex heterogeneity is masked if bulk tumor tissues are used for analysis. The advent of rapidly developing single-cell sequencing technologies, which include methods related to single-cell genome, epigenome, transcriptome, and multi-omics sequencing, have been applied to cancer research and led to exciting new findings in the fields of cancer evolution, metastasis, resistance to therapy, and tumor microenvironment. In this review, we discuss recent advances and limitations of these new technologies and their potential applications in cancer studies.
Copy number alteration
Circulating tumor cells
Disseminated tumor cells
Fluorescence in situ hybridization
Single-cell RNA sequencing
Single molecule fluorescence in situ hybridization
Single nucleotide variant
A single cell is the ultimate unit of life activity, in which genetic mechanisms and the cellular environment interplay with each other and shape the formation and function of such complex structures as tissues and organs. Dissecting the composition and characterizing the interaction, dynamics, and function at the single-cell resolution are crucial for fully understanding the biology of almost all life phenomena, under both normal and diseased conditions. Cancer, a disease caused by somatic mutations conferring uncontrolled proliferation and invasiveness, could in particular benefit from advances in single-cell analysis. During oncogenesis, different populations of cancer cells that are genetically heterogeneous emerge, evolve, and interact with cells in the tumor microenvironment, which leads to host metabolism hijacking, immune evasion, metastasis to other body parts, and eventual mortality. Cancer cells can also manifest resistance to various therapeutic drugs through cellular heterogeneity and plasticity. Cancer is increasingly viewed as a ‘tumor ecosystem’, a community in which tumor cells cooperate with other tumor cells and host cells in their microenvironment, and can also adapt and evolve to changing conditions [1, 2, 3, 4, 5].
Detailed understanding of tumor ecosystems at single-cell resolution has been limited for technological reasons. Conventional genomic, transcriptomic, and epigenomic sequencing protocols require microgram-level input materials, and so cancer-related genomic studies were largely limited to bulk tumor sequencing, which does not address intratumor heterogeneity and complexity. The advent of single-cell sequencing technologies [6, 7, 8] has shifted cancer research to a new paradigm and revolutionized our understanding of cancer evolution [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22], tumor heterogeneity [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46], and the tumor microenvironment [47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59]. Development of single-cell sequencing technologies and the applications in cancer research have been astonishing in the past decade, but many challenges still exist and much remains to be explored. Single-cell cancer genomic studies have been reviewed previously [60, 61, 62, 63]. In this review, we summarize recent progress and limitations in cancer sample single-cell sequencing with a focus on the dissection of tumor ecosystems.
Overview of single-cell sequencing and analysis
Accompanying the tremendous progress of experimental single-cell sequencing technologies, specialized bioinformatics and algorithmic approaches have also been developed to best interpret the single-cell data while reducing their technological noise. Examples of these approaches include the imputation of dropout events [92, 93, 94, 95], normalization and correction of batch effects [96, 97, 98, 99, 100], clustering for identification of cell types [98, 101, 102, 103, 104, 105, 106, 107, 108], pseudo-temporal trajectory inference [109, 110, 111, 112], spatial position inference [87, 88, 90], and data visualization [102, 113, 114, 115]. Progress in this area requires the application of statistics, probability theory, and computing technologies, which lead to new algorithms, software packages, databases, and web servers. Detailed information of specific single-cell technologies and the underlying principles of the algorithms have been elegantly discussed in other reviews [61, 64, 65, 66, 67, 68, 69, 70, 72, 116, 117, 118, 119, 120, 121, 122, 123]. This myriad of experimental and computational methods is becoming the new foundation for uncovering the mystery of cancer complexity at the single-cell resolution.
Despite the dramatic advances, substantial limitations and challenges still exist in single-cell sequencing. The first challenge lies in the technological noise introduced during the amplification step. Notable allelic dropouts (i.e., amplification and sequencing of only one allele of a particular gene in a diploid/multiploid cell) and non-uniform genome coverage hinder the accurate detection of single nucleotide variants (SNVs) at the genome or exome level. These problems can be partially alleviated by the LIANTI (linear amplification via transposon insertion) method , which implements a linear genomic amplification by bacterial transposons and reportedly reaches improvements in genome coverage (~ 97%), allelic dropout rates (< 0.19) and false negative rates (< 0.47). Similarly, in single-cell RNA sequencing (scRNA-seq), lowly expressed genes are prone to dropout and susceptible to technological noise even when detected, although they often encode proteins with important regulatory or signaling functions. These technological issues are more profound for scRNA-seq technologies designed to offer higher throughput [81, 84]. Although many computational methods are available to model or impute dropout events [92, 94, 95], their performances vary and may introduce artificial biases. Much effort is needed to fully address this challenge.
The second challenge is that only a small fraction of cells from bulk tissues can be sequenced. Bulk tissues consist of millions of cells, but present studies can often only sequence hundreds to thousands of single cells because of technological and economic limitations [9, 10, 11, 20, 25, 124, 125, 126]. To what extent the sequenced cells represent the distribution of cells in the entire tissue of interest is not clear. A plausible solution to address this challenge would be to further improve the throughput of cellular captures, e.g., MARS-seq  and SPLiT-seq , or alternatively to combine bulk and single-cell sequencing together and then conduct deconvolution analysis . Deconvolution analysis for bulk RNA-seq data uses cell-type signature genes as inputs [128, 129, 130], which can be substituted by single-cell sequencing results, although critical computational challenges still exist, such as collinearity among single cells. If marker genes for known cell types are orthogonal to each other, the proportions of each cell type in a bulk sample can be reliably estimated. However, collinearity of gene expression exists widely among single cells, which complicates the deconvolution process. At present, successful deconvolution of bulk RNA-seq data based on scRNA-seq-defined signatures has been reported only in cases where orthogonal molecular signatures and fine cluster structures are well balanced . The wide usage of scRNA-seq based deconvolution will hinge upon the availability of comprehensive single-cell clusters and the development of general methods for selecting orthogonal signatures for each cell type.
Spatial information of single cells in the tissue is often lost during the isolation step and thus single-cell sequencing data typically do not show how cells are organized to implement the concerted function within a tissue of interest. Many new techniques have been developed to keep or restore the spatial information of sequenced single cells such as fluorescence in situ hybridization (FISH), single-molecule fluorescence in situ hybridization (smFISH), laser capture microdissection, laser scanning microscopy, including two-photon laser scanning microscopy, and fluorescence in situ sequencing [21, 30, 87, 88, 89, 90, 91, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143]. However, at present all of these techniques have inherent limitations and only apply to specific spatial architecture. For example, while FISH-based technologies can map the spatial distribution of a set of selected genes upon which the spatial information of single cells subject to RNA-seq can be reconstructed via probabilistic inference, the methods are limited to two dimensions and the inference is primarily dependent on the availability of marker genes that can properly discriminate the spatial characteristics with sufficient resolutions. Other conditions for valid marker genes include accurate and robust estimation of their expression levels, but this requirement can be greatly compromised by inherent dropout in scRNA-seq protocols. Accurate restoration of single cell spatial positions via FISH-based inference also requires replicable tissues for parallel FISH and scRNA-seq, which can be only approximately fulfilled on model organisms. For human cancers, however, such requirements usually cannot be met and spatial-recording methods have thus been proposed. With laser capture microdissection, single cells are obtained simultaneously when their spatial information is recorded. However, the cellular throughput of such methods is extremely limited due to operation difficulties, and the biological interpretation of the recorded spatial positions are confined because adjacent cells cannot be properly dissected for scRNA-seq, whereas sequenced cells are often distantly distributed. Low molecular throughput is also problematic with these recently developed in situ sequencing methods. Typically, only tens or hundreds of known genes can be in situ labeled or sequenced, far from the requirement of fully understanding the molecular landscapes of single cells of interest. Furthermore, the replicability of such complicated experiments also imposes barriers for their practical applications to human samples.
Because single-cell sequencing captures individual cells at a particular time point, other factors such as cell cycle and functional state must be considered. By contrast, these factors are often ignored in bulk sequencing due to the average effect. Cell cycle phases can be discerned by phase-specific expression analysis [144, 145, 146], but cell types and cell states can be hard to distinguish. Sometimes even cancer cells cannot be easily distinguished from normal cells, although inferred DNA copy numbers are often used for this purpose [22, 47, 51]. More robust methods are needed for cell type determination in silico.
Compared to traditional bulk sequencing technologies, which characterize samples via a gene-by-sample matrix, single-cell sequencing adds a cellular layer between genes and samples, which results in a gene-by-cell-by-sample data structure. Addition of the cellular dimension allows simultaneous characterization of samples at both the molecular and cellular level. However, bioinformatics and algorithmic methods for single-cell sequencing data analysis are generally developed for gene-by-cell data, which essentially have the same structure with the gene-by-sample matrices. Although methods exploiting the cellular dimension for phenotype classification have been proposed , tools sufficiently employing all the molecular, cellular, and sample information of the new data structure are still needed.
Given the maturation of single-cell sequencing technologies, especially scRNA-seq, the scale of datasets of one study soon increases from hundreds to tens of thousands and even millions of cells. For large programs, e.g., the Human Cell Atlas project , the volume of data demands more robust computer hardware and software. Although a few down-sampling or convolution-based methods have been proposed to manage large-scale scRNA-seq data for clustering and differential expression analysis [149, 150, 151], efficient and effective algorithms are of pressing need to circumvent these difficulties.
Complexity of tumor ecosystems
Recent progress of cancer studies based on single-cell sequencing
Heterogeneity of cancer clones
Mutation profiles of CTCs
Expression patterns of cancer cells upon treatment
Expression heterogeneity and dynamics of cancer cells
Expression patterns of CTCs
Heterogeneity of tumor microenvironment
Single-cell RNA-seq, mass cytometry
Heterogeneity of tumor microenvironment
Single-cell DNA-seq and RNA-seq
Integrated analysis of cancer cells
Epigenomics of cancer cells
Multi-omics analyses of the same cancer cells
Diversification of cancer cells
Spatial single-cell sequencing
Spatial heterogeneity and metastasis of cancer cells
Decomposition of clonal and sub-clonal tumor structure
Early success of single-cell sequencing applications in cancer research came from the studies of clonal and sub-clonal structure of primary tumors. DNA-based single-cell sequencing has been applied to breast [7, 20, 21, 26, 156, 157], kidney , bladder , and colon tumors [39, 160, 161], glioblastoma , and hematological malignancies such as acute myeloid leukemia and acute lymphoblastic leukemia [11, 33, 163, 164, 165]. These studies demonstrated the existence of common mutations among different cancer cell clones in individual cancer patients, which provided evidence for the origin of common cancerous cells and subsequent clonal evolution. Meanwhile, the application of scRNA-seq in glioma [22, 51, 166] demonstrated that cell differentiation of neural stem cells also contributes to tumor heterogeneity, thus supporting a cancer stem cell model. Notably, a recent study of intra-tumor diversification of colorectal cancers  integrated single-cell technologies and tumor organoid culture to show that cancer cells had several times more somatic mutations than normal cells. The authors of this study also observed that most of the mutations occurred during the final dominant clonal expansion, contributed by mutational processes absent from normal controls. In addition to canonical mutations, transcriptomic alterations and DNA methylation were cell-autonomous, stable, and followed the phylogenetic tree of each cancer. The study by Roerink et al.  provided a paradigm of cancer evolution by characterizing clonal and sub-clonal tumor structures, and indicated the potential dynamics of cancer progression. These findings exemplify the unique power of single-cell sequencing to characterize the diversity of cancer cells, resulting in different evolutionary models between cancers. In particular, single-cell data challenged the cancer stem cell model by showing that continued proliferation and clonal expansion formed the majority of tumor cells. Furthermore, scRNA-seq data supported the cancer stem cell model by demonstrating the contribution of cell differentiation to tumor heterogeneity. Copy number alternations (CNAs) and point mutations of cancer cells were subject to different evolutionary modes, with the former preferring punctuated evolution and the latter preferring gradual accumulation. Outstanding disparities need to be resolved before consistent models of cancer genesis and evolution can be applied to a wide range of cancers. Studies with larger sample size and higher molecular and cellular resolution are needed to reconcile various cancer evolution models. Sequencing analysis of single-cell-derived organoids could provide a template for investigating cancer evolution, but this should be extended to larger samples and other cancer types.
Monitoring cancer progress through characterization of circulating tumor cells
Circulating tumor cells (CTCs) are extremely rare in blood (1 in 106), with only tens of cells captured from a typical blood draw . The application of bulk sequencing to such limited input material for genomic exploration is difficult, hindering the analysis of cancer cell migration via blood. Single-cell sequencing has transformed the ability to characterize CTCs and has been used to identify metastatic potential of CTCs in cancer metastasis models, to monitor abnormal signaling pathways for drug-resistance prediction. By characterizing mutation profiles of CTCs, their tissue sources can be matched to the positions of primary and metastatic tumors [13, 16, 24, 167, 168]. This type of analysis holds great potential in early cancer detection and real-time monitoring of disease progression with or without treatment. Furthermore, the origin and destination of CTCs could be further explored to reveal the dissemination conditions of specific tumors. The application of DNA-based single-cell sequencing to CTCs in colon cancer , melanoma , lung cancer , and prostate cancer [171, 172] revealed that the copy number profiles of CTCs are highly similar to primary and metastatic tumors but point mutation profiles show much greater variations, consistent with punctuated evolution of CNAs and gradual evolution of point mutations observed within tumors. A recent integrative analysis of colon, breast, gastric, and prostate cancers by single-cell DNA sequencing compared the mutation profiles between primary tumor cells and CTCs, and revealed convergent evolution of CNAs from primary cancer tissues to CTCs . Remarkably, CNAs affecting the oncogene MYC and the tumor suppressor gene PTEN were observed only in a minor proportion of primary tumor cells but were present in all CTCs spanning multiple cancer types. These observations suggest that the potential of primary tumor cells to transit into CTCs are quite uneven, or otherwise strong selection pressure exists upon CTCs during the metastasis process. To resolve the detailed molecular mechanisms involved in the generation of CTCs in primary tumors to colonization in metastasis sites, it will be important to temporally trace the variations of CTCs during cancer progression from primary tumors to metastasis in both a research and clinical setting. Furthermore, scRNA-seq has been used in the study of CTCs in melanoma , breast , pancreatic [126, 174], and prostate cancers , revealing specific transcriptional signatures of CTCs relative to their primary and metastatic tumors. Extracellular matrix proteins were specifically expressed by CTCs, and plakoglobin appeared to be a key regulator of CTC clusters with survival advantages distinct from individual CTCs. Furthermore, abnormal signaling pathways for drug resistance prediction can be monitored using scRNA-seq of CTCs, as illustrated by the Miyamoto et al. study , in which scRNA-Seq profiling of 77 CTCs from 13 prostate cancer patients revealed extensive heterogeneity of the androgen receptor gene at both expression and splicing levels. Activation of non-canonical Wnt signaling was observed in the retrospective study of CTCs from patients treated with an androgen receptor inhibitor, indicating the potential resistance to therapy. Despite enviable progress, CTC studies remain limited by difficulties in the detection and enrichment of CTCs from blood. How to effectively obtain insight into the generation, progress, metastasis, and response to therapies of the entire tumor through the characterization of CTCs is still an elusive question.
Interrogating the genesis and evolution of therapy resistance
Chemotherapy and targeted therapies have been important weapons to combat cancers, but drug resistance is common for most tumors. Due to the complexity of cancer drug resistance, the underlying mechanisms remain poorly understood for most human cancers, which hampers the development of new approaches to overcome drug resistance. An important question to address is whether drug resistance arises from rare pre-existing subclones with drug-resistant phenotypes prior to treatment (intrinsic resistance) or, alternatively, is acquired through induction of new mutations conferring drug-resistance (acquired resistance). Acquired versus intrinsic resistance has been studied for decades in bacteria, which are single-cell systems , but remains elusive in most human cancers. Single-cell sequencing can be used to resolve tumor heterogeneity, reconstruct the evolutionary trajectories of cancer cells, and identify rare subclones, and has therefore been a promising method to address drug resistance [19, 25, 29, 47, 165]. The recent study by Kim et al.  of triple-negative breast cancers treated with neoadjuvant chemotherapy employed both single-cell DNA- and RNA-sequencing to resolve the genesis and evolution of drug-resistant clones. Using DNA data from 900 cells and RNA data from 6862 cells, CNAs in drug-resistant subclones were found to be pre-existing and adaptively selected while their expression profiles were acquired through transcriptional reprogramming in response to chemotherapy. These results suggest a model of drug-resistance acquisition involving both intrinsic and acquired modes of evolution. According to the newly proposed model, drug resistance-associated CNAs are acquired in rare tumor clones during several short evolutionary bursts at the earliest stages of tumor progression and then subject to gradual evolution. Following anti-tumor therapies, the selective pressure will result in two fates for tumor cells: clonal extinction and persistence, during which the pre-existing rare drug-resistant tumor clones will persist and become the major clones. The transcriptional programs of the persisting clones will converge on a few common pathways associated with the therapy-resistance phenotypes. Both genomic mutations and transcriptional reprogramming could be relevant in understanding therapy resistance as they might exert different modes of evolution for changes at individual levels. It remains unclear how different mechanisms coordinate with one another; therefore, more powerful technologies, such as single-cell multi-omics, are needed to address these questions.
Dissecting the tumor microenvironment to understand cancer immune evasion and metastasis
The tumor microenvironment represents all components of a solid tumor that are not cancer cells. Besides the genetic and non-genetic heterogeneity among tumor clones, heterogeneity among tumor-infiltrating stromal and immune cells in the microenvironment also plays vital roles in tumor growth, angiogenesis, immune evasion, metastasis, and responses to various therapies. With bulk DNA sequencing, the genomes of these cells in the microenvironment are indistinguishable from those of normal tissues and thus often interfere with the detection of tumor CNAs and point mutations by altering tumor purity. With bulk RNA sequencing, the mRNAs of these cells are intermingled with those of tumor cells, which makes it difficult to untangle the expression signals by tumor cells from those by microenvironment cells. The variable compositions of tumor microenvironment often become ‘dark matter’ that confounds subsequent analyses. Although pathway analysis may indicate major types of infiltrated cells, the results are not sufficiently detailed to provide insights into the underlying mechanisms of tumor phenotypes. Computational deconvolution analysis can infer tumor-infiltrating cell types based on tumor bulk RNA-seq data [128, 129, 130]. However, these algorithms are limited by the availability of gene signatures specific to individual cell types and the collinearity among gene signature profiles.
The majority of these limitations are overcome by single-cell sequencing. With scRNA-seq, the immune landscapes of melanoma , glioblastoma , breast [52, 55, 56], head and neck , colorectal , liver , kidney, [54, 58] and lung [53, 57, 59] cancers have been depicted at unprecedented resolution. New immune cell subtypes with distinct functions or states have been identified, and genes specifically expressed in rare immune cells have been linked to tumor immune evasion. For example, results from a recent single cell study of lung cancers by 10X Genomics  revealed that tumor-enriched B cells can be further grouped into six clusters, of which two follicular B cell clusters are characterized by the high expression of CD20, CXCR4, and HLA-DRs. By contrast, two plasma B-cell clusters express immunoglobulin gamma and the remaining two mucosa-associated lymphoid tissue-derived B-cell clusters have immunoglobulins A and M and JCHAIN as signature molecules. Subtypes of macrophages were also depicted by mass cytometry . In particular, T cells, which specifically recognize tumor neoantigens and kill cancer cells in a targeted way, have been in the spotlight of single cell interrogation of several cancer types [49, 55, 57]. Tissue-resident T-cell subsets are found in liver, lung, and breast tumors, with lower T-cell exhaustion levels associated with better prognosis [49, 55, 57]. Immunotherapies that reinvigorate cytotoxic T cells via immune checkpoint blockade or adoptively transfer neoantigen-specific T cells are therapeutically effective in multiple cancer types . Specific T-cell clusters with suppressive functions in treatment-naïve tumors and T-cell clusters that respond to immunotherapies have been identified [47, 49, 178, 179]. Signature genes of these T-cell clusters, e.g., LAYN identified in exhausted CD8+ T cells and regulatory T cells of liver cancer, can provide attractive biomarkers to predict patient responses to cancer immunotherapies and potentially serve as new candidate targets for further investigation. Nevertheless, accompanying these great achievements, single-cell studies of tumor microenvironment are limited in their depictions of spatial, temporal, and interactive characteristics among cancer and immune cells.
Besides the immune cells themselves, cancer-associated fibroblasts (CAFs) also play crucial roles in cancer immune evasion and metastasis. Heterogeneity of CAFs in various cancer types via scRNA-seq has been shown in several studies [47, 48, 50, 59]. In lung cancer studies by 10X Genomics , five distinct types of tumor-resident fibroblasts were identified that expressed unique repertoires of collagens and other extracellular matrix molecules. In colorectal cancers profiled by SMART-seq2 , two distinct subtypes of CAFs were identified, one of which was enriched for epithelial–mesenchymal transition (EMT)-related genes, which is consistent with results from the lung cancer study . The heterogeneity of CAFs of these cancer types was consistent with results from earlier studies in metastatic melanoma and head and neck cancer, in which the potential functions of CAF subclusters were indicated [47, 48]. Interestingly, a specific subcluster of CAFs that exclusively expressed multiple complement factors, including C1S, C1R, C3, C4A, CFB, and C1NH (SERPING1), correlated with T-cell infiltration based on data analysis from the Cancer Genome Atlas project . Although the correlation cannot imply causality, the cellular and molecular mechanisms of T-cell recruitment by CAFs should be studied. Furthermore, certain CAFs observed in a head and neck cancer single-cell study were found to co-localize with malignant cells highly expressing a p-EMT (partial EMT) gene program that is correlated with metastasis . The co-localization was supported by numerous ligand–receptor interactions between CAFs and the corresponding malignant cells, thus providing new clues for the underlying mechanisms of tumor invasion. The dynamic nature of CAF gene expression certainly deserves further exploration.
Outlook of single-cell sequencing in cancer research
Despite its exciting prospects, single-cell sequencing still faces notable technical challenges that limit the release of its full power in cancer research and clinical applications. For example, the single layer–omics technology generally only gives a snapshot of the state of tested cells. Thorough understanding of the functions of individual cells often requires comprehensive molecular information that covers all layers from the nucleus to extracellular matrix, and includes genomes, epigenomes, chromosome confirmation, transcriptomes, proteomes, metabolomes, and interactomes (Fig. 3). Comprehensive information is important for cancer studies because of the great genomic and phonemic heterogeneity of cancer cells. Single-cell multi-omics technologies [32, 76, 77, 78, 79, 124, 187, 193] have proved feasible but these methods are still in the infant phase of development, limited by low coverage, throughput, and automation levels. Wide application of such technologies in cancer research and clinics requires more effort to conquer the aforementioned challenges. CITE-seq has been used to simultaneously profile mRNA levels and the abundance of a set of selected proteins of cancer samples . Furthermore, SUPeR-seq allows simultaneous measuring of linear and circular RNA levels within the same single cancer cell and associated cells , and G&T-seq provides both genomic and transcriptomic information of a given cell . scTrio-seq has been used to obtain epigenomic, genomic, and transcriptomic information of the same cancer cell .
Future challenges will include circumventing the loss of spatial information of tested single cells during the dissociation step. Tumor ecosystems are highly organized and dynamic; therefore, the spatial positions of various cancer cells and the tumor microenvironment cells and their interactions may play pivotal roles during cancer progression, metastasis, immune evasion, and the development of therapeutic resistance (Fig. 3). Integration of imaging techniques with single-cell sequencing have made meaningful progress in this area. By recording the spatial information of single cells or important ‘anchor genes’ via FISH, smFISH, immunohistochemistry, laser capture microdissection, laser scanning microscopy, or in situ sequencing, the spatial structure of single cells can be experimentally recorded or computationally reconstructed [21, 87, 88, 89, 90, 91, 132, 138, 143, 149], thereby shedding light on the spatial heterogeneity of tumor ecosystems. The recently developed NICHE-seq technology  allows isolation of immune cells in a specifically prescribed niche of model animals for single-cell sequencing, which provides a powerful tool to explore tumor immunology in animal models. However, the wider application of NICHE-seq to clinical samples will take time, because two-photon laser scanning microscopy requires the targeted cells to be optically labeled, which at present is only possible on model animals. ProximID maps the cellular interaction network of tissues and could be used for spatial position mapping to cellular physical networks  to show how cancer cells interplay with the tumor microenvironment (Fig. 3). ProximID dissects tissues into doublets or triplets to capture the physical interactions among cells and determines cellular identities via scRNA-seq. ProximID provides great promise for cellular interaction and spatial position mapping, as shown by the recently proposed paired-cell sequencing method that adopts a similar strategy ; however, cellular throughput is still modest at present. A newer version of ProximID parallels the microdissection of doublets and triplets with single cell identity determination, and improves the throughput at the expense of accuracy of cell identity assignment. Overall, creative technological advances in the basic research field have recently emerged in quick succession. Despite obvious pros and cons, they provide exciting new tools to interrogate human cancers at the single-cell level.
Furthermore, the development of new computational and analytical tools is often lagging behind corresponding experimental methods. The new single-cell sequencing data, with added new dimensions or features, often violate the analytical assumptions of bulk sequencing studies, which makes existing analytical tools obsolete or underpowered. For example, the data structure of single-cell sequencing of cancers requires the application of tensors to depict the gene-by-cell-by-sample relationships, whereas the bulk sequencing data can be sufficiently encapsulated by gene-by-sample matrices. Analytical tools currently available are generally designed for matrix-based data structure. Reduction of dimensionality from tensors to matrices is currently needed to use the available bioinformatics tools to analyze either gene-by-cell, gene-by-sample, or cell-by-sample relationships. Tools for simultaneous analysis of gene–cell–sample relationships are urgently needed. The ever-increasing data size of single-cell sequencing studies also requires more robust computational powers. Down-sampling is often applied to reduce data size so that the dataset can be analyzed. Computational algorithms that can handle large single-cell sequencing datasets while simultaneously maintaining similar analytical performance are needed. The spatial single-cell RNA sequencing technique also generates unprecedented data type, for which two new algorithms have been proposed recently [196, 197], allowing analysis of the spatial variance of cancers. Computational development specifically for single-cell data will likely be the field to watch in the next few years, because there are many unresolved yet important issues. It is hoped that bioinformatics of single-cell analysis will catch up with the rapid technology development and the ever-expanding appetite for new data in the cancer research field.
Potential applications of single-cell sequencing in the clinic
Single-cell technologies use limited input materials to resolve tumor heterogeneity and so have great potential in the cancer clinic for diagnosis, prognosis, early detection, risk assessment, progress monitoring, and therapy response prediction. Single cancerous cells can be isolated from blood samples in early stages of cancer genesis [161, 170, 172], which enables early detection and assessment of cancers [198, 199]. If a set of known driver mutations are observed independently in multiple single cancer cells, clonal expansion of cancerous cells is inferred. Additional diagnostic tests are then combined to validate the inference, and further monitoring or treatments may be needed. For diagnosed cancer patients, single-cell sequencing can reveal clonal and subclonal information of their tumor lesions with respect to their genomic and transcriptomic characteristics, upon which clinicians can determine the most suitable therapies . With longitudinal sampling of CTCs or DTCs (disseminated tumor cells), single-cell sequencing also allows the monitoring of patient responses to the prescribed therapies [31, 171, 201]. The resulting genomic and transcriptomic information can be used to examine the selective pressure of drugs to various cancer clones and alert the emergence or expansion of drug-resistance cancer clones . The non-invasive nature of CTC or DTC isolation also greatly reduces the inherent risks of core biopsy directly at the tumor site. Single-cell sequencing data potentially provide metrics beyond conventional genomic mutation data or gene expression data for prognosis analysis. For example, various indices for tumor heterogeneity could be designed to predict responses to therapies, probability of metastasis, disease-free periods, and overall survival [147, 202, 203, 204, 205].
Since its inception, single-cell sequencing has revolutionized cancer research. The pioneering studies have covered the development and applications of single-cell DNA and RNA sequencing to address a wide range of topics such as intra-tumor heterogeneity of primary tumors, roles of CTCs and DTCs during metastasis, evolution of therapy resistance, and the characteristics of tumor microenvironments. Many novel biological insights have been obtained, and the revolution is just starting. Improvement of existing single-cell sequencing technologies, emergence of new techniques, and the integration of single-cell sequencing with other experimental protocols have provided powerful toolsets to understand many of the remaining mysteries of cancers. Single-cell epigenomics, multi-omics, and spatial single-cell sequencing technologies are some of the major directions of single-cell sequencing technologies that will bring the second wave of revolutions of cancer research.
This project was supported by grants from the Beijing Advanced Innovation Centre for Genomics at Peking University, Key Technologies R&D Program (2016YFC0900100), and the National Natural Science Foundation of China (81573022, 31530036, 91742203).
XR and ZZ conceived and wrote the manuscript. BK designed and made the figures. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 43.Carter L, Rothwell DG, Mesquita B, Smowton C, Leong HS, Fernandez-Gutierrez F, et al. Molecular analysis of circulating tumor cells identifies distinct copy-number profiles in patients with chemosensitive and chemorefractory small-cell lung cancer. Nat Med. 2017;23:114–9.CrossRefPubMedGoogle Scholar
- 46.Suzuki A, Matsushima K, Makinoshima H, Sugano S, Kohno T, Tsuchihara K, et al. Single-cell analysis of lung adenocarcinoma cell lines reveals diverse expression patterns of individual cells invoked by a molecular target drug treatment. Genome Biol. 2015;16:66.CrossRefPubMedPubMedCentralGoogle Scholar
- 51.Venteicher AS, Tirosh I, Hebert C, Yizhak K, Neftel C, Filbin MG, et al. Decoupling genetics, lineages, and microenvironment in IDH-mutant gliomas by single-cell RNA-seq. Science. 2017;355. https://doi.org/10.1126/science.aai8478.
- 57.Guo X, Zhang Y, Zheng L, Zheng C, Song J, Zhang Q, Kang B, Liu Z, Jin L, Xing R, et al. Global characterization of T cells in non-small-cell lung cancer by single-cell sequencing. Nat Med. 2018;174:1293–308.Google Scholar
- 122.de Vargas RL, Claassen M. Computational and experimental single cell biology techniques for the definition of cell type heterogeneity, interplay and intracellular dynamics. Curr Opin Biotechnol. 2015;34:9–15.Google Scholar
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.