A Bioinformatic Toolkit for Single-Cell mRNA Analysis

  • Protocol

Part of the book series: Methods in Molecular Biology (MIMB, volume 1979)

Abstract

Recent technological developments in single-cell RNA-Seq enable us to assay the transcriptomes of up to a million single cells in parallel. However, the analysis of such large datasets presents a major challenge. During the last decade, a wide variety of strategies have been proposed, covering different steps of the analysis. Here, we introduce a selection of computational tools to provide an overview of a generic analysis pipeline.

The first step of every scRNA-Seq experiment is proper study design, which does not require sophisticated experimental or informatics skills but is nonetheless arguably the most important step. The quality of the resulting data depends strictly on careful planning of the experiment, including the selection of the most suitable technology for the biological question of interest and a study design elaborated to minimize the influence of confounding factors. Once the experiment has been conducted, the raw sequencing data need to be processed to extract the gene expression information for each cell. This task comprises quality assessment of the sequenced reads, alignment against a reference genome, demultiplexing of the cell barcodes, and quantification of the reads/transcripts per gene. Like any other transcriptomics technology, single-cell mRNA-Seq requires data normalization to ensure sample-to-sample (here, cell-to-cell) comparability, as well as the consideration of confounding factors.
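To make the normalization step concrete, the following is a minimal sketch of library-size normalization followed by a log transform, one widely used convention for cell-to-cell comparability. It is not taken from the chapter: the function name `normalize_counts`, the cells-by-genes layout, and the scale factor of 10,000 are assumptions chosen for illustration.

```python
import numpy as np


def normalize_counts(counts, scale=10_000):
    """Library-size normalize a cells x genes count matrix, then log-transform.

    Assumption (illustrative, not from the chapter): rows are cells,
    columns are genes, and `scale` is an arbitrary common target depth.
    """
    counts = np.asarray(counts, dtype=float)
    # Total counts per cell (the "library size").
    totals = counts.sum(axis=1, keepdims=True)
    if np.any(totals == 0):
        raise ValueError("cells with zero total counts should be filtered out first")
    # Rescale every cell to the same total, then apply log(1 + x).
    return np.log1p(counts / totals * scale)


# Two hypothetical cells with the same composition but a 10x depth difference.
example = normalize_counts([[1, 1], [10, 10]])
```

After normalization the two example cells have identical profiles, which is the intended effect: differences in sequencing depth no longer masquerade as differences in expression. Confounding factors such as batch are not addressed by this step.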

Once gene expression values have been extracted from the reads and normalized, the researcher faces the difficult choice among a plethora of analysis approaches for investigating diverse aspects of single-cell transcriptomes, such as dimensionality reduction and clustering to explore cellular heterogeneity, or trajectory analysis to model differentiation processes.
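As an illustration of the dimensionality reduction step, here is a minimal sketch of principal component analysis computed with a singular value decomposition. PCA is only one of many possible choices, and the function name `pca` and the cells-by-genes input layout are assumptions made for this example rather than the chapter's prescribed method.

```python
import numpy as np


def pca(X, n_components=2):
    """Project a cells x genes matrix onto its leading principal components.

    Returns the per-cell scores and the fraction of variance explained by
    each returned component. Assumes the data are not constant.
    """
    X = np.asarray(X, dtype=float)
    # Center each gene so the components describe variation around the mean.
    Xc = X - X.mean(axis=0)
    # Thin SVD: rows of Vt are the principal axes, S the singular values.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


# Hypothetical data: two groups of 10 cells that differ in all 3 genes.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(10, 3))
group_b = rng.normal(10.0, 1.0, size=(10, 3))
scores, explained = pca(np.vstack([group_a, group_b]))
```

In this toy example the first component separates the two groups of cells, which is the kind of structure that a subsequent clustering step would then try to recover.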

In this chapter, we summarize the above-mentioned steps for conducting single-cell RNA-Seq analyses and present a selection of existing tools.

Acknowledgments

The authors would like to acknowledge Prof. Dr. med. Joachim L. Schultze for support and advice during the writing process. Moreover, the authors Paweł Biernat and Matthias Becker are supported by a grant from the Federal Ministry for Economic Affairs and Energy (BMWi Project FASTGenomics). The work of Jonas Schulte-Schrepping receives funding from the European Union’s Horizon 2020 research and innovation program under grant agreement no. 733100 (SYSCID). The DFG graduate program 2168/1 (Bonn and Melbourne International Research and Training Group—Bo&MeRanG) supports Patrick Günther.

Corresponding author

Correspondence to Kevin Baßler.

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Baßler, K., Günther, P., Schulte-Schrepping, J., Becker, M., Biernat, P. (2019). A Bioinformatic Toolkit for Single-Cell mRNA Analysis. In: Proserpio, V. (eds) Single Cell Methods. Methods in Molecular Biology, vol 1979. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9240-9_26

  • DOI: https://doi.org/10.1007/978-1-4939-9240-9_26

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-4939-9239-3

  • Online ISBN: 978-1-4939-9240-9

  • eBook Packages: Springer Protocols
