Recent Machine Learning Approaches for Single-Cell RNA-seq Data Analysis

  • Aristidis G. Vrahatis (corresponding author)
  • Sotiris K. Tasoulis
  • Ilias Maglogiannis
  • Vassilis P. Plagianakos
Part of the Studies in Computational Intelligence book series (SCI, volume 891)


DNA sequencing has become an extremely popular assay, with some researchers claiming that in the distant future its impact will equal that of the microscope. Single-cell RNA-seq (scRNA-seq) is an emerging sequencing technology with promising capabilities, but it poses major computational challenges because of the scale of the data it generates. As sequencing costs continue to fall, the volume and complexity of the data produced by these technologies will continue to grow. The challenges are most acute at the cell level, in particular with respect to the ultra-high dimensionality of scRNA-seq data. They relate mainly to three pillars of machine learning (ML) analysis: classification, clustering, and visualization. Although ML methods for scRNA-seq data analysis have made remarkable progress, numerous questions remain unresolved. This review surveys the state-of-the-art classification, clustering, and visualization methods tailored to single-cell transcriptomics data.
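
As a rough illustration of the three task families named above, the following sketch runs visualization, clustering, and classification on a small synthetic cells-by-genes count matrix. Everything in it is an assumption made for illustration: the data are simulated, the function names (`simulate_counts`, `pca`, `kmeans`, `purity`, `fit_centroids`, `predict`, `demo`) are hypothetical, and PCA, k-means, and a nearest-centroid classifier are generic stand-ins rather than methods taken from the review.

```python
import numpy as np


def simulate_counts(n_per_type=50, n_genes=2000, n_types=3, seed=0):
    """Toy cells x genes count matrix; each cell type over-expresses 40 marker genes."""
    rng = np.random.default_rng(seed)
    base = rng.gamma(shape=2.0, scale=1.0, size=n_genes)  # shared baseline rates
    blocks, labels = [], []
    for t in range(n_types):
        rates = base.copy()
        markers = rng.choice(n_genes, size=40, replace=False)
        rates[markers] *= 8.0  # assumed fold change for marker genes
        blocks.append(rng.poisson(rates, size=(n_per_type, n_genes)))
        labels += [t] * n_per_type
    return np.vstack(blocks).astype(float), np.array(labels)


def pca(X, n_components):
    """Project rows of X onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def kmeans(Z, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(k - 1):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        new = np.array([Z[assign == j].mean(axis=0) if np.any(assign == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return assign


def purity(assign, labels):
    """Fraction of cells whose cluster's majority label matches their own."""
    hits = sum(np.bincount(labels[assign == j]).max() for j in np.unique(assign))
    return hits / len(labels)


def fit_centroids(Z, y):
    """Nearest-centroid classifier: one mean vector per class."""
    classes = np.unique(y)
    return classes, np.array([Z[y == c].mean(axis=0) for c in classes])


def predict(Z, classes, centroids):
    dist = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]


def demo():
    counts, cell_type = simulate_counts()
    # Library-size normalisation followed by a log transform.
    X = np.log1p(counts / counts.sum(axis=1, keepdims=True) * 1e4)
    Z = pca(X, 2)                      # visualization: 2-D coordinates per cell
    clusters = kmeans(Z, k=3)          # clustering: unsupervised grouping
    train = np.arange(len(Z)) % 2 == 0
    classes, centroids = fit_centroids(Z[train], cell_type[train])
    pred = predict(Z[~train], classes, centroids)  # classification on held-out cells
    accuracy = float((pred == cell_type[~train]).mean())
    return Z, purity(clusters, cell_type), accuracy


if __name__ == "__main__":
    Z, cluster_purity, accuracy = demo()
    print(Z.shape, cluster_purity, accuracy)
```

The two columns of `Z` can be passed directly to any scatter-plot routine. Real scRNA-seq matrices are far larger and sparser than this toy example, which is exactly where the dimensionality challenges described above arise.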


Keywords: Machine Learning, Single-cell RNA-seq, Clustering, Classification, Visualization



This project has received funding from the Hellenic Foundation for Research and Innovation (HFRI) and the General Secretariat for Research and Technology (GSRT), under grant agreement No 1901.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Authors and Affiliations

  • Aristidis G. Vrahatis (1), corresponding author
  • Sotiris K. Tasoulis (1)
  • Ilias Maglogiannis (2)
  • Vassilis P. Plagianakos (1)
  1. Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
  2. Department of Digital Systems, University of Piraeus, Piraeus, Greece
