Clustering of Small-Sample Single-Cell RNA-Seq Data via Feature Clustering and Selection

  • Conference paper
PRICAI 2019: Trends in Artificial Intelligence (PRICAI 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11672)

Abstract

We present FeatClust, a software tool for clustering single-cell RNA-Seq datasets with small sample sizes. The FeatClust approach is based on feature selection. It divides the features into several groups by agglomerative hierarchical clustering, then iteratively clusters the samples and removes the features belonging to the groups with the least variance across samples. The optimal number of feature groups is selected by silhouette analysis of the clustered data, i.e., by choosing the clustering with the highest average silhouette coefficient. If the number of clusters is not known, FeatClust also lets the user choose it visually by generating a silhouette plot for a chosen number of groupings of the dataset. We cluster five small-sample single-cell RNA-Seq datasets and use the adjusted Rand index to compare the results with those of other clustering packages. The results are promising and indicate that FeatClust is effective on small-sample datasets.
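The procedure described in the abstract can be illustrated with a minimal sketch. This is not the FeatClust implementation: the function name `feature_group_clustering`, the use of scikit-learn, Ward linkage, a fixed number of feature groups, and the synthetic data are all assumptions made for illustration. The sketch groups the features hierarchically, removes the lowest-variance group at each step, re-clusters the samples, and keeps the step with the highest average silhouette coefficient. The adjusted Rand index then compares the result with the known labels.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score, silhouette_score


def feature_group_clustering(X, n_clusters, n_groups):
    """Illustrative sketch only (hypothetical helper, not the FeatClust API).

    X has shape (n_samples, n_features). Returns the best average
    silhouette score, the sample labels, and the mask of retained features.
    """
    # Group the features by clustering the columns of X (rows of X.T).
    groups = AgglomerativeClustering(n_clusters=n_groups).fit_predict(X.T)

    # Rank feature groups by mean per-feature variance across samples.
    variances = X.var(axis=0)
    group_var = np.array([variances[groups == g].mean() for g in range(n_groups)])
    order = np.argsort(group_var)  # lowest-variance group first

    keep = np.ones(X.shape[1], dtype=bool)
    best_score, best_labels, best_keep = -1.0, None, None
    for step in range(n_groups):
        # Cluster the samples on the currently retained features.
        X_kept = X[:, keep]
        labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(X_kept)
        score = silhouette_score(X_kept, labels)
        if score > best_score:
            best_score, best_labels, best_keep = score, labels, keep.copy()
        # Remove the remaining group with the least variance.
        keep[groups == order[step]] = False
    return best_score, best_labels, best_keep


# Synthetic data (assumed for illustration): 20 samples in two groups,
# 30 informative features and 30 low-variance noise features.
rng = np.random.default_rng(0)
y_true = np.repeat([0, 1], 10)
informative = rng.normal(0.0, 1.0, size=(20, 30)) + 10.0 * y_true[:, None]
noise = rng.normal(0.0, 0.1, size=(20, 30))
X = np.hstack([informative, noise])

score, labels, kept = feature_group_clustering(X, n_clusters=2, n_groups=4)
ari = adjusted_rand_score(y_true, labels)
print(f"silhouette={score:.3f}, ARI={ari:.3f}, features kept={kept.sum()}")
```

In this sketch the number of feature groups is fixed by the caller. To select it by silhouette analysis as the abstract describes, one option is to repeat the call over a range of `n_groups` values and keep the result with the highest score.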

Software Availability

The FeatClust algorithm is implemented in the Python programming language and is available on GitHub at https://github.com/edwinv87/featclust. Installation and usage instructions are provided in the README file of the repository.

Author information

Corresponding author

Correspondence to Edwin Vans.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Vans, E., Sharma, A., Patil, A., Shigemizu, D., Tsunoda, T. (2019). Clustering of Small-Sample Single-Cell RNA-Seq Data via Feature Clustering and Selection. In: Nayak, A., Sharma, A. (eds) PRICAI 2019: Trends in Artificial Intelligence. PRICAI 2019. Lecture Notes in Computer Science, vol 11672. Springer, Cham. https://doi.org/10.1007/978-3-030-29894-4_36

  • DOI: https://doi.org/10.1007/978-3-030-29894-4_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29893-7

  • Online ISBN: 978-3-030-29894-4

  • eBook Packages: Computer Science, Computer Science (R0)
