Abstract
We present FeatClust, a software tool for clustering small-sample single-cell RNA-seq datasets. The FeatClust approach is based on feature selection. It divides the features into several groups by agglomerative hierarchical clustering, then iteratively clusters the samples and removes the features belonging to the groups with the least variance across samples. The optimal number of feature groups is selected by silhouette analysis of the clustered data, i.e., by choosing the clustering with the highest average silhouette coefficient. If the number of clusters is not known, FeatClust also lets the user choose it visually by generating a silhouette plot for a chosen number of groupings of the dataset. We cluster five small-sample single-cell RNA-seq datasets and use the adjusted Rand index to compare the results with those of other clustering packages. The results are promising and show the effectiveness of FeatClust on small-sample datasets.
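The procedure described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the FeatClust implementation: the function names (`agglomerative`, `silhouette`, `featclust_sketch`), the use of average linkage, the reuse of the same agglomerative routine to cluster the samples, and the choice to score each clustering on the retained features are all assumptions made for illustration. The abstract does not specify the linkage, the sample-clustering method, or the exact stopping rule, and the toy matrix is invented.

```python
import math
from statistics import pvariance


def agglomerative(points, k):
    """Naive average-linkage agglomerative clustering; returns one label per point."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise Euclidean distance between the two clusters
                d = sum(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


def silhouette(points, labels):
    """Average silhouette coefficient; assumes at least two clusters."""
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)
    total = 0.0
    for i, label in enumerate(labels):
        own = groups[label]
        if len(own) == 1:
            continue  # a singleton contributes a coefficient of 0
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(math.dist(points[i], points[j]) for j in g) / len(g)
                for other, g in groups.items() if other != label)
        denom = max(a, b)
        total += (b - a) / denom if denom else 0.0
    return total / len(points)


def featclust_sketch(X, n_clusters, n_groups):
    """X is a list of samples, each a list of feature values."""
    n_feat = len(X[0])
    columns = [[row[f] for row in X] for f in range(n_feat)]
    group_of = agglomerative(columns, n_groups)  # step 1: group the features

    def group_variance(g):
        vs = [pvariance(columns[f]) for f in range(n_feat) if group_of[f] == g]
        return sum(vs) / len(vs)

    order = sorted(set(group_of), key=group_variance)  # least variable first
    kept = set(order)
    best = (-2.0, None, None)
    for g in order:
        # step 2: cluster the samples on the features that are still retained
        feats = [f for f in range(n_feat) if group_of[f] in kept]
        sub = [[row[f] for f in feats] for row in X]
        labels = agglomerative(sub, n_clusters)
        score = silhouette(sub, labels)
        if score > best[0]:  # step 3: keep the highest average silhouette
            best = (score, labels, feats)
        kept.discard(g)  # then remove the least variable remaining group
    return best


# Invented toy data: 6 samples x 5 features; only features 0 and 1 separate
# samples 0-2 from samples 3-5, the rest are nearly constant.
X = [
    [0, 1, 5.0, 5.1, 3.0],
    [1, 0, 5.2, 5.0, 3.1],
    [0, 1, 5.1, 5.2, 3.0],
    [10, 11, 5.0, 5.2, 3.1],
    [11, 10, 5.2, 5.1, 3.0],
    [10, 11, 5.1, 5.0, 3.1],
]
score, labels, feats = featclust_sketch(X, n_clusters=2, n_groups=3)
print(score, labels, feats)
```

On this toy matrix the two high-variance features end up in the last group to be removed, so they are present in every candidate clustering. The naive agglomerative routine is cubic in the number of points and is only meant for small examples.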
Software Availability
The FeatClust algorithm is implemented in the Python programming language and is available on GitHub at https://github.com/edwinv87/featclust. Installation and usage instructions are provided in the README file on GitHub.
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Vans, E., Sharma, A., Patil, A., Shigemizu, D., Tsunoda, T. (2019). Clustering of Small-Sample Single-Cell RNA-Seq Data via Feature Clustering and Selection. In: Nayak, A., Sharma, A. (eds) PRICAI 2019: Trends in Artificial Intelligence. PRICAI 2019. Lecture Notes in Computer Science(), vol 11672. Springer, Cham. https://doi.org/10.1007/978-3-030-29894-4_36
DOI: https://doi.org/10.1007/978-3-030-29894-4_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29893-7
Online ISBN: 978-3-030-29894-4
eBook Packages: Computer Science (R0)