Abstract
Single-cell RNA sequencing (scRNA-seq) provides a high-resolution view of cellular heterogeneity. Downstream analyses such as cell-type identification, differential expression analysis, and regulatory relationship detection can yield new biological findings. Before these analyses, it is crucial to remove low-quality cells, because they introduce technical noise that weakens the true biological signal and misleads downstream analysis. Existing methods require either manually set thresholds or true labels for supervised training, neither of which is appropriate in many cases. We present an unsupervised ensemble learning method that automatically identifies low-quality cells in scRNA-seq data. The method integrates weak classifiers based on five selected features derived from housekeeping genes, read mapping rate, and detected genes. To avoid setting classifier thresholds manually, it enumerates threshold values within a reasonable range and chooses the most suitable values according to a scoring function. In experiments, it achieves high and stable accuracy on multiple datasets.
Code is available at https://github.com/mzhq/EnsembleKQC.
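The threshold-enumeration idea described in the abstract can be illustrated with a minimal sketch. It is not the EnsembleKQC implementation: the abstract does not specify the five features or the scoring function, so the toy feature values, the quantile grid in `candidate_thresholds`, and the `separation_score` function below are all assumptions chosen only to make the example runnable. Each weak classifier votes on one feature, the ensemble takes a majority vote, and every threshold combination in the grid is scored.

```python
from itertools import product
from statistics import mean


def candidate_thresholds(values, fractions=(0.1, 0.2, 0.3)):
    """Enumerate candidate thresholds at fixed quantile positions (assumed grid)."""
    ordered = sorted(values)
    return [ordered[int(f * (len(ordered) - 1))] for f in fractions]


def ensemble_vote(cells, thresholds):
    """Majority vote of weak classifiers, one per feature.

    A weak classifier votes "low quality" when the feature value
    is at or below its threshold.
    """
    n_features = len(thresholds)
    labels = []
    for cell in cells:
        votes = sum(value <= t for value, t in zip(cell, thresholds))
        labels.append(votes * 2 > n_features)
    return labels


def separation_score(cells, labels):
    """Hypothetical stand-in score: mean range-scaled gap between the
    feature means of kept cells and flagged cells."""
    flagged = [c for c, low in zip(cells, labels) if low]
    kept = [c for c, low in zip(cells, labels) if not low]
    if not flagged or not kept:
        return float("-inf")  # reject labelings that put every cell in one group
    gaps = []
    for kept_col, flagged_col, all_col in zip(zip(*kept), zip(*flagged), zip(*cells)):
        spread = (max(all_col) - min(all_col)) or 1.0
        gaps.append((mean(kept_col) - mean(flagged_col)) / spread)
    return mean(gaps)


def fit_thresholds(cells):
    """Score every threshold combination and keep the best one."""
    grids = [candidate_thresholds(col) for col in zip(*cells)]
    best_score, best_thresholds, best_labels = float("-inf"), None, None
    for thresholds in product(*grids):
        labels = ensemble_vote(cells, thresholds)
        score = separation_score(cells, labels)
        if score > best_score:
            best_score, best_thresholds, best_labels = score, thresholds, labels
    return best_thresholds, best_labels


# Toy data (invented): eight cells with high feature values, two with low values.
good = [[10 + i, 20 + i, 30 + i, 40 + i, 50 + i] for i in range(8)]
bad = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6]]
cells = good + bad

thresholds, labels = fit_thresholds(cells)
print(thresholds)
print(labels)
```

On this toy input the two low-valued cells are the only ones flagged. An exhaustive grid grows exponentially with the number of features, which stays manageable for five features and a few candidates each; a real scoring function would also need to guard against degenerate labelings, as the `-inf` check does here.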
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ma, A., Zhu, Z., Ye, M., Wang, F. (2019). EnsembleKQC: An Unsupervised Ensemble Learning Method for Quality Control of Single Cell RNA-seq Sequencing Data. In: Huang, DS., Jo, KH., Huang, ZK. (eds) Intelligent Computing Theories and Application. ICIC 2019. Lecture Notes in Computer Science, vol 11644. Springer, Cham. https://doi.org/10.1007/978-3-030-26969-2_47
DOI: https://doi.org/10.1007/978-3-030-26969-2_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26968-5
Online ISBN: 978-3-030-26969-2
eBook Packages: Computer Science (R0)