Data Analysis in Single-Cell Transcriptome Sequencing
Single-cell transcriptome sequencing, often referred to as single-cell RNA sequencing (scRNA-seq), measures gene expression at the single-cell level and provides a higher resolution of cellular differences than bulk RNA-seq. With more detailed and accurate information, scRNA-seq is expected to advance the understanding of cell functions, disease progression, and treatment response. Although scRNA-seq experimental protocols have improved rapidly, many challenges in scRNA-seq data analysis remain to be overcome. In this chapter, we introduce and discuss the current state of research on scRNA-seq data normalization and cluster analysis, which are the two most important challenges in scRNA-seq data analysis. In particular, we present a protocol to discover and validate cancer stem cells (CSCs) using scRNA-seq. We also offer suggestions to help researchers rationally design their scRNA-seq experiments and data analyses in future studies.
Key words: scRNA-seq, Single-cell transcriptome sequencing, Normalization, Cluster analysis
I am equally grateful for the help of the following people: Professor Wenjun Bu, Professor Lin Liu, Ph.D. student Hua Wang, and Master's students Yu Sun and Deshui Yu from the College of Life Sciences, Nankai University; Professor Jishou Ruan and Ph.D. student Zhenfeng Wu from the School of Mathematical Sciences, Nankai University; and Associate Professor Weixiang Liu from Shenzhen University.