Abstract
Over the past two decades, advances in genomics technology have enabled rapid acquisition of biological data and have revolutionized many aspects of biomedical research. Because large-scale biological data are complex and noisy, translational bioinformatics has a strong need for variable selection approaches that identify disease biomarkers. Such biomarkers can support early detection of pathogenesis, inform prognosis, guide treatment, and help monitor disease progression. In this chapter, we developed a variety of methods that systematically analyze whole-genome gene expression data to identify feature genes associated with patient clinical parameters. In the first method, we constructed a gene co-expression network and then selected genes that are informative for classifying cancer subtypes based on their connectivity within that network. In the second method, we incorporated prior biological pathway information to reconstruct a gene network and then identified hub genes associated with cancer prognosis. Finally, we identified protein subnetworks, rather than individual genes, as biomarkers for classifying different types of brain injury. Together, these methods provide a framework that can be easily generalized to integrate different types of genomic and proteomic information, with the aim of better identifying feature genes and improving the accuracy of disease diagnosis and treatment.
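To make the first method more concrete, the following is a minimal sketch of connectivity-based gene selection in a co-expression network. The abstract does not specify how connectivity is defined, so the details here are assumptions for illustration only: adjacency is taken as the absolute Pearson correlation raised to a power `beta`, connectivity is the row sum of the adjacency matrix, and the function name `top_connected_genes` and the synthetic data are hypothetical.

```python
import numpy as np


def top_connected_genes(expr, k=10, beta=6):
    """Rank genes by connectivity in a co-expression network.

    expr: array of shape (n_samples, n_genes).
    Returns (indices of the k most connected genes, connectivity per gene).
    The adjacency form |r|**beta is an assumption, not the chapter's stated method.
    """
    corr = np.corrcoef(expr, rowvar=False)   # gene-by-gene Pearson correlation
    adj = np.abs(corr) ** beta               # soft-thresholded adjacency (assumed)
    np.fill_diagonal(adj, 0.0)               # ignore self-connections
    connectivity = adj.sum(axis=1)           # total connection strength per gene
    order = np.argsort(connectivity)[::-1]   # most connected first
    return order[:k], connectivity


# Synthetic example: 40 samples, 50 genes, with a planted
# co-expressed module in the first five genes.
rng = np.random.default_rng(0)
n_samples, n_genes = 40, 50
expr = rng.normal(size=(n_samples, n_genes))
module = rng.normal(size=n_samples)
expr[:, :5] = module[:, None] + 0.1 * rng.normal(size=(n_samples, 5))

selected, connectivity = top_connected_genes(expr, k=5)
print(sorted(selected.tolist()))
```

In this toy setting the planted module genes receive much higher connectivity than the independent genes, which is the intuition behind using network connectivity as a selection criterion. A real analysis would add a downstream classifier and validation, which this sketch omits.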
Copyright information
© 2016 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Wang, Z., Xu, W., Liu, Y. (2016). Systematic and Integrative Analysis of Gene Expression to Identify Feature Genes Underlying Human Diseases. In: Wu, J. (eds) Transcriptomics and Gene Regulation. Translational Bioinformatics, vol 9. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-7450-5_7
DOI: https://doi.org/10.1007/978-94-017-7450-5_7
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-017-7448-2
Online ISBN: 978-94-017-7450-5
eBook Packages: Biomedical and Life Sciences (R0)