Abstract
Classification approaches have been developed, adopted, and applied to distinguish disease classes at the molecular level using microarray data. Recently, a novel class of hierarchical probabilistic models based on a kernel-imbedding technique has become one of the best classification tools for microarray data analysis. These models were first developed as kernel-imbedded Gaussian processes (KIGPs) for binary classification problems using microarray gene expression data, and were then further improved for multiclass classification problems under a unifying Bayesian framework. Specifically, an adaptive algorithm with a cascading structure was designed to find appropriate featuring kernels, to discover potentially significant genes, and to make optimal disease (e.g., tumor/cancer) class predictions with associated Bayesian posterior probabilities. Simulation studies and applications to published real data showed that KIGPs performed very close to the Bayesian bound and either outperformed or ranked among the best of many state-of-the-art methods. The most distinctive advantage of the KIGP approach is its ability to explore both the linear and the nonlinear underlying relationships between the target features of a given disease classification problem and the involved explanatory gene expression data. This line of research has shed light on the broader usability of the KIGP approach for the analysis of other high-throughput omics data and of omics data collected as time series, especially when linear-model-based methods fail to work.
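The contrast between linear and nonlinear kernels mentioned in the abstract can be illustrated with a small sketch. This is not the KIGP algorithm: it has no Bayesian hierarchy, no gene selection, and no posterior class probabilities. It only computes a kernel least-squares score (the Gaussian process regression posterior mean on labels coded as +1 and -1) for a toy XOR-like pattern in two hypothetical expression features. All names, the toy data, the kernel width, and the noise value are assumptions made for illustration.

```python
import math

def linear_kernel(u, v):
    # Inner product: can only capture linear relationships.
    return sum(a * b for a, b in zip(u, v))

def gaussian_kernel(u, v, width=1.0):
    # Squared-exponential kernel: can capture nonlinear relationships.
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2.0 * width ** 2))

def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_predict(kernel, X, y, X_new, noise=0.1):
    # Score for each new sample: k(x_new, X) (K + noise * I)^{-1} y.
    n = len(X)
    K = [[kernel(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, y)
    return [sum(kernel(x, X[i]) * alpha[i] for i in range(n)) for x in X_new]

# Toy XOR-like data (assumed): class +1 when both features share a sign.
X = [[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]]
y = [1.0, 1.0, -1.0, -1.0]
X_new = [[0.8, 0.9], [0.9, -0.8]]  # expected classes: +1, then -1

linear_scores = fit_predict(linear_kernel, X, y, X_new)
gaussian_scores = fit_predict(gaussian_kernel, X, y, X_new)
print("linear kernel scores:  ", linear_scores)
print("gaussian kernel scores:", gaussian_scores)
```

On this pattern the linear kernel yields scores that are essentially zero, so it carries no class information, while the Gaussian kernel yields scores whose signs match the intended classes. A full KIGP analysis would additionally infer the kernel type and its parameters, select genes, and report posterior class probabilities.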
Acknowledgments
This work was partially supported by the Loyola University Medical Center Research Development Funds and the SUN Microsystems Academic Equipment Grant for Bioinformatics. The author would like to thank Dr. Xin Zhao at Sanjole Inc. for his involvement in the KIGP work.
Copyright information
© 2012 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Cheung, L.WK. (2012). Classification Approaches for Microarray Gene Expression Data Analysis. In: Wang, J., Tan, A., Tian, T. (eds) Next Generation Microarray Bioinformatics. Methods in Molecular Biology, vol 802. Humana Press. https://doi.org/10.1007/978-1-61779-400-1_5
DOI: https://doi.org/10.1007/978-1-61779-400-1_5
Publisher Name: Humana Press
Print ISBN: 978-1-61779-399-8
Online ISBN: 978-1-61779-400-1
eBook Packages: Springer Protocols