Classification Approaches for Microarray Gene Expression Data Analysis

  • Protocol

Part of the book series: Methods in Molecular Biology (MIMB, volume 802)

Abstract

Classification approaches have been developed, adopted, and applied to distinguish disease classes at the molecular level using microarray data. Recently, a novel class of hierarchical probabilistic models based on a kernel-imbedding technique has become one of the best classification tools for microarray data analysis. These models were first developed as kernel-imbedded Gaussian processes (KIGPs) for binary classification problems using microarray gene expression data, and were then extended to multiclass classification problems under a unifying Bayesian framework. Specifically, an adaptive algorithm with a cascading structure was designed to find appropriate featuring kernels, to discover potentially significant genes, and to make optimal disease (e.g., tumor/cancer) class predictions with associated Bayesian posterior probabilities. Simulation studies and applications to published real data showed that KIGPs performed very close to the Bayesian bound and consistently outperformed, or performed among the best of, many state-of-the-art methods. The distinctive advantage of the KIGP approach is its ability to explore both linear and nonlinear underlying relationships between the target features of a given disease classification problem and the explanatory gene expression data involved. This line of research has shed light on the broader usability of the KIGP approach for the analysis of other high-throughput omics data and of omics data collected as time series, especially when linear-model-based methods fail to work.
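
To illustrate why a choice between linear and nonlinear kernels can matter, the following minimal Python sketch compares a linear and a Gaussian kernel on a toy two-gene problem. It is not the KIGP algorithm: it uses a simple kernel nearest-centroid rule with no Bayesian inference, no gene selection, and no posterior probabilities. The expression values, the function names, and the `gamma` setting are all hypothetical and chosen only for illustration.

```python
import math


def linear_kernel(a, b):
    """Plain dot product between two expression profiles."""
    return sum(x * y for x, y in zip(a, b))


def gaussian_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel; gamma is an assumed, untuned width parameter."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs, ys, kernel):
    """Average kernel value over all pairs drawn from xs and ys."""
    return sum(kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def centroid_predict(x, pos, neg, kernel):
    """Assign x to the class whose feature-space centroid is closer.

    The score is half the difference of squared feature-space distances
    to the negative and positive centroids; ties go to class -1.
    """
    score = mean_kernel([x], pos, kernel) - mean_kernel([x], neg, kernel)
    score += 0.5 * (mean_kernel(neg, neg, kernel) - mean_kernel(pos, pos, kernel))
    return 1 if score > 0 else -1


def accuracy(samples, labels, pos, neg, kernel):
    """Fraction of samples whose predicted class matches the label."""
    hits = sum(centroid_predict(x, pos, neg, kernel) == y
               for x, y in zip(samples, labels))
    return hits / len(samples)


# Hypothetical centered expression levels of two genes with an XOR-like
# pattern: the class depends on whether the genes move together or apart.
POS_TRAIN = [(1.0, 1.0), (-1.0, -1.0)]
NEG_TRAIN = [(1.0, -1.0), (-1.0, 1.0)]
TEST_X = [(0.9, 1.1), (-1.2, -0.8), (1.1, -0.9), (-0.8, 1.2)]
TEST_Y = [1, 1, -1, -1]

if __name__ == "__main__":
    for name, kernel in (("linear", linear_kernel), ("gaussian", gaussian_kernel)):
        print(name, accuracy(TEST_X, TEST_Y, POS_TRAIN, NEG_TRAIN, kernel))
```

In this toy setting both class centroids coincide under the linear kernel, so the linear rule cannot separate the classes, while the Gaussian kernel does. Real microarray data involve thousands of genes and few samples, which is where the kernel and gene selection steps described in the abstract come in.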

Acknowledgments

This work was partially supported by the Loyola University Medical Center Research Development Funds and the Sun Microsystems Academic Equipment Grant for Bioinformatics. The author would like to thank Dr. Xin Zhao at Sanjole Inc. for his involvement in the KIGP work.

Author information

Corresponding author

Correspondence to Leo Wang-Kit Cheung.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Cheung, L.WK. (2012). Classification Approaches for Microarray Gene Expression Data Analysis. In: Wang, J., Tan, A., Tian, T. (eds) Next Generation Microarray Bioinformatics. Methods in Molecular Biology, vol 802. Humana Press. https://doi.org/10.1007/978-1-61779-400-1_5

  • DOI: https://doi.org/10.1007/978-1-61779-400-1_5

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-61779-399-8

  • Online ISBN: 978-1-61779-400-1

  • eBook Packages: Springer Protocols
