Comparative Evaluation of Set-Level Techniques in Microarray Classification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6674)

Abstract

Analysis of gene expression data in terms of a priori-defined gene sets typically yields more compact and interpretable results than those produced by traditional methods that rely on individual genes. The set-level strategy can also be adopted in predictive classification tasks accomplished with machine learning algorithms. Here, sample features originally corresponding to genes are replaced by a much smaller number of features, each corresponding to a gene set and aggregating expressions of its members into a single real value. Classifiers learned from such transformed features promise better interpretability in that they derive class predictions from overall expressions of selected gene sets (e.g. corresponding to pathways) rather than expressions of specific genes. In a large collection of experiments we test how accurate such classifiers are compared to traditional classifiers based on genes. Furthermore, we translate some recently published gene set analysis techniques to the above proposed machine learning setting and assess their contributions to the classification accuracies.
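The feature transformation described in the abstract can be illustrated with a minimal sketch. The abstract does not state which aggregation function the paper uses, so the arithmetic mean here is only an assumption; the function name `set_level_features`, the gene identifiers, and the pathway names are hypothetical and chosen purely for illustration.

```python
from statistics import mean


def set_level_features(sample, gene_sets, aggregate=mean):
    """Map one sample's gene-level expressions to gene-set-level features.

    sample    -- dict mapping gene id -> expression value
    gene_sets -- dict mapping set name -> iterable of gene ids
    aggregate -- function reducing a list of values to one real value
                 (the mean is an assumption, not the paper's stated choice)
    """
    features = {}
    for name, genes in gene_sets.items():
        # Use only the set members that were actually measured in this sample.
        values = [sample[g] for g in genes if g in sample]
        if values:  # skip sets with no measured member
            features[name] = aggregate(values)
    return features


# Toy data: four measured genes, two a priori-defined gene sets.
sample = {"g1": 2.0, "g2": 4.0, "g3": 1.0, "g4": 5.0}
gene_sets = {"pathway_A": ["g1", "g2"], "pathway_B": ["g3", "g4", "g9"]}
features = set_level_features(sample, gene_sets)
```

Each sample is thereby reduced from one feature per gene to one feature per gene set, and a classifier would then be trained on the resulting vectors. Passing a different `aggregate` function is one way to compare alternative aggregation schemes under the same interface.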

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Klema, J., Holec, M., Zelezny, F., Tolar, J. (2011). Comparative Evaluation of Set-Level Techniques in Microarray Classification. In: Chen, J., Wang, J., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2011. Lecture Notes in Computer Science, vol 6674. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21260-4_27

  • DOI: https://doi.org/10.1007/978-3-642-21260-4_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21259-8

  • Online ISBN: 978-3-642-21260-4

  • eBook Packages: Computer Science, Computer Science (R0)
