Abstract
Analysis of gene expression data in terms of a priori-defined gene sets typically yields more compact and interpretable results than traditional methods that rely on individual genes. The set-level strategy can also be adopted in predictive classification tasks accomplished with machine learning algorithms. Here, sample features originally corresponding to genes are replaced by a much smaller number of features, each corresponding to a gene set and aggregating the expressions of its members into a single real value. Classifiers learned from such transformed features promise better interpretability because they derive class predictions from the overall expressions of selected gene sets (e.g. corresponding to pathways) rather than from expressions of specific genes. In a large collection of experiments, we compare the accuracy of such classifiers with that of traditional gene-based classifiers. Furthermore, we adapt several recently published gene set analysis techniques to this machine learning setting and assess their contributions to classification accuracy.
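The feature transformation described above can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions: the aggregation function is the arithmetic mean of the measured member genes (the abstract does not fix the aggregation), the classifier is a plain nearest-centroid rule chosen only for brevity, and all gene and gene set names (`g1`, `SET_A`, ...) and the helpers `to_set_features`, `fit_centroids` and `predict` are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence

Sample = Dict[str, float]  # hypothetical layout: gene identifier -> expression value


def to_set_features(sample: Sample, gene_sets: Dict[str, Sequence[str]]) -> List[float]:
    """Replace per-gene features by one real value per gene set.

    Assumption: a set's value is the arithmetic mean of its members that are
    measured in the sample; every set must have at least one measured member,
    otherwise statistics.mean raises StatisticsError. Sets are visited in
    sorted-name order so all samples share the same feature layout.
    """
    return [
        mean(sample[g] for g in gene_sets[name] if g in sample)
        for name in sorted(gene_sets)
    ]


def fit_centroids(X: List[List[float]], y: List[str]) -> Dict[str, List[float]]:
    """Nearest-centroid training (illustrative choice): per-class mean of each set-level feature."""
    grouped: Dict[str, List[List[float]]] = {}
    for row, label in zip(X, y):
        grouped.setdefault(label, []).append(row)
    return {label: [mean(col) for col in zip(*rows)] for label, rows in grouped.items()}


def predict(centroids: Dict[str, List[float]], row: List[float]) -> str:
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


if __name__ == "__main__":
    # Hypothetical toy data: four genes grouped into two gene sets, two classes.
    gene_sets = {"SET_A": ["g1", "g2"], "SET_B": ["g3", "g4"]}
    train = [
        ({"g1": 5.0, "g2": 7.0, "g3": 1.0, "g4": 1.0}, "tumor"),
        ({"g1": 6.0, "g2": 6.0, "g3": 2.0, "g4": 0.0}, "tumor"),
        ({"g1": 1.0, "g2": 1.0, "g3": 6.0, "g4": 8.0}, "normal"),
        ({"g1": 0.0, "g2": 2.0, "g3": 7.0, "g4": 7.0}, "normal"),
    ]
    X = [to_set_features(s, gene_sets) for s, _ in train]
    y = [label for _, label in train]
    centroids = fit_centroids(X, y)

    query = {"g1": 4.0, "g2": 6.0, "g3": 2.0, "g4": 2.0}
    features = to_set_features(query, gene_sets)
    print(features)                      # [5.0, 2.0]
    print(predict(centroids, features))  # tumor
```

The classifier here sees only two features (one per gene set) instead of four gene-level ones, which is the dimensionality reduction and interpretability gain the abstract refers to; any other learner or aggregation function could be substituted.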
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Klema, J., Holec, M., Zelezny, F., Tolar, J. (2011). Comparative Evaluation of Set-Level Techniques in Microarray Classification. In: Chen, J., Wang, J., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2011. Lecture Notes in Computer Science, vol 6674. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21260-4_27
DOI: https://doi.org/10.1007/978-3-642-21260-4_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21259-8
Online ISBN: 978-3-642-21260-4
eBook Packages: Computer Science, Computer Science (R0)