Abstract
Machine learning methods have recently contributed significantly to solving multidisciplinary problems in cancer classification from microarray gene expression data. These tasks are characterized by a large number of features and few observations, which makes modeling a nontrivial undertaking. In this study, we apply entropic filter methods for gene selection in combination with several off-the-shelf classifiers. The use of bootstrap resampling yields more stable performance estimates. Our findings show that the proposed methodology permits a drastic reduction in dimension, offering attractive solutions in terms of both prediction accuracy and number of explanatory genes. A dimensionality reduction technique that preserves discrimination capability is used to visualize the selected genes.
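The abstract does not name the exact entropic measure or the classifiers, so the following is only a minimal sketch under stated assumptions, not the chapter's method. It assumes gene expression values have already been discretized, scores each gene by mutual information I(G; C) = H(C) - H(C | G) with the class, keeps the top k genes, and estimates accuracy on out-of-bag samples across bootstrap replicates. A 1-nearest-neighbour rule with Hamming distance stands in for the off-the-shelf classifiers; all function names are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy H(C), in bits, of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(gene, labels):
    """I(G; C) = H(C) - H(C | G) for one discretized gene G and class C."""
    n = len(labels)
    h_cond = 0.0
    for value, count in Counter(gene).items():
        subset = [c for g, c in zip(gene, labels) if g == value]
        h_cond += (count / n) * entropy(subset)
    return entropy(labels) - h_cond


def rank_genes(X, y, k):
    """Entropic filter: indices of the k genes with the highest I(G; C).
    X is a list of samples, each a list of already-discretized gene values."""
    scores = [mutual_information([row[j] for row in X], y) for j in range(len(X[0]))]
    # sorted() is stable even with reverse=True, so ties keep the lower index first
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def predict_1nn(train_X, train_y, x):
    """1-nearest neighbour with Hamming distance (hypothetical stand-in classifier)."""
    best = min(range(len(train_X)),
               key=lambda i: sum(a != b for a, b in zip(train_X[i], x)))
    return train_y[best]


def bootstrap_accuracy(X, y, k, n_boot=100, seed=0):
    """Mean and standard deviation of out-of-bag accuracy over bootstrap
    replicates; gene selection is repeated inside every replicate."""
    rng = random.Random(seed)
    n = len(y)
    accs = []
    for _ in range(n_boot):
        boot = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        oob = sorted(set(range(n)) - set(boot))       # samples left out of this replicate
        if not oob:
            continue
        genes = rank_genes([X[i] for i in boot], [y[i] for i in boot], k)
        train_X = [[X[i][j] for j in genes] for i in boot]
        train_y = [y[i] for i in boot]
        hits = sum(predict_1nn(train_X, train_y, [X[i][j] for j in genes]) == y[i]
                   for i in oob)
        accs.append(hits / len(oob))
    if not accs:
        raise ValueError("no bootstrap replicate had out-of-bag samples")
    mean = sum(accs) / len(accs)
    std = math.sqrt(sum((a - mean) ** 2 for a in accs) / len(accs))
    return mean, std
```

Re-ranking the genes inside each bootstrap replicate, rather than once on the full data set, keeps the out-of-bag samples out of the selection step; selecting on all samples first would bias the accuracy estimate optimistically. The spread of the per-replicate accuracies is what gives a sense of how stable the estimate is.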
Notes
1. These figures were obtained on a standard x86 machine at 2.666 GHz.
Copyright information
© 2011 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
González-Navarro, F.F., Belanche-Muñoz, L.A. (2011). Parsimonious Selection of Useful Genes in Microarray Gene Expression Data. In: Arabnia, H., Tran, QN. (eds) Software Tools and Algorithms for Biological Systems. Advances in Experimental Medicine and Biology, vol 696. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7046-6_5
DOI: https://doi.org/10.1007/978-1-4419-7046-6_5
Published:
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-7045-9
Online ISBN: 978-1-4419-7046-6
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)