Abstract
Gene-expression microarrays are a novel technology that allows tens of thousands of genes to be examined at a time. Manual inspection is therefore no longer feasible, and machine learning methods are being developed to analyze these new data. Specifically, since the number of genes is very high, feature selection methods have proven valuable for dealing with these unbalanced datasets (high dimensionality and low cardinality). Our method is composed of a discretizer, a filter and the FVQIT (Frontier Vector Quantization using Information Theory) classifier. It is employed to classify eight DNA gene-expression microarray datasets of different kinds of cancer. A comparative study with other classifiers such as Support Vector Machine (SVM), C4.5, naïve Bayes and k-Nearest Neighbor is performed. Our approach shows excellent results, outperforming all other classifiers.
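The three-stage structure described in the abstract (discretizer, then filter, then classifier) can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: equal-width binning stands in for the discretizer, a mutual-information ranking stands in for the filter, and a plain nearest-neighbor rule stands in for the final classifier. FVQIT itself is not implemented here, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def discretize(column, bins=3):
    """Equal-width binning of one gene's expression values (assumed discretizer)."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant gene
    return [min(int((v - lo) / width), bins - 1) for v in column]


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def select_genes(X, y, k=1, bins=3):
    """Assumed filter: keep the k genes whose binned values share most information with y."""
    columns = list(zip(*X))  # one tuple per gene
    scores = [mutual_information(discretize(col, bins), y) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def knn_predict(X_train, y_train, x, genes, k=1):
    """Stand-in classifier (NOT FVQIT): majority vote among the k nearest samples."""
    dists = sorted(
        (sum((row[j] - x[j]) ** 2 for j in genes), label)
        for row, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 6 samples x 4 genes; only gene 0 tracks the class label.
X = [
    [0.1, 5.0, 2.0, 7.1],
    [0.2, 1.0, 2.1, 3.0],
    [0.0, 3.0, 1.9, 5.2],
    [0.9, 4.9, 2.0, 3.1],
    [1.0, 1.1, 2.2, 7.0],
    [0.8, 3.1, 1.8, 5.0],
]
y = [0, 0, 0, 1, 1, 1]

selected = select_genes(X, y, k=1)
prediction = knn_predict(X, y, [0.95, 3.0, 2.0, 5.0], selected)
```

In this sketch the filter scores genes on their discretized values, while the classifier works on the original continuous values of the selected genes. That split is a design choice of the example only; the abstract does not state which representation the classifier receives.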
This work was supported in part by Xunta de Galicia under Project Code 08TIC012105PR and under the program “Axudas para a consolidación e a estruturación de unidades de investigación competitivas” (code 2007/134), and by Spanish Ministerio de Ciencia e Innovación under Project Code TIN2009-10748. These last two are partially supported by the European Union ERDF.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Porto-Díaz, I., Bolón-Canedo, V., Alonso-Betanzos, A., Fontenla-Romero, Ó. (2010). Local Modeling Classifier for Microarray Gene-Expression Data. In: Diamantaras, K., Duch, W., Iliadis, L.S. (eds) Artificial Neural Networks – ICANN 2010. ICANN 2010. Lecture Notes in Computer Science, vol 6354. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15825-4_2
DOI: https://doi.org/10.1007/978-3-642-15825-4_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15824-7
Online ISBN: 978-3-642-15825-4
eBook Packages: Computer Science, Computer Science (R0)