Abstract
Numerous computational techniques have been applied to identify the vital features of gene expression datasets, with the aim of increasing the efficiency of biomedical applications. Classifying samples is an important task for correctly recognizing diseased individuals by identifying a small number of clinically meaningful genes. It remains, however, a challenging problem for machine learning algorithms. In this paper, we apply a two-stage feature selection approach that uses ensemble filter methods and Pareto Optimality (PO). Although filter methods provide ranked lists of all features, they give no information about the required (optimum) subset size of the features, which are genes in this study. To address this issue, PO is combined with the filter methods. The main aim of this study is therefore to develop a robust framework based on PO, multiple feature selection methods and cross-validated subsets of the samples, one that is applicable both to similar datasets and to other feature selection methods. The robustness of the framework has been demonstrated on three well-known microarray gene expression datasets. The framework has been shown to yield equal or higher predictive accuracy with comparatively smaller feature subsets. Furthermore, cross-validation and data variation are taken into account in the framework. Consequently, the framework reduces the over-fitting problem and is observed to make gene selection more stable under different conditions.
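To illustrate how Pareto optimality can settle the subset-size question that a ranked list leaves open, the following is a minimal sketch and not the authors' implementation. It assumes two objectives that the abstract does not spell out: the number of selected genes (to be minimized) and the cross-validated accuracy (to be maximized). The function name `pareto_front` and the example values are hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple

# Assumed representation: (number of genes, cross-validated accuracy).
Candidate = Tuple[int, float]


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates.

    One candidate dominates another if it uses no more genes and is at
    least as accurate, and is strictly better on at least one of the two.
    """
    front = []
    for size, acc in candidates:
        dominated = any(
            (s <= size and a >= acc) and (s < size or a > acc)
            for s, a in candidates
        )
        if not dominated:
            front.append((size, acc))
    # Remove exact duplicates and order by subset size.
    return sorted(set(front))


# Hypothetical (subset size, accuracy) pairs, made up for illustration only.
example = [(5, 0.80), (10, 0.90), (20, 0.90), (50, 0.92), (100, 0.91)]
front = pareto_front(example)
print(front)
```

In this made-up example, the 20-gene subset is dropped because the 10-gene subset reaches the same accuracy, and the 100-gene subset is dropped because the 50-gene subset is both smaller and more accurate. The remaining points form the trade-off curve from which a subset size would be chosen.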
The original version of this chapter was inadvertently published with an incorrect page range (483–488) and DOI (10.1007/978-3-319-32703-7_94). The page range and the DOI have been reassigned. The correct page range is 489–494 and the correct DOI is 10.1007/978-3-319-32703-7_95. The erratum to this chapter is available at DOI: 10.1007/978-3-319-32703-7_260
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Ogutcen, O.F., Gormez, Z., Tahir, M.A., Seker, H. (2016). An Aggregated Cross-Validation Framework for Computational Discovery of Disease-Associative Genes. In: Kyriacou, E., Christofides, S., Pattichis, C. (eds) XIV Mediterranean Conference on Medical and Biological Engineering and Computing 2016. IFMBE Proceedings, vol 57. Springer, Cham. https://doi.org/10.1007/978-3-319-32703-7_95
DOI: https://doi.org/10.1007/978-3-319-32703-7_95
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32701-3
Online ISBN: 978-3-319-32703-7
eBook Packages: Engineering, Engineering (R0)