An Aggregated Cross-Validation Framework for Computational Discovery of Disease-Associative Genes

Ogutcen, Omer Faruk; Gormez, Zeliha; Tahir, Muhammad Atif; Seker, Huseyin

doi:10.1007/978-3-319-32703-7_95

Part of the book series: IFMBE Proceedings (IFMBE, volume 57)

Abstract

Numerous computational techniques have been applied to identify the most informative features in gene expression datasets, with the aim of making biomedical applications more efficient. Classifying samples is an important task: diseased individuals must be recognized correctly from a small but clinically meaningful set of genes. It is nevertheless a challenging problem for machine learning algorithms. In this paper, we apply a two-stage feature selection approach that combines ensemble filter methods with Pareto Optimality (PO). Although filter methods provide ranked lists of all features, they give no information about the required (optimum) subset size of the features, which in this study are genes. To address this issue, PO is incorporated with the filter methods. The main aim of this study is therefore to develop a robust framework that combines PO, multiple feature selection methods and cross-validated subsets of the samples, and that is applicable not only to similar datasets but also to different feature selection methods. The robustness of the framework has been demonstrated on three well-known microarray gene expression datasets. The framework has been shown to yield equal or higher predictive accuracy with comparatively smaller feature subsets. Furthermore, cross-validation and data variation are taken into account in the framework. Consequently, the framework reduces over-fitting and is observed to make gene selection more stable under different conditions.
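
The abstract does not say how the filter rankings are combined or which objectives PO trades off, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that rankings are merged by mean rank position and that PO keeps the candidate subset sizes that are not dominated on two objectives: smaller subset size and higher cross-validated accuracy. The gene names, accuracy values and function names are hypothetical.

```python
from statistics import mean


def aggregate_ranks(rankings):
    """Combine ranked gene lists (best first) by mean rank position.

    Assumption: every list ranks the same set of genes. Mean-rank
    aggregation is a stand-in; the abstract does not name the rule used.
    """
    genes = rankings[0]
    position = [{g: i for i, g in enumerate(r)} for r in rankings]
    # Sort by mean position; ties are broken by gene name for determinism.
    return sorted(genes, key=lambda g: (mean(p[g] for p in position), g))


def pareto_front(candidates):
    """Return the non-dominated (subset_size, accuracy) pairs.

    A candidate is dominated if another one is no larger and no less
    accurate, and strictly better on at least one of the two objectives.
    """
    front = []
    for size, acc in candidates:
        dominated = any(
            (s <= size and a >= acc) and (s < size or a > acc)
            for s, a in candidates
        )
        if not dominated:
            front.append((size, acc))
    return sorted(set(front))


# Hypothetical rankings from three filter methods (gene names are made up).
rankings = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g4", "g3"],
    ["g1", "g3", "g2", "g4"],
]
consensus = aggregate_ranks(rankings)

# Hypothetical cross-validated accuracies for the top-k consensus genes.
candidates = [(1, 0.80), (2, 0.90), (3, 0.90), (4, 0.88)]
front = pareto_front(candidates)
```

In this toy example the subsets of size 3 and 4 are dropped because the size-2 subset is at least as accurate with fewer genes, which is the kind of subset-size decision that a ranked list alone cannot make.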

The original version of this chapter was inadvertently published with an incorrect chapter pagination (483–488) and DOI (10.1007/978-3-319-32703-7_94). The page range and the DOI have been re-assigned. The correct page range is 489–494 and the correct DOI is 10.1007/978-3-319-32703-7_95. The erratum to this chapter is available at DOI: 10.1007/978-3-319-32703-7_260

Author information

Authors and Affiliations

Bio-Health Informatics Research Team, Department of Computer Science and Digital Technologies, Faculty of Engineering and Environment, The University of Northumbria at Newcastle, Newcastle-upon-Tyne, NE2 1XE, UK
Omer Faruk Ogutcen, Muhammad Atif Tahir & Huseyin Seker
TUBITAK-BILGEM-UEKAE, Gebze, Kocaeli, Turkey
Zeliha Gormez

Corresponding author

Correspondence to Omer Faruk Ogutcen.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Frederick University, Nicosia, Cyprus
Efthyvoulos Kyriacou
Biomedical Research Foundation, Nicosia, Cyprus
Stelios Christofides
Department of Computer Science, University of Cyprus, Nicosia, Cyprus
Constantinos S. Pattichis

About this paper

Cite this paper

Ogutcen, O.F., Gormez, Z., Tahir, M.A., Seker, H. (2016). An Aggregated Cross-Validation Framework for Computational Discovery of Disease-Associative Genes. In: Kyriacou, E., Christofides, S., Pattichis, C. (eds) XIV Mediterranean Conference on Medical and Biological Engineering and Computing 2016. IFMBE Proceedings, vol 57. Springer, Cham. https://doi.org/10.1007/978-3-319-32703-7_95

DOI: https://doi.org/10.1007/978-3-319-32703-7_95
Published: 17 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32701-3
Online ISBN: 978-3-319-32703-7
eBook Packages: Engineering, Engineering (R0)