An Aggregated Cross-Validation Framework for Computational Discovery of Disease-Associative Genes

  • Conference paper
XIV Mediterranean Conference on Medical and Biological Engineering and Computing 2016

Part of the book series: IFMBE Proceedings (IFMBE, volume 57)

Abstract

Numerous computational techniques have been applied to identify the vital features of gene expression datasets, with the aim of increasing the efficiency of biomedical applications. Classifying samples is an important task: diseased individuals can be recognized correctly by identifying a small but clinically meaningful set of genes. However, this remains a challenging problem for machine learning algorithms. In this paper, we apply a two-stage feature selection approach that combines ensemble filter methods with Pareto Optimality (PO). Although filter methods provide ranked lists of all features, they give no information about the required (optimum) subset size of the features, namely genes in this study. To address this issue, PO is incorporated with the filter methods. The main aim of this study is therefore to develop a robust framework built on PO, multiple feature selection methods and cross-validated subsets of the samples, one that is applicable not only to similar datasets but also to other feature selection methods. The robustness of the framework has been demonstrated on three well-known microarray gene expression datasets. The framework has been shown to yield equal or higher predictive accuracy with comparatively smaller feature subsets. Furthermore, cross-validation and data variation are built into the framework. Consequently, the framework reduces the over-fitting problem and is observed to make gene selection more stable under different conditions.
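
The general idea of combining several filter scores over cross-validated subsets and letting Pareto Optimality decide how many genes to keep can be sketched as follows. This is only an illustrative sketch and not the authors' implementation: the two filter scores (an absolute Welch t-statistic and a rank-based AUC separation), the plain averaging across folds, the fold count and all function names are assumptions made for the example.

```python
import numpy as np

def t_score(X, y):
    """Absolute Welch t-statistic per gene (assumed filter no. 1)."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)

def auc_score(X, y):
    """Rank-based class separation |AUC - 0.5| per gene (assumed filter no. 2)."""
    ranks = X.argsort(axis=0).argsort(axis=0) + 1  # ranks 1..n, ties ignored
    n1 = int((y == 1).sum())
    n0 = len(y) - n1
    u = ranks[y == 1].sum(axis=0) - n1 * (n1 + 1) / 2  # Mann-Whitney U
    return np.abs(u / (n1 * n0) - 0.5)

def pareto_front(scores):
    """Indices of rows that no other row dominates (higher is better)."""
    keep = []
    for i, s in enumerate(scores):
        # A row dominates s if it is >= on every score and > on at least one.
        dominated = np.any(np.all(scores >= s, axis=1) & np.any(scores > s, axis=1))
        if not dominated:
            keep.append(i)
    return keep

def aggregated_pareto_genes(X, y, n_folds=5, seed=0):
    """Average the filter scores over training folds, then keep the Pareto front."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    per_fold = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        per_fold.append(np.column_stack([t_score(X[train], y[train]),
                                         auc_score(X[train], y[train])]))
    mean_scores = np.mean(per_fold, axis=0)  # shape: (n_genes, 2)
    return pareto_front(mean_scores)
```

In this sketch the size of the selected subset is not fixed in advance: it is simply the number of genes that remain non-dominated once the scores have been aggregated across folds, which is one simple way to obtain a subset size from ranked filter outputs.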

The original version of this chapter was inadvertently published with an incorrect chapter pagination (483–488) and DOI (10.1007/978-3-319-32703-7_94). The page range and the DOI have been reassigned: the correct page range is 489–494 and the correct DOI is 10.1007/978-3-319-32703-7_95. An erratum to this chapter is available at http://dx.doi.org/10.1007/978-3-319-32703-7_260

Author information

Correspondence to Omer Faruk Ogutcen.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ogutcen, O.F., Gormez, Z., Tahir, M.A., Seker, H. (2016). An Aggregated Cross-Validation Framework for Computational Discovery of Disease-Associative Genes. In: Kyriacou, E., Christofides, S., Pattichis, C. (eds) XIV Mediterranean Conference on Medical and Biological Engineering and Computing 2016. IFMBE Proceedings, vol 57. Springer, Cham. https://doi.org/10.1007/978-3-319-32703-7_95

  • DOI: https://doi.org/10.1007/978-3-319-32703-7_95

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32701-3

  • Online ISBN: 978-3-319-32703-7

  • eBook Packages: Engineering, Engineering (R0)
