
Optimal Feature Selection for Decision Trees Induction Using a Genetic Algorithm Wrapper - A Model Approach

  • Conference paper
  • In: Strategic Innovative Marketing and Tourism

Abstract

The aim of this paper is to describe an approach to a model that classifies optimised subsets of data. Two algorithms run in a seemingly parallel fashion: a genetic algorithm selects feature subsets and a classifier evaluates them, a wrapper method intended to reduce overfitting while maintaining accuracy. A wrapper method measures how useful features are by optimising the classifier’s performance. When large datasets are classified, the risk of overfitting is high. Thus, instead of classifying the whole dataset, a “smarter” approach classifies subsets of data, called chromosomes, using a genetic algorithm. The genetic algorithm searches for the best combinations of chromosomes across a series of populations called generations. It produces a large number of chromosomes, each holding a certain number of attributes, called genes; each chromosome is classified by the decision tree and assigned a fitness value, namely the classification accuracy that the chromosome achieved. Only the strongest chromosomes pass on to the next generation. This method reduces the number of genes classified, mitigating at the same time the risk of overfitting. At the end, the fittest chromosomes, i.e. the best subsets of attributes, are reported. The method supports faster and more accurate decision making. Applications of this wrapper include digital marketing campaign metrics, analytics metrics, website ranking factors, content curation, keyword research, consumer/visitor behaviour analysis and other areas of marketing and business interest.
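The select-evaluate-reproduce loop the abstract describes can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: chromosomes are bit masks over features, and the `fitness` function here is a hand-made stand-in (it rewards a fixed set of "informative" genes and penalises subset size), whereas in the paper's wrapper the fitness would be the decision tree's classification accuracy on the selected feature subset. All names, sizes, and rates below are illustrative assumptions.

```python
import random

random.seed(42)

N_FEATURES = 10      # genes per chromosome (length of the feature mask)
POP_SIZE = 20        # chromosomes per generation
N_GENERATIONS = 30
MUTATION_RATE = 0.05

def fitness(chromosome):
    # Stand-in for the wrapper's real fitness (decision-tree accuracy):
    # assume features 0-4 are informative, and penalise large subsets.
    informative = sum(chromosome[:5])
    cost = 0.02 * sum(chromosome)
    return informative / 5.0 - cost

def crossover(a, b):
    # Single-point crossover of two parent masks
    point = random.randint(1, N_FEATURES - 1)
    return a[:point] + b[point:]

def mutate(c):
    # Flip each gene independently with probability MUTATION_RATE
    return [1 - g if random.random() < MUTATION_RATE else g for g in c]

# Initial generation: random feature masks
population = [[random.randint(0, 1) for _ in range(N_FEATURES)]
              for _ in range(POP_SIZE)]

for _ in range(N_GENERATIONS):
    # Selection: only the fittest half survives to the next generation
    population.sort(key=fitness, reverse=True)
    survivors = population[:POP_SIZE // 2]
    # Reproduction: refill with mutated offspring of random survivor pairs
    children = [mutate(crossover(random.choice(survivors),
                                 random.choice(survivors)))
                for _ in range(POP_SIZE - len(survivors))]
    population = survivors + children

best = max(population, key=fitness)
print(best)   # fittest feature mask found
```

Swapping the stand-in `fitness` for a cross-validated decision-tree accuracy on the masked features turns this sketch into the wrapper scheme the paper describes.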



Author information


Corresponding author

Correspondence to Dimitris C. Gkikas.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Theodoridis, P.K., Gkikas, D.C. (2020). Optimal Feature Selection for Decision Trees Induction Using a Genetic Algorithm Wrapper - A Model Approach. In: Kavoura, A., Kefallonitis, E., Theodoridis, P. (eds) Strategic Innovative Marketing and Tourism. Springer Proceedings in Business and Economics. Springer, Cham. https://doi.org/10.1007/978-3-030-36126-6_65

