On Improving the Prediction Accuracy of a Decision Tree Using Genetic Algorithm

  • Md. Nasim Adnan
  • Md. Zahidul Islam
  • Md. Mostofa Akbar
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11323)

Abstract

Decision trees are among the most popular classifiers and are used in a wide range of real-world problems. Achieving higher prediction accuracy for decision trees is therefore very important. Most of the well-known decision tree induction algorithms used in practice are based on greedy approaches and hence do not consider conditional dependencies among the attributes. As a result, they may generate suboptimal solutions. In the literature, decision tree induction algorithms based on genetic programming (a more complex variant of the genetic algorithm) have often been proposed to eliminate some of the problems of greedy approaches. However, none of the algorithms proposed so far can effectively address conditional dependencies among the attributes. In this paper, we propose a new, easy-to-implement genetic algorithm-based decision tree induction technique that is more likely to identify conditional dependencies among the attributes. Extensive experiments on thirty well-known data sets from the UCI Machine Learning Repository are conducted to validate the effectiveness of the proposed technique.
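To illustrate the kind of conditional dependency that a greedy, one-attribute-at-a-time split criterion can miss, consider the classic XOR case. The minimal Python sketch below assumes an information-gain split criterion (as used by ID3/C4.5-style learners) and a hypothetical four-row data set; it is not the authors' genetic algorithm, only a demonstration of the problem their technique targets.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, attr):
    """Information gain from splitting the rows on the attribute at index `attr`."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical XOR data: the class is A0 XOR A1.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [a ^ b for a, b in rows]

# Greedy view: each attribute evaluated on its own.
for attr in (0, 1):
    print(f"gain(A{attr}) = {info_gain(rows, labels, attr):.3f}")

# Joint view: treat the pair (A0, A1) as a single combined attribute.
pair_rows = [(row,) for row in rows]
print(f"gain(A0, A1) = {info_gain(pair_rows, labels, 0):.3f}")
```

Both single-attribute gains are 0, so a greedy learner has no basis for preferring either split at the root, even though the two attributes together determine the class exactly (joint gain of 1 bit).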

Keywords

Decision tree · Genetic algorithm · Prediction accuracy · Knowledge discovery

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Md. Nasim Adnan (1)
  • Md. Zahidul Islam (1)
  • Md. Mostofa Akbar (2)
  1. School of Computing and Mathematics, Charles Sturt University, Bathurst, Australia
  2. Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology (BUET), Dhaka, Bangladesh