Difficult first strategy GP: an inexpensive sampling technique to improve the performance of genetic programming

  • Muhammad Quamber Ali
  • Hammad Majeed
Research Paper

Abstract

Genetic programming (GP) is a top performer in solving classification and clustering problems in general, and symbolic regression problems in particular. GP has produced impressive results and has outperformed human-generated results on 76 different problems taken from 22 different fields. Despite these results, a number of significant issues remain open, among them high computational cost, premature convergence and high error rates. These issues must be addressed for GP to realise its full potential. In this paper, a simple and cost-effective technique called Difficult First Strategy-GP (DFS-GP) is proposed to address them. The proposed technique involves a pre-processing step and a sampling step. In the pre-processing step, the data points of the given data set that are difficult for GP to evolve are marked. In the sampling step, these points are introduced into the evolutionary run using two newly defined sampling techniques, called difficult points first and difficulty proportionate selection. Both techniques are biased towards selecting difficult data points in the early stage of a run and easy data points in the later stage. This ensures that GP does not ignore difficult-to-evolve data points during a run. Experiments have shown that GP coupled with DFS avoids premature convergence and attains higher fitness than standard GP using the same number of fitness evaluations. The performance of the proposed technique was evaluated on three commonly used metrics: convergence speed, fitness and variance of the best results. Our results show that the proposed setups achieved 10–15% better fitness values than standard GP. Furthermore, the proposed setups consistently generated better-quality solutions on all the problems and required 30–50% fewer computations to match the best performance of standard GP.
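The two sampling schedules described in the abstract can be sketched in Python. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation. The abstract does not say how difficulty is measured, so `difficulty_scores` assumes a point's difficulty is its mean absolute error over a set of preliminary models. The function names, the linear schedule driven by the fraction of generations completed, and the `subset_size` parameter are likewise assumptions. `difficult_points_first` slides a window over the points sorted from hardest to easiest, while `difficulty_proportionate_selection` draws a roulette-wheel sample whose weights shift from difficulty-proportional to ease-proportional as the run progresses.

```python
import random
from typing import List, Sequence


def difficulty_scores(errors_per_model: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical pre-processing: a point's difficulty is its mean absolute
    error over several preliminary models (errors_per_model[m][i] is the error
    of model m on data point i). The paper's actual measure may differ."""
    n_models = len(errors_per_model)
    n_points = len(errors_per_model[0])
    return [
        sum(abs(errors_per_model[m][i]) for m in range(n_models)) / n_models
        for i in range(n_points)
    ]


def _progress(generation: int, max_generations: int) -> float:
    # Fraction of the run completed: 0.0 at the first generation, 1.0 at the last.
    return generation / max(1, max_generations - 1)


def difficult_points_first(scores: Sequence[float], generation: int,
                           max_generations: int, subset_size: int) -> List[int]:
    """Assumed schedule: sort points from hardest to easiest and slide a
    window of subset_size points from the hard end to the easy end."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    start = round(_progress(generation, max_generations) * (len(order) - subset_size))
    return order[start:start + subset_size]


def difficulty_proportionate_selection(scores: Sequence[float], generation: int,
                                       max_generations: int, subset_size: int,
                                       rng: random.Random) -> List[int]:
    """Roulette-wheel sampling without replacement. Weights move linearly from
    difficulty-proportional (start of run) to ease-proportional (end of run)."""
    p = _progress(generation, max_generations)
    hardest = max(scores)
    # The small constant keeps every weight positive so rng.choices never sees a zero total.
    weights = [(1 - p) * s + p * (hardest - s) + 1e-9 for s in scores]
    candidates = list(range(len(scores)))
    chosen: List[int] = []
    for _ in range(subset_size):
        pick = rng.choices(candidates, weights=[weights[i] for i in candidates], k=1)[0]
        chosen.append(pick)
        candidates.remove(pick)
    return chosen
```

For example, with `scores = [0.1, 0.9, 0.5, 0.3, 0.7]`, `difficult_points_first(scores, 0, 10, 2)` returns `[1, 4]` (the two hardest points), and at generation 9 it returns `[3, 0]` (the two easiest). In this sketch the selected indices would serve as the fitness cases evaluated in that generation.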

Keywords

Difficult first strategy; Genetic programming; Pre-processing; Sampling techniques; Dataset sampling; Machine learning

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Authors and Affiliations

  1. Department of Computer Science, National University of Computer and Emerging Sciences, Islamabad, Pakistan