Genetic programming (GP) is a top performer in solving classification and clustering problems in general and symbolic regression problems in particular. GP has produced impressive results and has outperformed human-generated results on 76 different problems taken from 22 different fields. Despite these results, a number of significant open issues remain, among them high computational cost, premature convergence and high error rates. These issues must be addressed for GP to realise its full potential. In this paper a simple and cost-effective technique called Difficult First Strategy GP (DFS-GP) is proposed to address these problems. The technique comprises a pre-processing step and a sampling step. In the pre-processing step, data points in the given data set that are difficult for GP to evolve are marked; in the sampling step, they are introduced into the evolutionary run through two newly defined sampling techniques, called difficult points first and difficulty proportionate selection. Both techniques are biased towards selecting difficult data points during the initial stage of a run and easy points in the later stages, ensuring that GP does not ignore difficult-to-evolve data points during a run. Experiments show that GP coupled with DFS avoids premature convergence and attains higher fitness than standard GP using the same number of fitness evaluations. Performance of the proposed technique was evaluated on three commonly used metrics: convergence speed, fitness and variance in the best results. Our results show that the proposed setups achieved 10–15% better fitness values than standard GP. Furthermore, the proposed setups consistently generated better-quality solutions on all the problems and used 30–50% fewer computations to match the best performance of standard GP.
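The difficulty-biased sampling described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each data point has already been assigned a difficulty score in [0, 1] during pre-processing, and it blends selection weights from difficulty towards easiness as a `progress` parameter (fraction of the run completed) goes from 0 to 1. The function name and parameters are illustrative.

```python
import random

def difficulty_proportionate_sample(difficulties, k, progress, rng=None):
    """Sample k data-point indices, favouring difficult points early in a run.

    difficulties: per-point difficulty scores in [0, 1], higher meaning
                  harder for GP to fit (marked during pre-processing).
    progress:     fraction of the evolutionary run completed, in [0, 1];
                  near 0 the weights favour difficult points, near 1
                  they favour easy points.
    """
    rng = rng or random.Random()
    # Blend each point's weight from its difficulty towards its "easiness"
    # as the run progresses.
    weights = [(1 - progress) * d + progress * (1 - d) for d in difficulties]
    # Guard against a degenerate all-zero weight vector.
    if sum(weights) == 0:
        weights = [1.0] * len(difficulties)
    return rng.choices(range(len(difficulties)), weights=weights, k=k)
```

Under this scheme an early-generation sample (progress near 0) is dominated by hard points, and a late-generation sample (progress near 1) by easy ones, mirroring the bias the paper describes.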
Ali, M.Q., Majeed, H. Difficult first strategy GP: an inexpensive sampling technique to improve the performance of genetic programming. Evol. Intel. 13, 537–549 (2020). https://doi.org/10.1007/s12065-020-00355-2
- Difficult first strategy
- Genetic programming
- Sampling techniques
- Dataset sampling
- Machine learning