Difficult first strategy GP: an inexpensive sampling technique to improve the performance of genetic programming

Abstract

Genetic programming (GP) is a top performer in solving classification and clustering problems in general, and symbolic regression problems in particular. GP has produced impressive results and has outperformed human-generated results on 76 problems drawn from 22 different fields. Despite these results, a number of significant open issues remain, among them high computational cost, premature convergence and high error rates. These issues must be addressed for GP to realise its full potential. In this paper a simple and cost-effective technique called Difficult First Strategy GP (DFS-GP) is proposed to address these problems. The technique comprises a pre-processing step and a sampling step. In the pre-processing step, data points in the given data set that are difficult for GP to evolve are marked; in the sampling step, they are introduced into the evolutionary run through two newly defined sampling techniques, called difficult points first and difficulty proportionate selection. Both techniques are biased towards selecting difficult data points in the initial stage of a run and easy points in the later stage, which ensures that GP does not ignore difficult-to-evolve data points during a run. Experiments show that GP coupled with DFS avoids premature convergence and attains higher fitness than standard GP for the same number of fitness evaluations. Performance of the proposed technique was evaluated on three commonly used metrics: convergence speed, fitness and variance in the best results. The proposed setups achieved 10–15% better fitness values than standard GP, consistently generated better quality solutions on all problems, and used 30–50% fewer computations to match the best performance of standard GP.
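The abstract does not specify the exact weighting schedule used by difficulty proportionate selection, but the idea of biasing sampling towards difficult points early in a run and easy points later can be sketched roughly as follows. All names and the linear bias schedule here are assumptions for illustration, not the authors' implementation; `difficulty` is taken to be a pre-computed per-point score in [0, 1] from the pre-processing step.

```python
import random

def difficulty_proportionate_sample(points, difficulty, gen, max_gen, k):
    """Sample k fitness cases, favouring difficult points early in the
    run and easy points later (illustrative linear schedule only)."""
    # Bias decays linearly from 1 (favour difficult) to 0 (favour easy).
    bias = 1.0 - gen / max_gen
    # Each point's weight interpolates between its difficulty score
    # (early generations) and its "easiness" 1 - difficulty (late ones).
    weights = [bias * d + (1.0 - bias) * (1.0 - d) for d in difficulty]
    return random.choices(points, weights=weights, k=k)
```

Under this sketch a maximally difficult point (score 1.0) dominates the sample at generation 0 and is never drawn at the final generation, matching the early-difficult, late-easy bias the abstract describes.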


[Figs. 1–9 appear in the full text of the article.]


Author information


Corresponding author

Correspondence to Hammad Majeed.



About this article


Cite this article

Ali, M.Q., Majeed, H. Difficult first strategy GP: an inexpensive sampling technique to improve the performance of genetic programming. Evol. Intel. 13, 537–549 (2020). https://doi.org/10.1007/s12065-020-00355-2


Keywords

  • Difficult first strategy
  • Genetic programming
  • Pre-processing
  • Sampling techniques
  • Dataset sampling
  • Machine learning