Balancing Learning and Overfitting in Genetic Programming with Interleaved Sampling of Training Data

Gonçalves, Ivo; Silva, Sara

doi:10.1007/978-3-642-37207-0_7

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7831)

Included in the following conference series:

European Conference on Genetic Programming

Abstract

Generalization is the ability of a model to perform well on cases not seen during the training phase. In Genetic Programming, generalization has recently been recognized as an important open issue, and increasing effort is being devoted to evolving models that do not overfit. In this work we expand on recent developments showing that using a small and frequently changing subset of the training data is effective in reducing overfitting and improving generalization. In particular, we build upon the idea of randomly choosing a single training instance at each generation and balance it with periodically using all training data. The motivation is to keep overfitting low (by using a single training instance) while still presenting enough information for a general pattern to be found (by using all training data). We propose two approaches, interleaved sampling and random interleaved sampling, which perform this balancing in a deterministic and a probabilistic way, respectively. Experiments are conducted on three high-dimensional real-life datasets from the pharmacokinetics domain. Results show that most variants of the proposed approaches consistently improve generalization and reduce overfitting when compared to standard Genetic Programming. The best variants even achieve such improvements on a dataset where a recent and representative state-of-the-art method could not. Furthermore, the resulting models are short and hence easier to interpret, an important achievement from an applications point of view.
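
The balancing described in the abstract can be illustrated with a minimal Python sketch. It only shows how the training cases used at a given generation might be selected. The function names, the `period` parameter (how often all data is used in the deterministic variant), and the `p_all` parameter (the probability of using all data in the probabilistic variant) are assumptions made for illustration; the abstract does not state the exact schedules or values the authors used.

```python
import random
from typing import List


def interleaved_indices(generation: int, n_cases: int, period: int,
                        rng: random.Random) -> List[int]:
    """Deterministic sketch: use all training cases every `period`-th
    generation, otherwise a single randomly chosen case.
    `period` is a hypothetical parameter, not taken from the paper."""
    if generation % period == 0:
        return list(range(n_cases))
    return [rng.randrange(n_cases)]


def random_interleaved_indices(n_cases: int, p_all: float,
                               rng: random.Random) -> List[int]:
    """Probabilistic sketch: with probability `p_all` use all training
    cases, otherwise a single randomly chosen case.
    `p_all` is a hypothetical parameter, not taken from the paper."""
    if rng.random() < p_all:
        return list(range(n_cases))
    return [rng.randrange(n_cases)]
```

In a Genetic Programming loop, the returned indices would pick the training cases on which fitness is computed for that generation, for example `cases = interleaved_indices(g, len(train), period=4, rng=rng)`.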

Author information

Authors and Affiliations

CISUC, Department of Informatics Engineering, University of Coimbra, Portugal
Ivo Gonçalves & Sara Silva
INESC-ID Lisboa, IST, Technical University of Lisbon, Portugal
Sara Silva

Editor information

Editors and Affiliations

Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965, Poznań, Poland
Krzysztof Krawiec
School of Computer Science, The University of Birmingham, B15 2TT, Edgbaston, Birmingham, UK
Alberto Moraglio
Geisel School of Medicine, Dartmouth College, 03755, Hanover, NH, USA
Ting Hu
Department of Computer Engineering, Istanbul Technical University, 34469, Maslak, Istanbul, Turkey
A. Şima Etaner-Uyar
Institute of Computer Graphics and Algorithms, Vienna University of Technology, 1040, Vienna, Austria
Bin Hu

About this paper

Cite this paper

Gonçalves, I., Silva, S. (2013). Balancing Learning and Overfitting in Genetic Programming with Interleaved Sampling of Training Data. In: Krawiec, K., Moraglio, A., Hu, T., Etaner-Uyar, A.Ş., Hu, B. (eds) Genetic Programming. EuroGP 2013. Lecture Notes in Computer Science, vol 7831. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37207-0_7

DOI: https://doi.org/10.1007/978-3-642-37207-0_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37206-3
Online ISBN: 978-3-642-37207-0
eBook Packages: Computer Science (R0)
