Validation Sets, Genetic Programming and Generalisation

Fitzgerald, Jeannie; Ryan, Conor

doi:10.1007/978-1-4471-2318-7_6

Included in the following conference series:

International Conference on Innovative Techniques and Applications of Artificial Intelligence

Abstract

This paper investigates a new application of a validation set when using a three data set methodology with Genetic Programming (GP). Our system uses Validation Pressure combined with Validation Elitism to influence fitness evaluation and population structure with the aim of improving the system’s ability to evolve individuals with an enhanced capacity for generalisation. This strategy facilitates the use of a validation set to reduce over-fitting while mitigating the loss of training data associated with traditional methods employing a validation set.

The method is tested on five benchmark binary classification data sets and results obtained suggest that the strategy can deliver improved generalisation on unseen test data.
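
For readers unfamiliar with the three data set methodology mentioned above, the following is a minimal sketch of the traditional setup that the paper sets out to improve on: the data are partitioned into training, validation and test sets, and the validation set is used only to choose among candidate classifiers. It is not the authors' Validation Pressure or Validation Elitism mechanism, which the abstract does not specify. The function names, the 50/25/25 proportions and the toy data are assumptions made purely for illustration.

```python
import random


def three_way_split(rows, train_frac=0.5, valid_frac=0.25, seed=0):
    """Shuffle rows and split them into training, validation and test sets.

    The fractions are illustrative assumptions, not values from the paper.
    """
    if train_frac <= 0 or valid_frac <= 0 or train_frac + valid_frac >= 1:
        raise ValueError("fractions must be positive and leave room for a test set")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


def accuracy(classifier, rows):
    """Fraction of (features, label) rows that the classifier labels correctly."""
    if not rows:
        raise ValueError("cannot score a classifier on an empty data set")
    correct = sum(1 for features, label in rows if classifier(features) == label)
    return correct / len(rows)


def pick_by_validation(candidates, valid):
    """Return the candidate with the highest accuracy on the validation set."""
    return max(candidates, key=lambda c: accuracy(c, valid))


# Toy binary classification data (hypothetical): label is 1 when x >= 50.
rows = [((x,), int(x >= 50)) for x in range(100)]
train, valid, test = three_way_split(rows)

# Two hand-written stand-ins for evolved individuals.
def threshold_rule(features):
    return int(features[0] >= 50)

def always_zero(features):
    return 0

best = pick_by_validation([threshold_rule, always_zero], valid)
print("chosen:", best.__name__, "test accuracy:", accuracy(best, test))
```

In this arrangement the rows held back for validation never contribute to training, which is the loss of training data that the abstract says the proposed strategy aims to mitigate.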

Author information

Authors and Affiliations

BDS Group, CSIS Department, University of Limerick, Limerick, Ireland
Jeannie Fitzgerald & Conor Ryan

Corresponding author

Correspondence to Jeannie Fitzgerald.

Editor information

Editors and Affiliations

University of Portsmouth, Lion Terrace, Portsmouth, PO1 3HE, United Kingdom
Max Bramer
School of Computing & Mathematical Sciences, University of Greenwich, Park Row 30, London, SE10 9LS, United Kingdom
Miltos Petridis
School of Computing and Informatics, Nottingham Trent University, Burton Street, Nottingham, NG1 4BU, United Kingdom
Lars Nolle

About this paper

Cite this paper

Fitzgerald, J., Ryan, C. (2011). Validation Sets, Genetic Programming and Generalisation. In: Bramer, M., Petridis, M., Nolle, L. (eds) Research and Development in Intelligent Systems XXVIII. SGAI 2011. Springer, London. https://doi.org/10.1007/978-1-4471-2318-7_6

DOI: https://doi.org/10.1007/978-1-4471-2318-7_6
Published: 14 October 2011
Publisher Name: Springer, London
Print ISBN: 978-1-4471-2317-0
Online ISBN: 978-1-4471-2318-7
eBook Packages: Computer Science (R0)