Validation Sets, Genetic Programming and Generalisation

  • Jeannie Fitzgerald
  • Conor Ryan
Conference paper


This paper investigates a new application of a validation set when using a three-data-set methodology with Genetic Programming (GP). Our system uses Validation Pressure combined with Validation Elitism to influence fitness evaluation and population structure, with the aim of improving the system’s ability to evolve individuals with an enhanced capacity for generalisation. This strategy facilitates the use of a validation set to reduce over-fitting while mitigating the loss of training data associated with traditional methods that employ a validation set.

The method is tested on five benchmark binary classification data sets, and the results suggest that the strategy can deliver improved generalisation on unseen test data.
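
The abstract names the ingredients (a training/validation/test split, a validation-based influence on fitness, and validation-based elitism) but does not say how they are computed. The following Python sketch is therefore only an illustration under stated assumptions: the function names, the split fractions, the `pressure` weight and the linear blend of training and validation accuracy are all hypothetical and are not taken from the paper.

```python
import random


def three_way_split(rows, train_frac=0.5, val_frac=0.25, seed=0):
    """Shuffle rows and split them into training, validation and test parts.

    The fractions are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def accuracy(classifier, rows):
    """Fraction of (features, label) rows that the classifier labels correctly."""
    if not rows:
        return 0.0
    return sum(classifier(x) == y for x, y in rows) / len(rows)


def blended_fitness(classifier, train, val, pressure=0.25):
    """Hypothetical 'validation pressure': mix training and validation accuracy.

    A linear blend is an assumption made for this sketch only.
    """
    return (1 - pressure) * accuracy(classifier, train) + pressure * accuracy(classifier, val)


def validation_elite(population, val, n_elite=1):
    """Hypothetical 'validation elitism': keep the individuals that score
    best on the validation set, regardless of their training fitness."""
    ranked = sorted(population, key=lambda c: accuracy(c, val), reverse=True)
    return ranked[:n_elite]


# Toy binary classification data: one feature, label is 1 when the feature >= 10.
rows = [((i,), int(i >= 10)) for i in range(20)]
train, val, test = three_way_split(rows)

# Two stand-in "evolved individuals" (plain functions here, not GP trees).
population = [lambda x: 0, lambda x: int(x[0] >= 10)]
elite = validation_elite(population, val)
test_accuracy = accuracy(elite[0], test)
```

In a real GP system the individuals would be evolved program trees, and the test partition would be consulted only once, after evolution has finished, so that it remains genuinely unseen.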


Keywords: Genetic Programming, Evolutionary Computation, Symbolic Regression, German Credit, Unseen Test Data





Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  1. BDS Group, CSIS Department, University of Limerick, Limerick, Ireland
