A Statistical Learning Perspective of Genetic Programming

  • Nur Merve Amil
  • Nicolas Bredeche
  • Christian Gagné
  • Sylvain Gelly
  • Marc Schoenauer
  • Olivier Teytaud
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5481)

Abstract

This paper proposes a theoretical analysis of Genetic Programming (GP) from the perspective of statistical learning theory, a well-grounded mathematical toolbox for machine learning. By computing the Vapnik-Chervonenkis dimension of the family of programs that can be inferred by a specific setting of GP, it is proved that a parsimonious fitness ensures universal consistency: empirical error minimization converges to the best possible error as the number of test cases goes to infinity. However, it is also proved that the standard method of imposing a hard limit on program size still results in programs whose size grows without bound as a function of their accuracy. It is also shown that using cross-validation or hold-out to choose the complexity level that optimizes the generalization error rate also leads to bloat. A more elaborate modification of the fitness is therefore proposed in order to avoid unnecessary bloat while preserving universal consistency.
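
As a rough illustration of what a parsimonious fitness can look like, the sketch below adds a size penalty to the empirical error, so that among equally accurate programs the smaller one is preferred. The penalty form `c * sqrt(size * log(n) / n)`, the constant `c`, and the names `empirical_error` and `penalized_fitness` are assumptions chosen for illustration in the spirit of structural risk minimization; they are not the fitness defined in the paper.

```python
import math


def empirical_error(program, cases):
    """Fraction of test cases (x, y) on which program(x) differs from y."""
    return sum(1 for x, y in cases if program(x) != y) / len(cases)


def penalized_fitness(program, size, cases, c=1.0):
    """Empirical error plus a size penalty (lower is better).

    The penalty c * sqrt(size * log(n) / n) is an illustrative VC-style
    term that shrinks as the number of cases n grows. It is an assumption
    of this sketch, not the formula from the paper.
    """
    n = len(cases)
    penalty = c * math.sqrt(size * math.log(n) / n)
    return empirical_error(program, cases) + penalty


# Hypothetical candidates, each given as (callable, size in nodes).
cases = [(x, x % 2 == 0) for x in range(100)]
small = (lambda x: x % 2 == 0, 3)
large = (lambda x: (x % 2 == 0) and (x + 0 == x), 9)  # same behaviour, extra code

# Both candidates fit the cases perfectly, so the penalty decides.
best = min([small, large], key=lambda p: penalized_fitness(p[0], p[1], cases))
```

Here both candidates have zero empirical error, so the selection is driven entirely by the size term and the smaller program is kept. Because the penalty shrinks with the number of cases, larger programs remain reachable when the data justify them.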

Keywords

Genetic Programming, Complexity Level, Generalization Error, Structural Risk Minimization, Symbolic Regression
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Nur Merve Amil (1)
  • Nicolas Bredeche (1)
  • Christian Gagné (1)
  • Sylvain Gelly (1)
  • Marc Schoenauer (1)
  • Olivier Teytaud (1)

  1. TAO, INRIA Saclay, LRI, Bat. 490, Université Paris-Sud, 91405 Orsay CEDEX, France
  (*) LVSN, GEL-GIF, Univ. Laval, Québec, Canada
