Penalty Functions for Genetic Programming Algorithms
Very often symbolic regression, as addressed in Genetic Programming (GP), is equivalent to approximate interpolation. This means that, in general, GP algorithms try to fit the sample as well as possible, but no notion of generalization error is considered. As a consequence, overfitting, code bloat and noisy data are problems that are not satisfactorily solved under this approach. Motivated by this situation, we review the problem of symbolic regression from the perspective of Machine Learning, a well-founded mathematical toolbox for predictive learning. We perform empirical comparisons between classical statistical methods (AIC and BIC) and methods based on Vapnik-Chervonenkis (VC) theory for regression problems under genetic training. These comparisons suggest practical advantages of VC-based model selection. We conclude that VC theory provides a methodological framework for complexity control in Genetic Programming even when its technical results do not seem to be directly applicable. As the main practical outcome, we propose precise penalty functions, founded on the notion of generalization error, for evolving GP-trees.
Keywords: Genetic Programming, Symbolic Regression, Inductive Learning, Regression, Model Selection
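The abstract does not spell out the penalty formulas, so the following is only an illustrative sketch of how the three kinds of criteria compared here can act as penalized fitness when choosing among evolved GP-trees. It assumes the common regression forms of AIC and BIC (empirical risk plus a complexity term scaled by an estimated noise variance) and the practical VC penalization factor used in the statistical-learning literature (e.g. by Cherkassky and Mulier); these are not necessarily the exact penalty functions proposed in this work. The `Candidate` record, the complexity count `d`, the VC-dimension estimate `h` and the `select` helper are all hypothetical names introduced for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of an evolved GP-tree (not a structure from the paper)."""
    name: str
    residuals: list  # y_i - f(x_i) on the n training samples
    d: int           # assumed complexity measure, e.g. number of adjustable nodes
    h: float         # assumed estimate of the VC dimension of the tree's class


def empirical_risk(residuals):
    """Mean squared error R_emp over the training sample."""
    return sum(r * r for r in residuals) / len(residuals)


def noise_variance(residuals, d):
    """Noise estimate sigma^2 = n / (n - d) * R_emp; infinite when d >= n."""
    n = len(residuals)
    if d >= n:
        return math.inf
    return n / (n - d) * empirical_risk(residuals)


def aic(residuals, d):
    """AIC-style criterion: R_emp + 2 (d / n) sigma^2."""
    n = len(residuals)
    return empirical_risk(residuals) + 2.0 * d / n * noise_variance(residuals, d)


def bic(residuals, d):
    """BIC-style criterion: R_emp + ln(n) (d / n) sigma^2."""
    n = len(residuals)
    return empirical_risk(residuals) + math.log(n) * d / n * noise_variance(residuals, d)


def vc_penalized(residuals, h):
    """VC-style criterion: R_emp / (1 - sqrt(p - p ln p + ln(n) / (2n))), p = h / n.

    Returns infinity when h >= n or when the bracket is non-positive.
    """
    n = len(residuals)
    p = h / n
    if p >= 1.0:
        return math.inf
    p_term = p - p * math.log(p) if p > 0 else 0.0  # p ln p -> 0 as p -> 0
    bracket = 1.0 - math.sqrt(p_term + math.log(n) / (2.0 * n))
    if bracket <= 0.0:
        return math.inf
    return empirical_risk(residuals) / bracket


# Map each (assumed) criterion to a penalized-fitness function of a candidate.
CRITERIA = {
    "aic": lambda c: aic(c.residuals, c.d),
    "bic": lambda c: bic(c.residuals, c.d),
    "vc": lambda c: vc_penalized(c.residuals, c.h),
}


def select(candidates, method):
    """Return the candidate with the lowest penalized fitness under `method`."""
    return min(candidates, key=CRITERIA[method])


if __name__ == "__main__":
    small = Candidate("small tree", [0.5] * 20, d=2, h=2)
    large = Candidate("large tree", [0.45] * 20, d=10, h=10)
    for method in CRITERIA:
        print(method, select([small, large], method).name)
```

In this toy setup the larger tree fits the 20 training points slightly better, but every criterion prefers the smaller tree once complexity is penalized. The VC factor grows sharply as `h / n` approaches 1, which is one way such a penalty can discourage code bloat.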