An Overview of Predictive Learning and Function Approximation

  • Jerome H. Friedman
Part of the NATO ASI Series book series (volume 136)


Predictive learning has been traditionally studied in applied mathematics (function approximation), statistics (nonparametric regression), and engineering (pattern recognition). Recently the fields of artificial intelligence (machine learning) and connectionism (neural networks) have emerged, increasing interest in this problem, both in terms of wider application and methodological advances. This paper reviews the underlying principles of many of the practical approaches developed in these fields, with the goal of placing them in a common perspective and providing a unifying overview.


Keywords: Training Sample, Input Space, Strength Parameter, Parametric Family, Target Function





Copyright information

© Springer-Verlag Berlin Heidelberg 1994

Authors and Affiliations

  • Jerome H. Friedman, Department of Statistics and Stanford Linear Accelerator Center, Stanford University, USA
