An Introduction to Statistical Learning from a Regression Perspective

  • Richard Berk


Statistical learning is a loose collection of procedures in which key features of the final results are determined inductively from the data. There are clear historical links to exploratory data analysis. There are also clear links to statistical techniques such as principal components analysis, clustering, and smoothing, and to long-standing concerns in computer science, such as pattern recognition and edge detection. But statistical learning would not exist were it not for recent developments in raw computing power, computer algorithms, and theory from statistics, computer science, and applied mathematics. It can be very computer intensive. Extensive discussions of statistical learning can be found in Hastie et al. (2009) and Bishop (2006). Statistical learning is also sometimes called machine learning or reinforcement learning, especially when discussed in the context of computer science.
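The inductive character described above can be illustrated with a minimal sketch (hypothetical data and function names, not from the source): a one-split regression tree does not assume a functional form in advance, but chooses its split point from the data by minimizing within-node squared error.

```python
# Minimal sketch of an inductively determined model feature: the split point
# of a one-split regression tree is chosen from the data, not pre-specified.

def best_split(x, y):
    """Return the split point on x minimizing total within-node squared error."""
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = None
    for c in sorted(set(x))[1:]:  # candidate splits between observed values
        left = [yi for xi, yi in zip(x, y) if xi < c]
        right = [yi for xi, yi in zip(x, y) if xi >= c]
        loss = sse(left) + sse(right)
        if best is None or loss < best[1]:
            best = (c, loss)
    return best[0]

# Toy data with an obvious change point between x = 4 and x = 6
x = [1, 2, 3, 4, 6, 7, 8, 9]
y = [1.0, 1.1, 0.9, 1.0, 5.0, 5.1, 4.9, 5.0]
print(best_split(x, y))  # the split location is learned from the data: 6
```

Full tree algorithms such as CART (Breiman et al. 1984) apply this search recursively over many predictors, so the entire tree structure is determined inductively.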


Keywords: Support vector machine · Random forest · Forecast error · Regression tree · Statistical learning



Work on this paper was supported in part by a grant from the National Science Foundation: SES-0437169, “Ensemble Methods for Data Analysis in the Behavioral, Social and Economic Sciences.” That support is gratefully acknowledged.


  1. Berk RA (2003) Regression analysis: a constructive critique. Sage, Newbury Park, CA
  2. Berk RA (2008) Statistical learning from a regression perspective. Springer, New York
  3. Berk RA, Sherman L, Barnes G, Kurtz E, Lindsay A (2009a) Forecasting murder within a population of probationers and parolees: a high stakes application of statistical forecasting. J R Stat Soc Ser A 172(part 1):191–211
  4. Berk RA, Brown L, Zhao L (2009b) Statistical inference after model selection. Working paper, Department of Statistics, University of Pennsylvania
  5. Bishop CM (2006) Pattern recognition and machine learning. Springer, New York
  6. Breiman L (1996) Bagging predictors. Mach Learn 24:123–140
  7. Breiman L (2001a) Random forests. Mach Learn 45:5–32
  8. Breiman L (2001b) Statistical modeling: two cultures (with discussion). Stat Sci 16:199–231
  9. Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, Monterey, CA
  10. Buja A, Stuetzle W, Shen Y (2005) Loss functions for binary class probability estimation and classification: structure and applications. Unpublished manuscript, Department of Statistics, The Wharton School, University of Pennsylvania
  11. Bühlmann P, Yu B (2006) Sparse boosting. J Mach Learn Res 7:1001–1024
  12. Cook RD, Weisberg S (1999) Applied regression including computing and graphics. Wiley, New York
  13. Efron B, Tibshirani R (1993) An introduction to the bootstrap. Chapman & Hall, New York
  14. Freedman DA (2004) Graphical models for causation and the identification problem. Eval Rev 28:267–293
  15. Freedman DA (2005) Statistical models: theory and practice. Cambridge University Press, Cambridge
  16. Freund Y, Schapire R (1996) Experiments with a new boosting algorithm. In: Machine learning: proceedings of the 13th international conference. Morgan Kaufmann, San Francisco, pp 148–156
  17. Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232
  18. Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38:367–378
  19. Friedman JH, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting (with discussion). Ann Stat 28:337–407
  20. Friedman JH, Hastie T, Rosset S, Tibshirani R, Zhu J (2004) Discussion of boosting papers. Ann Stat 32:102–107
  21. Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning, 2nd edn. Springer, New York
  22. Holland P (1986) Statistics and causal inference. J Am Stat Assoc 81:945–960
  23. Koenker R (2005) Quantile regression. Cambridge University Press, Cambridge
  24. Kriegler B, Berk RA (2009) Estimating the homeless population in Los Angeles: an application of cost-sensitive stochastic gradient boosting. Working paper, Department of Statistics, UCLA
  25. Leeb H, Pötscher BM (2006) Can one estimate the conditional distribution of post-model-selection estimators? Ann Stat 34(5):2554–2591
  26. Lin Y, Jeon Y (2006) Random forests and adaptive nearest neighbors. J Am Stat Assoc 101:578–590
  27. Mannor S, Meir R, Zhang T (2002) The consistency of greedy algorithms for classification. In: Kivinen J, Sloan RH (eds) COLT 2002. LNAI, vol 2375, pp 319–333
  28. McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman & Hall, New York
  29. Mease D, Wyner AJ (2008) Evidence contrary to the statistical view of boosting. J Mach Learn Res 9:1–26
  30. Mease D, Wyner AJ, Buja A (2007) Boosted classification trees and class probability/quantile estimation. J Mach Learn Res 8:409–439
  31. Morgan SL, Winship C (2007) Counterfactuals and causal inference: methods and principles for social research. Cambridge University Press, Cambridge
  32. Schapire RE (1999) A brief introduction to boosting. In: Proceedings of the 16th international joint conference on artificial intelligence
  33. Schapire RE (2002) The boosting approach to machine learning: an overview. In: MSRI workshop on nonlinear estimation and classification
  34. Tibshirani RJ (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288
  35. Traskin M (2008) The role of bootstrap sample size in the consistency of the random forest algorithm. Technical report, Department of Statistics, University of Pennsylvania
  36. Vapnik V (1996) The nature of statistical learning theory. Springer, New York
  37. Wyner AJ (2003) Boosting and exponential loss. In: Bishop CM, Frey BJ (eds) Proceedings of the 9th annual conference on AI and statistics, Jan 3–6, Key West, FL
  38. Zhang T, Yu B (2005) Boosting with early stopping: convergence and consistency. Ann Stat 33(4):1538–1579

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Richard Berk
    1. Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, USA
