Handbook of Quantitative Criminology, pp 725–740

# An Introduction to Statistical Learning from a Regression Perspective

## Abstract

Statistical learning is a loose collection of procedures in which key features of the final results are determined inductively. There are clear historical links to exploratory data analysis. There are also clear links to techniques in statistics such as principal components analysis, clustering, and smoothing, and to long-standing concerns in computer science, such as pattern recognition and edge detection. But statistical learning would not exist were it not for recent developments in raw computing power, computer algorithms, and theory from statistics, computer science, and applied mathematics. It can be very computer intensive. Extensive discussions of statistical learning can be found in Hastie et al. (2009) and Bishop (2006). Statistical learning is also sometimes called machine learning or reinforcement learning, especially when discussed in the context of computer science.
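The abstract's central idea — that key features of the fit are determined inductively from the data rather than specified in advance — can be illustrated with two techniques named in the chapter's keywords: regression trees and bagging (Breiman 1996). The sketch below is not from the chapter; it is a minimal, illustrative implementation in plain Python, using one-split regression trees ("stumps") whose split point is learned from the data, then averaged over bootstrap samples. All function names (`fit_stump`, `bagged_stumps`) and the toy data are hypothetical.

```python
import random
import statistics

def fit_stump(xs, ys):
    """Fit a one-split regression tree (a 'stump'): search all candidate
    split points on x and keep the one minimizing within-node squared error.
    The split location is determined inductively, not specified in advance."""
    best = None
    for s in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= s]
        right = [y for x, y in zip(xs, ys) if x > s]
        if not left or not right:
            continue
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = (sum((y - ml) ** 2 for y in left)
               + sum((y - mr) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, s, ml, mr)
    _, s, ml, mr = best
    return lambda x: ml if x <= s else mr

def bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Bagging (Breiman 1996): fit each stump to a bootstrap sample of the
    data, then predict with the average over all stumps."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: statistics.fmean(f(x) for f in stumps)

# Toy step-function data: the jump between x = 4 and x = 5 is recovered
# from the data by the fitting procedure itself.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.1, 0.9, 1.0, 3.0, 3.1, 2.9, 3.0]
predict = bagged_stumps(xs, ys)
print(predict(2), predict(7))  # near 1.0 and near 3.0
```

Random forests (Breiman 2001a) extend this idea by growing full trees and also randomizing the predictors considered at each split; scikit-learn's `RandomForestRegressor` is a standard production implementation.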

## Keywords

Support Vector Machine · Random Forest · Forecast Error · Regression Tree · Statistical Learning

## Notes

### Acknowledgments

Work on this paper was supported in part by a grant from the National Science Foundation: SES-0437169, “Ensemble Methods for Data Analysis in the Behavioral, Social and Economic Sciences.” That support is gratefully acknowledged.

## References

- Berk RA (2003) Regression analysis: a constructive critique. Sage Publications, Newbury Park, CA
- Berk RA (2008) Statistical learning from a regression perspective. Springer, New York
- Berk RA, Sherman L, Barnes G, Kurtz E, Lindsay A (2009a) Forecasting murder within a population of probationers and parolees: a high stakes application of statistical forecasting. J R Stat Soc Ser A 172(part 1):191–211
- Berk RA, Brown L, Zhao L (2009b) Statistical inference after model selection. Working Paper, Department of Statistics, University of Pennsylvania
- Bishop CM (2006) Pattern recognition and machine learning. Springer, New York
- Breiman L (1996) Bagging predictors. Mach Learn 24:123–140
- Breiman L (2001a) Random forests. Mach Learn 45:5–32
- Breiman L (2001b) Statistical modeling: two cultures (with discussion). Stat Sci 16:199–231
- Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth Press, Monterey, CA
- Buja A, Stuetzle W, Shen Y (2005) Loss functions for binary class probability estimation and classification: structure and applications. Unpublished Manuscript, Department of Statistics, The Wharton School, University of Pennsylvania
- Bühlmann P, Yu B (2006) Sparse boosting. J Mach Learn Res 7:1001–1024
- Cook RD, Weisberg S (1999) Applied regression including computing and graphics. Wiley, New York
- Efron B, Tibshirani R (1993) Introduction to the bootstrap. Chapman & Hall, New York
- Freedman DA (2004) Graphical models for causation and the identification problem. Eval Rev 28:267–293
- Freedman DA (2005) Statistical models: theory and practice. Cambridge University Press, Cambridge
- Freund Y, Schapire R (1996) Experiments with a new boosting algorithm. In: Machine learning: proceedings of the 13th international conference. Morgan Kaufmann, San Francisco, pp 148–156
- Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232
- Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38:367–378
- Friedman JH, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting (with discussion). Ann Stat 28:337–407
- Friedman JH, Hastie T, Rosset S, Tibshirani R, Zhu J (2004) Discussion of boosting papers. Ann Stat 32:102–107
- Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning, 2nd edn. Springer, New York
- Holland P (1986) Statistics and causal inference. J Am Stat Assoc 81:945–960
- Koenker R (2005) Quantile regression. Cambridge University Press, Cambridge
- Kriegler B, Berk RA (2009) Estimating the homeless population in Los Angeles: an application of cost-sensitive stochastic gradient boosting. Working paper, Department of Statistics, UCLA
- Leeb H, Pötscher BM (2006) Can one estimate the conditional distribution of post-model-selection estimators? Ann Stat 34(5):2554–2591
- Lin Y, Jeon Y (2006) Random forests and adaptive nearest neighbors. J Am Stat Assoc 101:578–590
- Mannor S, Meir R, Zhang T (2002) The consistency of greedy algorithms for classification. In: Kivinen J, Sloan RH (eds) COLT 2002. LNAI, vol 2375. pp 319–333
- McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman & Hall, New York
- Mease D, Wyner AJ (2008) Evidence contrary to the statistical view of boosting. J Mach Learn Res 9:1–26
- Mease D, Wyner AJ, Buja A (2007) Boosted classification trees and class probability/quantile estimation. J Mach Learn Res 8:409–439
- Morgan SL, Winship C (2007) Counterfactuals and causal inference: methods and principles for social research. Cambridge University Press, Cambridge
- Schapire RE (1999) A brief introduction to boosting. In: Proceedings of the 16th international joint conference on artificial intelligence
- Schapire RE (2002) The boosting approach to machine learning: an overview. In: MSRI workshop on non-linear estimation and classification
- Tibshirani RJ (1996) Regression shrinkage and selection via the LASSO. J R Stat Soc Ser B 58:267–288
- Traskin M (2008) The role of bootstrap sample size in the consistency of the random forest algorithm. Technical Report, Department of Statistics, University of Pennsylvania
- Vapnik V (1996) The nature of statistical learning theory. Springer, New York
- Wyner AJ (2003) Boosting and exponential loss. In: Bishop CM, Frey BJ (eds) Proceedings of the 9th annual conference on AI and statistics, Jan 3–6, Key West, FL
- Zhang T, Yu B (2005) Boosting with early stopping: convergence and consistency. Ann Stat 33(4):1538–1579