
A Conceptual Introduction to Classification and Forecasting

Chapter in Machine Learning Risk Assessments in Criminal Justice Settings

Abstract

Because the criminal justice outcomes to be forecast are usually categorical (e.g., fail or not), this chapter treats crime forecasting as a classification problem. The goal is to assign classes to cases; there may be two classes or more than two. Machine learning is considered broadly before later chapters turn to random forests as a preferred forecasting tool. There is no use of models, and explanation is at best a secondary interest. Machine learning is based on algorithms, which should not be confused with models. The material is introduced in a conceptual manner with almost no mathematics. Nevertheless, some readers may find the material challenging because a certain amount of statistical maturity must be assumed. Later chapters use somewhat more formal expositional methods.


Notes

  1. An indicator variable here implies that the relationship with the outcome variable is a spike function. The regression function, therefore, is a linear combination of spike functions. If an intercept is, as usual, included in the regression, one of the indicators would need to be dropped. Otherwise, the regressor cross-product matrix would be singular, and there would be no unique solution from which to obtain estimates of the regression coefficients. (A minimal numerical sketch of this singularity appears after these notes.)

  2. Stagewise should not be confused with stepwise, as in stepwise regression. In stepwise regression, all of the regression coefficients from the previous step are re-estimated as the next predictor is dropped (i.e., backward selection) or added (i.e., forward selection). Were there a procedure called stagewise regression, the earlier regression coefficients would not be re-estimated. (The second sketch after these notes contrasts the two.)

  3. Technically, a Bayes classifier chooses the class that has the largest probability. We are choosing the class with the largest proportion. How one gets from proportions to probabilities depends on how the data were generated. We are not there yet. (The third sketch after these notes shows assignment by largest proportion.)

  4. A classification tree is a special case of classification and regression trees (CART). A regression tree uses recursive partitioning with a quantitative outcome variable. A classification tree uses recursive partitioning with a categorical outcome variable. The tree representation is upside down because the “roots” are at the top and the “leaves” are at the bottom. (The fourth sketch after these notes grows a small classification tree.)

  5. The matter of linearity can be subtle. A single break can be represented by a step function, which is nonlinear. For the collection of all the partitions constructed from a single variable, two or more breaks imply two or more step functions, which are also nonlinear. One should distinguish between the functions responsible for each partition and the lines separating the partitions. (The fifth sketch after these notes codes two breaks as step functions.)

  6. In this instance, “bias” refers to a systematic tendency to underestimate or overestimate the terminal node proportions in the population responsible for the data.
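
A minimal numerical sketch of the singularity described in Note 1, using made-up data: with an intercept plus a full set of indicators, the regressor cross-product matrix is rank deficient, and dropping one indicator restores a unique least squares solution.

```python
import numpy as np

# Hypothetical three-level categorical predictor coded as indicators.
levels = np.array([0, 1, 2, 0, 1, 2])
X_full = np.column_stack([
    np.ones(len(levels)),         # intercept
    (levels == 0).astype(float),  # indicator for level 0
    (levels == 1).astype(float),  # indicator for level 1
    (levels == 2).astype(float),  # indicator for level 2
])

# The indicators sum to the intercept column, so X'X is singular:
# its rank is 3, not 4, and the coefficients are not unique.
print(np.linalg.matrix_rank(X_full.T @ X_full))  # 3

# Dropping one indicator restores full column rank and a unique
# least squares solution.
X_reduced = X_full[:, :3]
print(np.linalg.matrix_rank(X_reduced.T @ X_reduced))  # 3
```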
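
A sketch contrasting the two strategies in Note 2 on simulated, deliberately correlated predictors; the data and variable names are invented for illustration. The stepwise flavor re-estimates all coefficients jointly when a predictor enters; the stagewise flavor freezes earlier coefficients and fits the new predictor to the residuals only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)          # correlated with x1 on purpose
y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)
X = np.column_stack([x1, x2])

def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Stepwise flavor: when x2 enters, BOTH coefficients are
# re-estimated jointly.
stepwise = ols(X, y)

# Stagewise flavor: fit x1, freeze its coefficient, then fit x2 to
# the residuals; the earlier coefficient is never revisited.
b1 = ols(X[:, :1], y)
b2 = ols(X[:, 1:2], y - X[:, :1] @ b1)
stagewise = np.array([b1[0], b2[0]])

print(stepwise)   # joint estimates
print(stagewise)  # differ because x1 and x2 are correlated
```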
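
For Note 3, assigning the class with the largest observed proportion amounts to a majority vote within a terminal node. The outcomes below are invented.

```python
from collections import Counter

# Hypothetical outcomes for the cases landing in one terminal node.
node_outcomes = ["fail", "fail", "not", "fail", "not", "fail"]

# The class with the largest proportion stands in for the class
# with the largest probability in a true Bayes classifier.
counts = Counter(node_outcomes)
assigned_class = counts.most_common(1)[0][0]
print(assigned_class, counts)  # fail Counter({'fail': 4, 'not': 2})
```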
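
For Note 4, a sketch that grows a small classification tree with scikit-learn on simulated data; the predictors, outcome, and depth are arbitrary illustrative choices. Swapping in DecisionTreeRegressor with a quantitative outcome would give a regression tree instead.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))                  # two made-up predictors
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # categorical outcome

# Recursive partitioning with a categorical outcome: a
# classification tree, printed with the "root" split first.
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["x1", "x2"]))
```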
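
For Note 5, two breaks in a single variable coded as two step functions: each step is nonlinear in x, and a linear combination of steps yields a piecewise constant fit. Break points and coefficients are invented.

```python
import numpy as np

x = np.linspace(0.0, 10.0, 11)

# Two breaks, at x = 3 and x = 7, each represented by a step
# (indicator) function that is nonlinear in x.
step1 = (x > 3).astype(float)
step2 = (x > 7).astype(float)

# A linear combination of step functions is piecewise constant,
# hence nonlinear in x even though it is linear in the steps.
fitted = 1.0 + 2.0 * step1 - 1.5 * step2
print(np.column_stack([x, fitted]))
```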


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Berk, R. (2019). A Conceptual Introduction to Classification and Forecasting. In: Machine Learning Risk Assessments in Criminal Justice Settings. Springer, Cham. https://doi.org/10.1007/978-3-030-02272-3_3

  • DOI: https://doi.org/10.1007/978-3-030-02272-3_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02271-6

  • Online ISBN: 978-3-030-02272-3

  • eBook Packages: Computer Science, Computer Science (R0)
