
A Conceptual Introduction to Classification and Forecasting

Chapter in Machine Learning Risk Assessments in Criminal Justice Settings

Abstract

Because the criminal justice outcomes to be forecast are usually categorical (e.g., fail or not), this chapter treats crime forecasting as a classification problem. The goal is to assign classes to cases; there may be two classes or more than two. Machine learning is considered broadly before later chapters turn to random forests as a preferred forecasting tool. There is no use of models, and explanation is at best a secondary interest. Machine learning is based on algorithms, which should not be confused with models. The material is introduced in a conceptual manner with almost no mathematics. Nevertheless, some readers may find the material challenging because a certain amount of statistical maturity must be assumed. Later chapters use somewhat more formal expositional methods.


Notes

  1. An indicator variable here implies that the relationship with the outcome variable is a spike function. The regression function, therefore, is a linear combination of spike functions. If an intercept is, as usual, included in the regression, one of the indicators would need to be dropped. Otherwise, the regressor cross-product matrix would be singular, and there would be no unique solution from which to obtain estimates of the regression coefficients. (A minimal numerical sketch of this singularity appears after these notes.)

  2. Stagewise should not be confused with stepwise, as in stepwise regression. In stepwise regression, all of the regression coefficients from the previous step are re-estimated as the next predictor is dropped (i.e., backward selection) or added (i.e., forward selection). Were there a procedure called stagewise regression, the earlier regression coefficients would not be re-estimated. (The second sketch after these notes contrasts the two.)

  3. Technically, a Bayes classifier chooses the class that has the largest probability. We are choosing the class with the largest proportion. How one gets from proportions to probabilities depends on how the data were generated. We are not there yet. (The third sketch after these notes shows assignment by largest proportion.)

  4. A classification tree is a special case of classification and regression trees (CART). A regression tree uses recursive partitioning with a quantitative outcome variable. A classification tree uses recursive partitioning with a categorical outcome variable. The tree representation is upside down because the “roots” are at the top and the “leaves” are at the bottom. (The fourth sketch after these notes grows a small classification tree.)

  5. The matter of linearity can be subtle. A single break can be represented by a step function, which is nonlinear. For the collection of all the partitions constructed from a single variable, two or more breaks imply two or more step functions, which are also nonlinear. One should distinguish between the functions responsible for each partition and the lines separating the partitions. (The fifth sketch after these notes codes two breaks as step functions.)

  6. In this instance, “bias” refers to a systematic tendency to underestimate or overestimate the terminal node proportions in the population responsible for the data.
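
A minimal numerical sketch of the singularity described in Note 1, using made-up data: with an intercept plus a full set of indicators, the regressor cross-product matrix is rank deficient, and dropping one indicator restores a unique least squares solution.

```python
import numpy as np

# Hypothetical three-level categorical predictor coded as indicators.
levels = np.array([0, 1, 2, 0, 1, 2])
X_full = np.column_stack([
    np.ones(len(levels)),         # intercept
    (levels == 0).astype(float),  # indicator for level 0
    (levels == 1).astype(float),  # indicator for level 1
    (levels == 2).astype(float),  # indicator for level 2
])

# The indicators sum to the intercept column, so X'X is singular:
# its rank is 3, not 4, and the coefficients are not unique.
print(np.linalg.matrix_rank(X_full.T @ X_full))  # 3

# Dropping one indicator restores full column rank and a unique
# least squares solution.
X_reduced = X_full[:, :3]
print(np.linalg.matrix_rank(X_reduced.T @ X_reduced))  # 3
```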
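
A sketch contrasting the two strategies in Note 2 on simulated, deliberately correlated predictors; the data and variable names are invented for illustration. The stepwise flavor re-estimates all coefficients jointly when a predictor enters; the stagewise flavor freezes earlier coefficients and fits the new predictor to the residuals only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)          # correlated with x1 on purpose
y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)
X = np.column_stack([x1, x2])

def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Stepwise flavor: when x2 enters, BOTH coefficients are
# re-estimated jointly.
stepwise = ols(X, y)

# Stagewise flavor: fit x1, freeze its coefficient, then fit x2 to
# the residuals; the earlier coefficient is never revisited.
b1 = ols(X[:, :1], y)
b2 = ols(X[:, 1:2], y - X[:, :1] @ b1)
stagewise = np.array([b1[0], b2[0]])

print(stepwise)   # joint estimates
print(stagewise)  # differ because x1 and x2 are correlated
```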
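
For Note 3, assigning the class with the largest observed proportion amounts to a majority vote within a terminal node. The outcomes below are invented.

```python
from collections import Counter

# Hypothetical outcomes for the cases landing in one terminal node.
node_outcomes = ["fail", "fail", "not", "fail", "not", "fail"]

# The class with the largest proportion stands in for the class
# with the largest probability in a true Bayes classifier.
counts = Counter(node_outcomes)
assigned_class = counts.most_common(1)[0][0]
print(assigned_class, counts)  # fail Counter({'fail': 4, 'not': 2})
```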
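
For Note 4, a sketch that grows a small classification tree with scikit-learn on simulated data; the predictors, outcome, and depth are arbitrary illustrative choices. Swapping in DecisionTreeRegressor with a quantitative outcome would give a regression tree instead.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))                  # two made-up predictors
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # categorical outcome

# Recursive partitioning with a categorical outcome: a
# classification tree, printed with the "root" split first.
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["x1", "x2"]))
```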
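
For Note 5, two breaks in a single variable coded as two step functions: each step is nonlinear in x, and a linear combination of steps yields a piecewise constant fit. Break points and coefficients are invented.

```python
import numpy as np

x = np.linspace(0.0, 10.0, 11)

# Two breaks, at x = 3 and x = 7, each represented by a step
# (indicator) function that is nonlinear in x.
step1 = (x > 3).astype(float)
step2 = (x > 7).astype(float)

# A linear combination of step functions is piecewise constant,
# hence nonlinear in x even though it is linear in the steps.
fitted = 1.0 + 2.0 * step1 - 1.5 * step2
print(np.column_stack([x, fitted]))
```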


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Berk, R. (2019). A Conceptual Introduction to Classification and Forecasting. In: Machine Learning Risk Assessments in Criminal Justice Settings. Springer, Cham. https://doi.org/10.1007/978-3-030-02272-3_3

  • DOI: https://doi.org/10.1007/978-3-030-02272-3_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02271-6

  • Online ISBN: 978-3-030-02272-3

  • eBook Packages: Computer Science, Computer Science (R0)
