Forecasts of Violence to Inform Sentencing Decisions

Berk, Richard; Bleich, Justin

doi:10.1007/s10940-013-9195-0

Forecasts of Violence to Inform Sentencing Decisions

Original Paper
Published: 17 February 2013

Volume 30, pages 79–96, (2014)
Cite this article

Journal of Quantitative Criminology Aims and scope Submit manuscript

Richard Berk^1,2 &
Justin Bleich^1,2

1148 Accesses
27 Citations
Explore all metrics

Abstract

Objectives

Recent legislation in Pennsylvania mandates that forecasts of "future dangerousness" be provided to judges when sentences are given. Similar requirements already exist in other jurisdictions. Research has shown that machine learning can lead to usefully accurate forecasts of criminal behavior in such setting. But there are settings in which there is insufficient IT infrastructure to support machine learning. The intent of this paper is provide a prototype procedure for making forecasts of future dangerousness that could be used to inform sentencing decisions when machine learning is not practical. We consider how classification trees can be improved so that they may provide an acceptable second choice.

Methods

We apply an version of classifications trees available in R, with some technical enhancements to improve tree stability. Our approach is illustrated with real data that could be used to inform sentencing decisions.

Results

Modest sized trees grown from large samples can forecast well and in a stable fashion, especially if the small fraction of indecisive classifications are found and accounted for in a systematic manner. But machine learning is still to be preferred when practical.

Conclusions

Our enhanced version of classifications trees may well provide a viable alternative to machine learning when machine learning is beyond local IT capabilities.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Notes

There can also be very different approaches to how a classification model is developed and evaluated. For example, proper concern about overfitting is sometimes not coupled with the use of held-out, test data when a classification tree is constructed. Out-of-sample performance is today’s gold standard.
Any predictor can be used as many times as needed, but with different splitting values. For instance, the first split for age might be at 21 years, and a later split might be at 32 years.
The procedure is formally a Bayes classifier if the terminal node proportions are treated as probabilities.
This formalization applies equally well when there are more than two outcome classes. Also, the 1/0 coding is convenient but any other binary labels will work. Because a class is categorical, the coding does not even need to be numerical. One might use, for instance, “F” for fail; and “NF” for not fail.
There is nothing special about the 5 to 1 cost ratio, and for the methodological issues raised in the paper, most any reasonable cost ratio would suffice. The cost ratio just could not be so extreme that the same class is assigned to essentially all cases. In that instance, the role of either false positives or false negatives would be obscured.
Cost ratios have ranged from 20 to 1 to 3 to 1. The relative costs of false negatives are more dear. Anecdotally, we have found that criminal justice officials and representatives from a variety of citizen groups broadly agree that at least for crimes of violence, false negatives are substantially more costly than false positives.
In this context, a random forest is an ensemble of classification trees that differ from one another by user-controlled chance processes. In one such process, when each tree is grown, a random sample of observations is “held out” and serves as test data for that tree. There is no need for the researcher to assemble test data before the statistical analysis begins.
Overall model error is reported at the lower right corner of the table. However, when the loss function is asymmetric, the overall proportion correct (or in error) is misleading. Symmetric costs are being assumed.
One important reason stems from the large imbalance in the outcome distribution favoring the absence of violent crime. The absence of violent crime is far more common and, therefore, easier to predict correctly. Random forests earns its keep by forecasting rare events.
We used the same tuning parameter values so that the trees were comparable. We were trying to isolate the impact of new data for a tree of a given complexity. Had we altered the tuning parameter values, changes in how cases were classified could result from either the new data or differing tuning parameters. The cost ratio was also unchanged.
One reason Kuhnert and Mengersen suggest additional measures of stability is that they apparently do not find sufficiently strong justification for using a Bayes classifier. In our context, where the relative costs of false negatives and false positives are elicited from stakeholders, their concerns seem far less troubling.
Even using the same data, it is likely that a machine learning procedure would do better. Recall, however, that in this paper we are assuming that machine learning is not a practical option.

References

Banks S, Robbins PC, Silver E, Vesselinov R, Steadman HJ, Monahan J, Mulvey EP, Applebaum PS, Grisso RT, Roth LH (2004) A multiple-models approach to violence risk assessment among people with mental disorder. Crim Justice Behav 31:324–340
Article Google Scholar
Barnes GC, Ahlman L, Gill C, Sherman LW, Kurtz E, Malvestuto R (2010) Low intensity community supervision for low-risk offenders: a randomized, controlled trial. J Exp Criminol 6:159-189
Article Google Scholar
Berk RA (2008a) Forecasting methods in crime and justice. In: Hagan J, Schepple KL, Tyler TR (eds) Annual review of law and social science, vol 4. Annual Reviews, Palo Alto, pp 173–192
Berk RA (2008b) Statistical learning from a regression perspective. Springer, New York
Google Scholar
Berk RA (2009) The role of race in forecasts of violent crime. Race Soc Probl 1(4):231–242
Article Google Scholar
Berk RA (2011) Asymmetric loss functions for forecasting in criminal justice settings. J Quant Criminol 27:107–123
Article Google Scholar
Berk RA (2012) Criminal justice forecasts of risk: a machine learning approach. Springer, New York
Book Google Scholar
Berk RA (2013) Algorithmic criminology. Secur Inform 2(5) (forthcoming)
Berk RA, Sorenson SB, He Y (2005) Developing a practical forecasting screener for domestic violence incidents. Eval Rev 29(4):358–382
Article Google Scholar
Berk RA, Kriegler B, Baek J-H (2006) Forecasting dangerous inmate misconduct: an application of ensemble statistical procedures. J Quant Criminol 22(2):135–145
Article Google Scholar
Berk RA, Sherman L, Barnes G, Kurtz E, Ahlman L (2009) Forecasting murder within a population of probationers and parolees: a high stakes application of statistical learning. J R Stat Soc Ser A 172(part I):191–211
Article Google Scholar
Borden HG (1928) Factors predicting parole success. J Am Inst Crim Law Criminol 19:328–336
Article Google Scholar
Breiman L (2001a) Random forests. Mach Learn 45:5–32
Article Google Scholar
Breiman L (2001b) Statistical modeling: two cultures (with discussion). Stat Sci 16:199–231
Article Google Scholar
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth Press, Monterey
Google Scholar
Burgess EM (1928) Factors determining success or failure on parole. In: Bruce AA, Harno AJ, Burgess EW, Landesco EW (eds) The working of the indeterminate sentence law and the parole system in Illinois. State Board of Parole, Springfield, pp 205–249
Google Scholar
Bushway S (2011) Albany Law Rev 74(3)
Casey PM, Warren RK, Elek JK (2011) Using offender risk and needs assessment information at sentencing: guidance from a national working group. National Center for State Courts. http://www.ncsconline.org/
Chipman HA, George EI, McCulloch RE (2010) BART: Bayesian additive regression trees. Ann Appl Stat 4(1):266–298
Article Google Scholar
Farrington DP, Tarling R (2003) Prediction in criminology. SUNY Press, Albany
Google Scholar
Feeley M, Simon J (1994) Actuarial justice: the emerging new criminal law. In: Nelken D (ed) The futures of criminology. Sage, London, pp 173–201
Google Scholar
Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38:367–378
Article Google Scholar
Gottfredson SD, Moriarty LJ (2006) Statistical risk assessment: old problems and new applications. Crime Delinq 52(1):178–200
Article Google Scholar
Harcourt BW (2007) Against prediction: profiling, policing, and punishing in an actuarial age. University of Chicago Press, Chicago
Google Scholar
Hastie R, Dawes RM (2001) Rational choice in an uncertain world. Sage, Thousand Oaks
Google Scholar
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning, 2nd edn. Springer, New York
Book Google Scholar
Holmes S (2003) Bootstrapping phylogenetic trees: theory and methods. Stat Sci 18(2):241–255
Article Google Scholar
Hyatt JM, Chanenson L, Bergstrom MH (2011) Reform in motion: the promise and profiles of incorporating risk assessments and cost-benefit analysis into Pennsylvania sentencing. Duquesne Law Rev 49(4):707–749
Google Scholar
Kleiman M, Ostrom BJ, Cheeman FL (2007) Using risk assessment to inform sentencing decisions for nonviolent offenders in Virginia. Crime Delinq 53(1):1–27
Article Google Scholar
Kuhnert PM, Mengersen K (2003) Reliability measures for local nodes assessment in classification trees. J Comput Graph Stat 12(2):398–426
Article Google Scholar
Messinger SL, Berk RA (1987) Dangerous people: a review of the NAS report on career criminals. Criminology 25(3):767–781
Article Google Scholar
Monahan J (1981) Predicting violent behavior: an assessment of clinical techniques. Sage, Newbury Park
Google Scholar
Pew Center of the States, Public Safety Performance Project (2011) Risk/needs assessment 101: science reveals new tools to manage offenders. The Pew Center of the States. http://www.pewcenteronthestates.org/publicsafety.
Silver E, Chow-Martin L (2002) A multiple models approach to assessing recidivism risk: implications for judicial decision making. Crim Justice Behav 29:538–569
Article Google Scholar
Skeem JL, Monahan J (2011) Current directions in violence risk assessment. Curr Dir Psychol Sci 21(1):38–42
Article Google Scholar
Tóth N (2008) Handling classification uncertainty with decision trees in biomedical diagnostic systems. PhD thesis, Department of Measurement and Information Systems, Budapest University of Technology and Economics
Turner S, Hess J, Jannetta J (2009) Development of the California Risk Assessment Instrument. Center for Evidence Based Corrections, The University of California, Irvine
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Statistics, University of Pennsylvania, Philadelphia, PA, USA
Richard Berk & Justin Bleich
Department of Criminology, University of Pennsylvania, Philadelphia, PA, USA
Richard Berk & Justin Bleich

Authors

Richard Berk
View author publications
You can also search for this author in PubMed Google Scholar
Justin Bleich
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Richard Berk.

Appendix: Outline of Software for Stability Assessments

The steps that follow summarize the R code used to discard unreliable forecasts.

1.
Construct a classification tree using a loss-matrix with the desired costs.
2.
For each case, find the terminal node where it was classified.
3.
Create a table that shows for each terminal node, how many observations in that node were labelled “Fail” vs. “No Fail.”
4.
To understand how close the vote in each node was, re-weight the “Fails” in each terminal node by their weight as if they were false negatives (e.g., assign a cost of 5.0 to each failure). This now allows one to check the majority vote (using weighted fails) to see which classification won.
5.
Let F be the weighted sum of all of the “Fails” (e.g., 400 fails × 5.0 cost = 2,000) and NF be total number of non-failures, each with a weight of 1.0. Let p = F/(F + NF), the proportion of the weighted total number of votes that are “Fail” in a given terminal node. This can be done symmetrically with (1 − p) as well.
6.
Set a desired margin. Check if |p − .5| < c is too close to call. How close is too close will require some trial and error.
7.
Store which nodes are too close and which are not. Use the output from the tree with the additional label that some nodes are now too close too call.
8.
One can exclude these too-close observations and run a cross-check against the bootstrapped trees to see how the stability improves.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Berk, R., Bleich, J. Forecasts of Violence to Inform Sentencing Decisions. J Quant Criminol 30, 79–96 (2014). https://doi.org/10.1007/s10940-013-9195-0

Download citation

Published: 17 February 2013
Issue Date: March 2014
DOI: https://doi.org/10.1007/s10940-013-9195-0

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Forecasts of Violence to Inform Sentencing Decisions