Transparency, Accuracy and Fairness

Chapter in Machine Learning Risk Assessments in Criminal Justice Settings

Abstract

Criminal justice risk assessments are often far more than academic exercises. They can serve as informational input to a range of real decisions affecting real people. The consequences of these decisions can be enormous, and they can be made in error. Stakeholders need to know about the risk assessment tools being deployed. The need to know includes transparency, accuracy, and fairness. All three raise complicated issues in part because they interact with one another. Each will be addressed in turn. There will be no technical fix and no easy answers.

Notes

  1. In addition, the suggestions for procedural regularity may have no bite because machine learning used in criminal justice risk assessments is usually based on algorithmic code that has been employed for several years by many different analysts. Regularity problems and performance anomalies are generally reported to sites like GitHub (https://github.com), where developers and others try to make repairs. But such “community” solutions work best for code that is made public. For proprietary code, there are usually reporting protocols as well, but it can be very difficult to determine what is done with the reports.

  2. Which outcome gets called a “positive” and which gets called a “negative” is formally arbitrary. But in practice, the outcome about which there is greater concern, and which likely motivates the prediction exercise, is called the positive. In examinations of chest X-rays looking for lung tumors, a tumor is often a positive.

  3. There will be a cost ratio to tune for each possible pair of outcome classes.

  4. Naming conventions alone get very complicated. What here is called forecasting accuracy equality, some other writers call predictive parity or calibration within groups, although the definitions vary a bit. There is remarkable consistency in the use of the terms false positive rate and false negative rate, but the distinction breaks down if there are more than two outcome classes. Suppose there are three such classes. If one is called a positive and another is called a negative, what is the remaining outcome class to be called? Suppose we called it Fred. If the true outcome class is a positive and the class assigned by the algorithm is not the true class, one can have either a false negative or a false Fred. That is why the term classification error is used without giving each flavor of classification error a standard name. A sketch showing how such error rates can be tabulated with three outcome classes appears after these notes.

  5. There are a variety of imputation methods, some specifically designed for particular machine learning procedures. However, they generally are not derived from first principles, and raise a whole new set of fairness concerns. The best approach, when possible, is to go back to the primary sources of the data and make corrections.

  6. One often finds that 90% of the time required to develop useful machine learning forecasts is devoted to data cleaning.

  7. There can be subtle issues. Illegitimate variables are needed to document many kinds of unfairness and, in some applications, to moderate unfairness in the algorithmic results.

  8. Jittering introduces randomness into the data by adding noise to observed data values. The amount of noise to be added is a tuning parameter. For random forests, one might jitter the data 100 times and plot the distribution of votes. If the implied class is almost always the same, one can have greater confidence in the reliability of the forecasted class. A sketch of this jittering check also appears after these notes.

  9. There can be a strong temptation to apply some form of statistical inference. Caveat emptor. Proper statistical inference for machine learning is by and large an unsolved problem. It is very unlikely that conventional approaches will perform properly. Currently, there are some useful options with test data, but they do not carry over to forecasts for individuals except in special cases (e.g., there is a substantial number of exact replicates in the test data of the case for which a forecast is needed). More will be said about statistical inference later in several different sections.
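To make note 4 concrete, here is a minimal Python sketch, not drawn from the chapter itself: the class labels, case counts, and predictions are hypothetical, and scikit-learn's confusion_matrix is used only for convenience. With three outcome classes, every off-diagonal cell of the confusion matrix is a different flavor of classification error, so there is no single false positive or false negative rate to report.

```python
# A minimal sketch with hypothetical data: with three outcome classes
# ("positive", "negative", "fred"), each off-diagonal confusion-matrix cell
# is a distinct flavor of classification error.
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["positive", "negative", "fred"]

# Hypothetical true and algorithm-assigned outcome classes for ten cases.
y_true = ["positive", "positive", "positive", "negative", "negative",
          "negative", "fred", "fred", "fred", "positive"]
y_pred = ["positive", "negative", "fred", "negative", "positive",
          "negative", "fred", "fred", "positive", "positive"]

cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows = true, columns = assigned

# Row-normalize so each row is the conditional distribution of assigned
# classes given the true class; off-diagonal entries are the error rates.
rates = cm / cm.sum(axis=1, keepdims=True)

for i, true_class in enumerate(labels):
    for j, assigned_class in enumerate(labels):
        if i != j:
            print(f"true {true_class:>8}, assigned {assigned_class:>8}: "
                  f"error rate {rates[i, j]:.2f}")
```

With two classes the two off-diagonal rates are the familiar false negative and false positive rates; with three classes there are six such rates, which is why the text falls back on the generic term classification error.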
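The jittering check in note 8 can be sketched as follows. This is not the author's code; it is a minimal illustration assuming a fitted scikit-learn RandomForestClassifier, hypothetical training data, and a hypothetical noise scale standing in for the tuning parameter the note mentions. A new case is jittered 100 times, each jittered version is classified, and the distribution of forecasted classes is examined; if one class dominates, the forecast is relatively stable.

```python
# A minimal sketch of the jittering check in note 8, assuming a fitted
# scikit-learn RandomForestClassifier; data and noise scale are hypothetical.
import numpy as np
from collections import Counter
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical training data: 500 cases, 5 numeric predictors, 2 outcome classes.
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

def jittered_forecasts(case, noise_scale=0.1, n_jitters=100):
    """Jitter one case n_jitters times and return the forecasted class each time."""
    noise = rng.normal(scale=noise_scale, size=(n_jitters, case.size))
    return forest.predict(case.reshape(1, -1) + noise)

new_case = rng.normal(size=5)
votes = Counter(jittered_forecasts(new_case))
print(votes)  # if one class is forecasted almost every time, the forecast is more reliable
```

The noise scale plays the role of the tuning parameter in the note; in practice it would be set relative to the measurement error plausibly present in each predictor.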

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Berk, R. (2019). Transparency, Accuracy and Fairness. In: Machine Learning Risk Assessments in Criminal Justice Settings. Springer, Cham. https://doi.org/10.1007/978-3-030-02272-3_6

  • DOI: https://doi.org/10.1007/978-3-030-02272-3_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02271-6

  • Online ISBN: 978-3-030-02272-3
