Transparency, Accuracy and Fairness

Chapter in Machine Learning Risk Assessments in Criminal Justice Settings

Abstract

Criminal justice risk assessments are often far more than academic exercises. They can serve as informational input to a range of real decisions affecting real people. The consequences of these decisions can be enormous, and they can be made in error. Stakeholders need to know about the risk assessment tools being deployed. The need to know includes transparency, accuracy, and fairness. All three raise complicated issues in part because they interact with one another. Each will be addressed in turn. There will be no technical fix and no easy answers.

Notes

  1. In addition, the suggestions for procedural regularity may have no bite because machine learning used in criminal justice risk assessments is usually based on algorithmic code that has been employed for several years by many different analysts. Regularity problems and performance anomalies are generally reported to sites like GitHub (https://github.com), where developers and others try to make repairs. But such “community” solutions work best for code that is made public. For proprietary code, there are usually reporting protocols as well, but it can be very difficult to determine what is done with the reports.

  2. Which outcome gets called a “positive” and which gets called a “negative” is formally arbitrary. But in practice, the outcome about which there is greater concern, and which likely motivates the prediction exercise, is called the positive. In examinations of chest X-rays looking for lung tumors, a tumor is often a positive.

  3. There will be a cost ratio to tune for each possible pair of outcome classes.

  4. Naming conventions alone get very complicated. What here is called forecasting accuracy equality, some other writers call predictive parity or calibration within groups, although the definitions vary a bit. There is remarkable consistency in the use of the terms false positive rate and false negative rate, but the distinction breaks down if there are more than two outcome classes. Suppose there are three such classes. If one is called a positive and another is called a negative, what is the remaining outcome class to be called? Suppose we called it Fred. If the true outcome class is a positive and the class assigned by the algorithm is not the true class, one can have either a false negative or a false Fred. That is why the term classification error is used without giving each flavor of classification error a standard name. A sketch showing how such error rates can be tabulated with three outcome classes appears after these notes.

  5. There are a variety of imputation methods, some specifically designed for particular machine learning procedures. However, they generally are not derived from first principles, and raise a whole new set of fairness concerns. The best approach, when possible, is to go back to the primary sources of the data and make corrections.

  6. One often finds that 90% of the time required to develop useful machine learning forecasts is devoted to data cleaning.

  7. There can be subtle issues. Illegitimate variables are needed to document many kinds of unfairness and, in some applications, to moderate unfairness in the algorithmic results.

  8. Jittering introduces randomness into the data by adding noise to observed data values. The amount of noise to be added is a tuning parameter. For random forests, one might jitter the data 100 times and plot the distribution of votes. If the implied class is almost always the same, one can have greater confidence in the reliability of the forecasted class. A sketch of this jittering check also appears after these notes.

  9. There can be a strong temptation to apply some form of statistical inference. Caveat emptor. Proper statistical inference for machine learning is by and large an unsolved problem. It is very unlikely that conventional approaches will perform properly. Currently, there are some useful options with test data, but they do not carry over to forecasts for individuals except in special cases (e.g., there is a substantial number of exact replicates in the test data of the case for which a forecast is needed). More will be said about statistical inference later in several different sections.
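To make note 4 concrete, here is a minimal Python sketch, not drawn from the chapter itself: the class labels, case counts, and predictions are hypothetical, and scikit-learn's confusion_matrix is used only for convenience. With three outcome classes, every off-diagonal cell of the confusion matrix is a different flavor of classification error, so there is no single false positive or false negative rate to report.

```python
# A minimal sketch with hypothetical data: with three outcome classes
# ("positive", "negative", "fred"), each off-diagonal confusion-matrix cell
# is a distinct flavor of classification error.
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["positive", "negative", "fred"]

# Hypothetical true and algorithm-assigned outcome classes for ten cases.
y_true = ["positive", "positive", "positive", "negative", "negative",
          "negative", "fred", "fred", "fred", "positive"]
y_pred = ["positive", "negative", "fred", "negative", "positive",
          "negative", "fred", "fred", "positive", "positive"]

cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows = true, columns = assigned

# Row-normalize so each row is the conditional distribution of assigned
# classes given the true class; off-diagonal entries are the error rates.
rates = cm / cm.sum(axis=1, keepdims=True)

for i, true_class in enumerate(labels):
    for j, assigned_class in enumerate(labels):
        if i != j:
            print(f"true {true_class:>8}, assigned {assigned_class:>8}: "
                  f"error rate {rates[i, j]:.2f}")
```

With two classes the two off-diagonal rates are the familiar false negative and false positive rates; with three classes there are six such rates, which is why the text falls back on the generic term classification error.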
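The jittering check in note 8 can be sketched as follows. This is not the author's code; it is a minimal illustration assuming a fitted scikit-learn RandomForestClassifier, hypothetical training data, and a hypothetical noise scale standing in for the tuning parameter the note mentions. A new case is jittered 100 times, each jittered version is classified, and the distribution of forecasted classes is examined; if one class dominates, the forecast is relatively stable.

```python
# A minimal sketch of the jittering check in note 8, assuming a fitted
# scikit-learn RandomForestClassifier; data and noise scale are hypothetical.
import numpy as np
from collections import Counter
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical training data: 500 cases, 5 numeric predictors, 2 outcome classes.
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

def jittered_forecasts(case, noise_scale=0.1, n_jitters=100):
    """Jitter one case n_jitters times and return the forecasted class each time."""
    noise = rng.normal(scale=noise_scale, size=(n_jitters, case.size))
    return forest.predict(case.reshape(1, -1) + noise)

new_case = rng.normal(size=5)
votes = Counter(jittered_forecasts(new_case))
print(votes)  # if one class is forecasted almost every time, the forecast is more reliable
```

The noise scale plays the role of the tuning parameter in the note; in practice it would be set relative to the measurement error plausibly present in each predictor.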

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Berk, R. (2019). Transparency, Accuracy and Fairness. In: Machine Learning Risk Assessments in Criminal Justice Settings. Springer, Cham. https://doi.org/10.1007/978-3-030-02272-3_6

  • DOI: https://doi.org/10.1007/978-3-030-02272-3_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02271-6

  • Online ISBN: 978-3-030-02272-3
