# What do the experts know? Calibration, precision, and the wisdom of crowds among forensic handwriting experts

## Abstract

Forensic handwriting examiners currently testify to the origin of questioned handwriting for legal purposes. However, forensic scientists are increasingly being encouraged to assign probabilities to their observations in the form of a likelihood ratio. This study is the first to examine whether handwriting experts are able to estimate the frequency of US handwriting features more accurately than novices. The results indicate that the absolute error for experts was lower than that for novices, but the size of the effect is modest, and the overall error rate, even for experts, is large enough to raise questions about whether their estimates are sufficiently trustworthy for presentation in court. When errors are separated into those caused by miscalibration and those caused by imprecision, we find systematic differences between individuals. Finally, we consider several ways of aggregating predictions from multiple experts, and the results suggest that quite substantial improvements in expert predictions are possible when a suitable aggregation method is used.
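The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the study's analysis: the feature frequencies, the judges, and the helper names (`mean_abs_error`, `bias`, `spread`, `aggregate`) are all hypothetical, and mean signed error and the standard deviation of errors are used here only as simple stand-ins for miscalibration and imprecision.

```python
from statistics import mean, median, pstdev

# Hypothetical data (not from the study): true feature frequencies in percent,
# and three judges' estimates of those same frequencies.
truth = [10.0, 40.0, 75.0, 5.0]
estimates = {
    "judge_a": [20.0, 50.0, 85.0, 15.0],  # always 10 points too high
    "judge_b": [0.0, 50.0, 65.0, 15.0],   # unbiased on average, but noisy
    "judge_c": [5.0, 30.0, 70.0, 0.0],    # tends to underestimate
}

def errors(est):
    """Signed error of each estimate relative to the true frequency."""
    return [e - t for e, t in zip(est, truth)]

def mean_abs_error(est):
    """Overall accuracy: average absolute error across features."""
    return mean(abs(d) for d in errors(est))

def bias(est):
    """Simple stand-in for miscalibration: average signed error."""
    return mean(errors(est))

def spread(est):
    """Simple stand-in for imprecision: standard deviation of the errors."""
    return pstdev(errors(est))

def aggregate(combine):
    """Combine all judges' estimates feature by feature (e.g. mean or median)."""
    return [combine(column) for column in zip(*estimates.values())]

for name, est in estimates.items():
    print(name, mean_abs_error(est), bias(est), round(spread(est), 2))

print("crowd mean  ", mean_abs_error(aggregate(mean)))
print("crowd median", mean_abs_error(aggregate(median)))
```

In this toy data, two judges have the same absolute error for different reasons (one is biased, one is noisy), and the feature-wise mean of the three judges has a lower absolute error than any single judge, while the median does not. Which aggregation rule works best depends on the data, so the sketch should be read only as an illustration of the idea.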

## Keywords

Judgment and decision-making, Bayesian modeling, Expertise, Wisdom of crowds