Artificial Intelligence in Medicine: Validation and Study Design

Oakden-Rayner, Luke; Palmer, Lyle John

doi:10.1007/978-3-319-94878-2_8

Chapter
First Online: 30 January 2019

Abstract

The volume of artificial intelligence (AI) research in biomedicine has expanded vastly over the last several years. At the same time, the first medical AI systems have begun to move rapidly from research into clinical practice. Evaluating AI systems for clinical tasks can be quite different from evaluating them for other applications of AI. In medicine, the stakes are often higher, in terms of both risks and rewards. In this chapter, we explore key concepts underpinning the design, performance and validation of medical AI experiments. We also discuss several unresolved challenges the field currently faces.

Author information

Authors and Affiliations

School of Public Health, The University of Adelaide, Adelaide, Australia
Luke Oakden-Rayner & Lyle John Palmer
Australian Institute of Machine Learning, Adelaide, Australia
Luke Oakden-Rayner & Lyle John Palmer

Editor information

Editors and Affiliations

ETZ Hospital, Tilburg, The Netherlands
Erik R. Ranschaert
Radiology Research and Practical Centre, Moscow, Russia
Sergey Morozov
Department of Radiology, Northwest Hospital Group, Alkmaar, The Netherlands
Paul R. Algra

About this chapter

Cite this chapter

Oakden-Rayner, L., Palmer, L.J. (2019). Artificial Intelligence in Medicine: Validation and Study Design. In: Ranschaert, E., Morozov, S., Algra, P. (eds) Artificial Intelligence in Medical Imaging. Springer, Cham. https://doi.org/10.1007/978-3-319-94878-2_8

DOI: https://doi.org/10.1007/978-3-319-94878-2_8
Published: 30 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94877-5
Online ISBN: 978-3-319-94878-2
eBook Packages: Medicine, Medicine (R0)
