A Standardised Approach for Preparing Imaging Data for Machine Learning Tasks in Radiology

Harvey, Hugh; Glocker, Ben

doi:10.1007/978-3-319-94878-2_6

Abstract

Medical imaging data is now extremely abundant due to over two decades of digitisation of imaging protocols and data storage formats. However, clean, well-curated data that is amenable to machine learning is relatively scarce, and AI developers are paradoxically data starved. Imaging and clinical data is also heterogeneous, often unstructured and unlabelled, whereas current supervised and semi-supervised machine learning techniques rely on homogeneous and carefully annotated data. While imaging biobanks contain small volumes of well-curated data, it is the leveraging of ‘big data’ from the front line of healthcare that is the focus of many machine learning developers hoping to train and validate computer vision algorithms. The quest for sufficiently large volumes of clean data that can be used for training, validation and testing involves several hurdles, namely ethics and consent, security, the assessment of data quality, ground truth data labelling, bias reduction, reusability and generalisability. In this chapter we propose a new medical imaging data readiness (MIDaR) scale. The MIDaR scale is designed to objectively clarify data quality for both researchers seeking imaging data and clinical providers aiming to share their data. It is hoped that the MIDaR scale will be used globally during collaborative academic and business conversations, so that everyone can more easily understand and quickly appraise the relevant stages of data readiness for machine learning in relation to their AI development projects. We believe that the MIDaR scale could become essential in the design, planning and management of AI medical imaging projects, and significantly increase the chances of success.

Acknowledgements

With thanks to Hugh Lyshkow, DesAcc Inc. for his invaluable input and insight.

Author information

Authors and Affiliations

Kheiron Medical Technologies, London, UK
Hugh Harvey
Imperial College, London, UK
Ben Glocker

Corresponding author

Correspondence to Hugh Harvey.

Editor information

Editors and Affiliations

ETZ Hospital, Tilburg, The Netherlands
Erik R. Ranschaert
Radiology Research and Practical Centre, Moscow, Russia
Sergey Morozov
Department of Radiology, Northwest Hospital Group, Alkmaar, The Netherlands
Paul R. Algra

About this chapter

Cite this chapter

Harvey, H., Glocker, B. (2019). A Standardised Approach for Preparing Imaging Data for Machine Learning Tasks in Radiology. In: Ranschaert, E., Morozov, S., Algra, P. (eds) Artificial Intelligence in Medical Imaging. Springer, Cham. https://doi.org/10.1007/978-3-319-94878-2_6

DOI: https://doi.org/10.1007/978-3-319-94878-2_6
Published: 30 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94877-5
Online ISBN: 978-3-319-94878-2
eBook Packages: Medicine, Medicine (R0)