Abstract
Medical imaging data is now extremely abundant owing to more than two decades of digitisation of imaging protocols and data storage formats. However, clean, well-curated data that is amenable to machine learning is relatively scarce, and AI developers are, paradoxically, data starved. Imaging and clinical data is also heterogeneous, often unstructured and unlabelled, whereas current supervised and semi-supervised machine learning techniques rely on homogeneous and carefully annotated data. While imaging biobanks contain small volumes of well-curated data, many machine learning developers hoping to train and validate computer vision algorithms are focused on leveraging ‘big data’ from the front line of healthcare. The quest for sufficiently large volumes of clean data for training, validation and testing involves several hurdles, namely ethics and consent, security, the assessment of data quality, ground truth data labelling, bias reduction, reusability and generalisability. In this chapter we propose a new medical imaging data readiness (MIDaR) scale, designed to objectively clarify data quality both for researchers seeking imaging data and for clinical providers aiming to share their data. It is hoped that the MIDaR scale will be used globally in collaborative academic and business conversations, so that all parties can more easily understand, and quickly appraise, the stages of data readiness for machine learning relevant to their AI development projects. We believe that the MIDaR scale could become essential in the design, planning and management of AI medical imaging projects, and could significantly increase their chances of success.
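The abstract lists seven hurdles that stand between raw clinical imaging data and a dataset usable for training, validation and testing. As a minimal sketch, assuming a project team simply wants to record which of those hurdles a candidate dataset has cleared, the checklist below tracks them in the order the abstract names them. It is illustrative only and is not the MIDaR scale, which is set out in the chapter itself; the `Hurdle` and `DatasetReadiness` names and the example dataset name are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Hurdle(Enum):
    """Hurdles named in the abstract, in the order they are listed there."""
    ETHICS_AND_CONSENT = "ethics and consent"
    SECURITY = "security"
    DATA_QUALITY = "assessment of data quality"
    GROUND_TRUTH_LABELLING = "ground truth data labelling"
    BIAS_REDUCTION = "bias reduction"
    REUSABILITY = "reusability"
    GENERALISABILITY = "generalisability"


@dataclass
class DatasetReadiness:
    """Hypothetical record of which hurdles a dataset has cleared.

    Illustrative checklist only; this is NOT the MIDaR scale.
    """
    name: str
    cleared: set[Hurdle] = field(default_factory=set)

    def clear(self, hurdle: Hurdle) -> None:
        # Recording the same hurdle twice has no extra effect (set semantics).
        self.cleared.add(hurdle)

    def outstanding(self) -> list[Hurdle]:
        # Iterating over an Enum yields members in definition order,
        # so outstanding hurdles keep the abstract's ordering.
        return [h for h in Hurdle if h not in self.cleared]

    def fraction_cleared(self) -> float:
        return len(self.cleared) / len(Hurdle)


if __name__ == "__main__":
    ds = DatasetReadiness("chest_xray_export")  # hypothetical dataset name
    ds.clear(Hurdle.ETHICS_AND_CONSENT)
    ds.clear(Hurdle.SECURITY)
    print([h.value for h in ds.outstanding()])
    print(f"{ds.fraction_cleared():.2f}")
```

Running the script prints the five hurdles still outstanding and `0.29` (two of seven cleared). A plain set of enum members keeps the record order-independent while still reporting gaps in a predictable sequence.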
Acknowledgements
With thanks to Hugh Lyshkow, DesAcc Inc. for his invaluable input and insight.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Harvey, H., Glocker, B. (2019). A Standardised Approach for Preparing Imaging Data for Machine Learning Tasks in Radiology. In: Ranschaert, E., Morozov, S., Algra, P. (eds) Artificial Intelligence in Medical Imaging. Springer, Cham. https://doi.org/10.1007/978-3-319-94878-2_6
DOI: https://doi.org/10.1007/978-3-319-94878-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94877-5
Online ISBN: 978-3-319-94878-2
eBook Packages: Medicine, Medicine (R0)