Automatic Classification and Reporting of Multiple Common Thorax Diseases Using Chest Radiographs

Wang, Xiaosong; Peng, Yifan; Lu, Le; Lu, Zhiyong; Summers, Ronald M.

doi:10.1007/978-3-030-13969-8_19


Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)


Abstract

Chest X-rays are one of the most common radiological examinations in daily clinical routines. Reporting thorax diseases using chest X-rays is often an entry-level task for radiologist trainees. Yet reading a chest X-ray image remains a challenging job for learning-oriented machine intelligence, due to (1) the shortage of large-scale machine-learnable medical image datasets, and (2) the lack of techniques that can mimic the high-level reasoning of human radiologists, which requires years of knowledge accumulation and professional training. In this paper, we show that the clinical free-text radiological reports that accompany X-ray images in hospital picture archiving and communication systems (PACS) can be utilized as a priori knowledge for tackling these two key problems. We propose a novel text-image embedding network (TieNet) for extracting distinctive image and text representations. Multi-level attention models are integrated into an end-to-end trainable CNN-RNN architecture to highlight the meaningful text words and image regions. We first apply TieNet to classify chest X-rays using both image features and text embeddings extracted from the associated reports. The proposed auto-annotation framework achieves high accuracy (an average AUC above 0.9) in assigning disease labels on our hand-labeled evaluation dataset. Furthermore, we transform TieNet into a chest X-ray reporting system. It simulates the reporting process and outputs a disease classification and a preliminary report together, with the X-ray image as the only input. The classification results are significantly improved (an average AUC increase of 6%) compared to the state-of-the-art baseline on an unseen, hand-labeled dataset (OpenI).
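The abstract states that multi-level attention highlights meaningful words and image regions, but it does not spell out the formulation. One widely used option for pooling a sequence of RNN hidden states is structured self-attention (Lin et al., ICLR 2017): with the hidden states stacked as rows of H, the weights are A = softmax(W_s2 · tanh(W_s1 · Hᵀ)) and the pooled embedding is M = A · H. The pure-Python sketch below implements that formulation under stated assumptions: the helper names (`matvec`, `softmax`, `attention_pool`), the toy sizes and the fixed weights are all hypothetical, and this is not presented as TieNet's exact configuration. The same pooling could in principle be applied to image regions by treating each spatial CNN feature vector as one row of `hidden`.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: Matrix, w1: Matrix, w2: Matrix) -> Tuple[Matrix, Matrix]:
    """Structured self-attention pooling (hypothetical helper, not TieNet's code).

    hidden: n x d    one row per word (e.g. RNN outputs) or per image region
    w1:     d_a x d  first projection (W_s1)
    w2:     r x d_a  one row per attention hop (W_s2)
    Returns (A, M): A is r x n attention weights, M = A @ hidden is r x d.
    """
    # u_t = tanh(W_s1 h_t) for every position t; shape n x d_a
    projected = [[math.tanh(x) for x in matvec(w1, h)] for h in hidden]
    # For each hop i, score every position t with w2[i] . u_t, then softmax over t
    attn = [softmax(matvec(projected, row)) for row in w2]
    # Each hop's output is the attention-weighted sum of the hidden rows
    d = len(hidden[0])
    pooled = [
        [sum(a_t * h[j] for a_t, h in zip(weights, hidden)) for j in range(d)]
        for weights in attn
    ]
    return attn, pooled


if __name__ == "__main__":
    # Toy input: 3 "words" with 2-dim hidden states and 2 attention hops
    hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    w1 = [[1.0, 0.0], [0.0, 1.0]]
    w2 = [[1.0, 0.0], [0.0, 1.0]]
    weights, pooled = attention_pool(hidden, w1, w2)
    for hop, row in enumerate(weights):
        print(f"hop {hop}: " + ", ".join(f"{a:.3f}" for a in row))
```

Each row of A sums to 1, so it can be read as a heat map over words (or regions); with several hops, different rows are free to focus on different findings. In a trained model W_s1 and W_s2 would be learned end to end with the rest of the CNN-RNN; here they are fixed toy values.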
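The reported results are average AUCs over disease labels. The abstract does not say how the average is taken; assuming an unweighted (macro) mean of per-disease ROC AUCs, the minimal sketch below computes it. The function names and toy data are hypothetical. Each per-label AUC uses the Mann-Whitney interpretation: the probability that a randomly chosen positive image scores higher than a randomly chosen negative one, with ties counted as half.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC for one binary label via pairwise (Mann-Whitney) comparison."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def mean_auc(
    label_matrix: Sequence[Sequence[int]],
    score_matrix: Sequence[Sequence[float]],
) -> float:
    """Macro-average AUC: one column per disease, unweighted mean over columns."""
    n_classes = len(label_matrix[0])
    aucs = []
    for c in range(n_classes):
        ys = [row[c] for row in label_matrix]
        ss = [row[c] for row in score_matrix]
        aucs.append(roc_auc(ys, ss))
    return sum(aucs) / n_classes


if __name__ == "__main__":
    # Toy multi-label example: 4 images x 2 hypothetical disease labels
    labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
    scores = [[0.9, 0.2], [0.3, 0.8], [0.7, 0.4], [0.1, 0.5]]
    print(mean_auc(labels, scores))
```

The pairwise loop is O(P·N) per label, which is fine for illustration; on real test sets a rank-based implementation or a library routine such as scikit-learn's `roc_auc_score` would typically be used instead.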

Xiaosong Wang: This work was done during his fellowship at the National Institutes of Health Clinical Center.

Le Lu: This work was done during his employment at the National Institutes of Health Clinical Center.



Acknowledgements

This work was supported by the Intramural Research Programs of the NIH Clinical Center and National Library of Medicine. Thanks to Adam Harrison and Shazia Dharssi for proofreading the manuscript. We are also grateful to NVIDIA Corporation for the GPU donation.

Author information

Authors and Affiliations

Nvidia Corporation, Bethesda, MD, 20814, USA
Xiaosong Wang
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20892, USA
Yifan Peng & Zhiyong Lu
PAII Inc., Bethesda Research Lab, 6720B Rockledge Drive, Ste 410, Bethesda, MD, 20817, USA
Le Lu
Johns Hopkins University, Baltimore, MD, USA
Le Lu
Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, Radiology and Imaging Sciences Department, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
Ronald M. Summers


Corresponding author

Correspondence to Xiaosong Wang.

Editor information

Editors and Affiliations

Bethesda Research Lab, PAII Inc., Bethesda, MD, USA
Le Lu
Nvidia Corporation, Bethesda, MD, USA
Xiaosong Wang
School of Computer Science, University of Adelaide, Adelaide, SA, Australia
Gustavo Carneiro
Department of Biomedical Engineering, University of Florida, Gainesville, FL, USA
Lin Yang


About this chapter

Cite this chapter

Wang, X., Peng, Y., Lu, L., Lu, Z., Summers, R.M. (2019). Automatic Classification and Reporting of Multiple Common Thorax Diseases Using Chest Radiographs. In: Lu, L., Wang, X., Carneiro, G., Yang, L. (eds) Deep Learning and Convolutional Neural Networks for Medical Imaging and Clinical Informatics. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-030-13969-8_19


DOI: https://doi.org/10.1007/978-3-030-13969-8_19
Published: 20 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-13968-1
Online ISBN: 978-3-030-13969-8
eBook Packages: Computer Science, Computer Science (R0)
