Abstract
We describe an automatic natural language processing (NLP)-based image captioning method to describe fetal ultrasound video content by modelling the vocabulary commonly used by sonographers and sonologists. The generated captions are similar to the words spoken by a sonographer when describing the scan experience in terms of visual content and performed scanning actions. Using full-length second-trimester fetal ultrasound videos and text derived from accompanying expert voice-over audio recordings, we train deep learning models consisting of convolutional neural networks and recurrent neural networks in merged configurations to generate captions for ultrasound video frames. We evaluate different model architectures using established general metrics (BLEU, ROUGE-L) and application-specific metrics. Results show that the proposed models can learn joint representations of image and text to generate relevant and descriptive captions for anatomies, such as the spine, the abdomen, the heart, and the head, in clinical fetal ultrasound scans.
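The abstract names ROUGE-L among the evaluation metrics. As a minimal illustration of how that metric scores a generated caption against a reference, the sketch below implements ROUGE-L from its definition (an F-measure over the longest common subsequence of the two token sequences). This is not the authors' evaluation code; the function names and the example captions are illustrative only.

```python
def lcs_len(a, b):
    """Longest common subsequence length via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(candidate, reference, beta=1.0):
    """ROUGE-L F-measure between a candidate and a reference caption."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return (1 + beta ** 2) * prec * rec / (rec + beta ** 2 * prec)

# Hypothetical caption pair, loosely in the style of sonographer speech.
score = rouge_l_f1("the fetal spine is visible",
                   "the spine of the fetus is visible")
print(round(score, 3))  # → 0.667
```

Here the LCS is "the spine is visible" (4 tokens), giving precision 4/5 and recall 4/7, hence F1 = 2/3. BLEU, the other general metric cited, instead averages modified n-gram precisions with a brevity penalty.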
Acknowledgement
We acknowledge the ERC (ERC-ADG-2015 694 project PULSE), the EPSRC (EP/MO13774/1), the Rhodes Trust, and the NIHR BRC funding scheme.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Alsharid, M., Sharma, H., Drukker, L., Chatelain, P., Papageorghiou, A.T., Noble, J.A. (2019). Captioning Ultrasound Images Automatically. In: Shen, D., et al. (eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2019. Lecture Notes in Computer Science, vol. 11767. Springer, Cham. https://doi.org/10.1007/978-3-030-32251-9_37
Print ISBN: 978-3-030-32250-2
Online ISBN: 978-3-030-32251-9