A Cross-Modality Neural Network Transform for Semi-automatic Medical Image Annotation

  • Mehdi Moradi
  • Yufan Guo
  • Yaniv Gur
  • Mohammadreza Negahdar
  • Tanveer Syeda-Mahmood
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9901)


There is a pressing need in the medical imaging community to build large-scale datasets that are annotated with semantic descriptors. Given the cost of expert-produced annotations, we propose an automatic methodology to produce semantic descriptors for images. These can then be used as weakly labeled instances or reviewed and corrected by clinicians. Our solution takes the form of a neural network that maps a given image to a new space formed by a large number of text paragraphs written by a human expert about similar, but different, images. We then extract semantic descriptors from the text paragraphs closest to the output of the transform network to describe the input image. We use deep learning to learn the mappings from images and texts to their corresponding fixed-size spaces, but a shallow network as the transform between the image and text spaces. This limits the complexity of the transform model and reduces the amount of data, in the form of image and text pairs, needed to train it. We report promising results for the proposed model in automatic descriptor generation for Doppler images of cardiac valves and show that the system captures up to 91% of the disease instances and 77% of the disease severity modifiers.
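The pipeline described above (a shallow transform from the image space to the text space, followed by a lookup of the closest text paragraphs) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the fixed-size image and text vectors have already been produced by separate deep models, and the function names, the single tanh hidden layer, the learning rate, and the use of cosine similarity as the notion of "closest" are all illustrative assumptions.

```python
import numpy as np


def train_shallow_transform(X, Y, hidden=16, lr=0.05, epochs=500, seed=0):
    """Fit a one-hidden-layer network mapping image vectors X to text vectors Y.

    Hypothetical helper: the layer size, activation, and optimizer are
    assumptions made for illustration, not details from the paper.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    n = X.shape[0]
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # hidden activations
        err = H @ W2 + b2 - Y               # gradient of 0.5 * squared error
        dH = (err @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        W2 -= lr * (H.T @ err) / n
        b2 -= lr * err.mean(axis=0)
        W1 -= lr * (X.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)
    return W1, b1, W2, b2


def transform(params, X):
    """Map image-space vectors into the text space with the trained network."""
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2


def nearest_paragraphs(query_vec, text_vecs, k=3):
    """Return indices of the k text vectors most similar to query_vec.

    Cosine similarity is an assumed distance; the abstract only says "closest".
    """
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    T = text_vecs / (np.linalg.norm(text_vecs, axis=1, keepdims=True) + 1e-12)
    return np.argsort(-(T @ q))[:k]
```

In use, one would train on paired image and text vectors, call `transform` on the vector of a new image, and pass the result to `nearest_paragraphs` to pick the report paragraphs from which descriptors are then extracted. Keeping the transform shallow is what keeps the number of trainable parameters, and therefore the number of image and text pairs required, small.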


Keywords: Optical Character Recognition · Semantic Descriptor · Valve Type · Text Report · Text Space



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Mehdi Moradi (1)
  • Yufan Guo (1)
  • Yaniv Gur (1)
  • Mohammadreza Negahdar (1)
  • Tanveer Syeda-Mahmood (1)

  1. IBM Almaden Research Center, San Jose, USA
