Temporal HeartNet: Towards Human-Level Automatic Analysis of Fetal Cardiac Screening Video

  • Weilin Huang
  • Christopher P. Bridge
  • J. Alison Noble
  • Andrew Zisserman
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10434)


We present an automatic method to describe clinically useful information about scanning, and to guide image interpretation, in ultrasound (US) videos of the fetal heart. Our method jointly predicts the visibility, viewing plane, location and orientation of the fetal heart at the frame level. The contributions of the paper are three-fold: (i) a convolutional neural network architecture is developed for multi-task prediction, computed by sliding a \(3 \times 3\) window spatially through the convolutional maps; (ii) an anchor mechanism and an Intersection over Union (IoU) loss are applied to improve localization accuracy; (iii) a recurrent architecture is designed to recursively compute regional convolutional features temporally over sequential frames, allowing each prediction to be conditioned on the whole video. The result is a spatio-temporal model that precisely describes detailed heart parameters in challenging US videos. We report results on a real-world clinical dataset, where our method achieves performance on par with expert annotations.
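To make the localization objective concrete, the sketch below shows one common form of the IoU loss the abstract refers to, \(-\ln(\mathrm{IoU})\) between a predicted and a ground-truth box, in the style popularized by UnitBox-type detectors. This is a minimal illustration, not the authors' implementation; the box representation (corner coordinates) and the loss form are assumptions for the sketch.

```python
import math

def iou(box_a, box_b):
    """Intersection over Union for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_loss(pred, target):
    """Negative-log IoU loss: 0 for a perfect box, growing as overlap shrinks."""
    return -math.log(max(iou(pred, target), 1e-8))
```

Because the loss treats the box as a single unit rather than four independent coordinate regressions, it is scale-invariant and directly optimizes the overlap metric used for evaluation.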





This work was supported by the EPSRC Programme Grant Seebibyte (EP/M013774/1).



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Weilin Huang
  • Christopher P. Bridge
  • J. Alison Noble
  • Andrew Zisserman
  1. Department of Engineering Science, University of Oxford, Oxford, UK
