Detection and Characterization of the Fetal Heartbeat in Free-hand Ultrasound Sweeps with Weakly-supervised Two-streams Convolutional Networks

  • Yuan Gao
  • J. Alison Noble
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10434)


Abstract

Assessment of fetal cardiac activity is essential to confirm pregnancy viability in obstetric ultrasound. However, automated detection and localization of a beating fetal heart in free-hand ultrasound sweeps is very challenging, owing to high variation in heart appearance, scale and position (caused by heart deformation, scanning orientation and artefacts). In this paper, we present a two-stream Convolutional Network (ConvNet), a temporal sequence learning model, that recognizes heart frames and localizes the heart using only weak supervision. Our contribution is three-fold: (i) to the best of our knowledge, this is the first work to apply two-stream spatio-temporal ConvNets to the analysis of free-hand fetal ultrasound videos; the model is compact and can be trained end-to-end with only image-level labels; (ii) the model enforces rotation invariance, so no additional augmentation of the training data is required; and (iii) the model is particularly robust for heart detection, which matters in our application, where distracting textures such as acoustic shadows are common. Our results demonstrate that the proposed two-stream ConvNet significantly outperforms a single-stream spatial ConvNet on heart identification (90.3% versus 74.9%).
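The abstract does not detail the architecture, but the generic two-stream idea it builds on can be sketched: a spatial stream classifies a single frame, a temporal stream classifies a stack of optical-flow fields, and the per-stream class posteriors are fused late by averaging. The sketch below is a minimal NumPy illustration of that data flow only; the linear "streams", the frame size, the flow-stack length and all weights are illustrative stand-ins, not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

H = W = 64          # frame size (illustrative assumption)
L = 5               # number of flow fields stacked in the temporal stream
NUM_CLASSES = 2     # heart frame vs. background

def softmax(z):
    """Numerically stable softmax over class scores."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def stream_scores(x, weights):
    """Stand-in for one ConvNet stream: flatten the input, apply a
    linear classifier, return class posteriors. A real stream would be
    a stack of convolution/pooling layers."""
    return softmax(weights @ x.ravel())

# Randomly initialised stand-ins for the two learned streams.
w_spatial = rng.normal(scale=0.01, size=(NUM_CLASSES, H * W))
w_temporal = rng.normal(scale=0.01, size=(NUM_CLASSES, 2 * L * H * W))

frame = rng.random((H, W))          # one ultrasound frame (spatial input)
flow = rng.random((2 * L, H, W))    # stacked x/y optical-flow fields

# Late fusion: average the per-stream class posteriors.
p_spatial = stream_scores(frame, w_spatial)
p_temporal = stream_scores(flow, w_temporal)
p_fused = (p_spatial + p_temporal) / 2

print("fused posteriors:", p_fused)  # non-negative, sums to 1
```

Averaging posteriors is the simplest late-fusion rule; learned fusion layers are a common alternative, and the choice does not change the two-stream structure shown here.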


Keywords: Two-stream ConvNet · Weakly supervised detection · Fetal heart · Free-hand ultrasound video



The authors acknowledge the China Scholarship Council (CSC) for a Doctoral Training Award (grant no. 201408060107) and the RCUK CDT in Healthcare Innovation.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Biomedical Image Analysis Group, Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, Oxford, UK
