ARVBNet: Real-Time Detection of Anatomical Structures in Fetal Ultrasound Cardiac Four-Chamber Planes

  • Jinbao Dong
  • Shengfeng Liu
  • Tianfu Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11794)


The quality assessment of ultrasound images is essential for prenatal diagnosis, and the detection of anatomical structures is the first and most important step in that assessment. In clinical practice, it is usually done manually, which is experience-dependent, labor-intensive, and time-consuming, and suffers from high inter- and intra-observer variability. In this paper, we propose a novel real-time detection model, named aggregated residual visual block network (ARVBNet), for the automatic detection of anatomical structures in the cardiac four-chamber plane (CFP) of fetal ultrasound images. Experiments on 1991 fetal ultrasound CFPs demonstrate that the proposed network achieves state-of-the-art performance of 93.52% mean average precision (mAP) at a test speed of 101 frames per second (FPS). In addition, an extended experiment on the Pascal VOC dataset also achieves state-of-the-art performance of 81.2% mAP, demonstrating the adaptability and generality of our proposed model.
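For readers unfamiliar with the reported metric, the following is a minimal sketch of how per-class average precision (AP) and mAP can be computed from ranked detections, in the style of Pascal VOC-type all-point interpolation. This is an illustrative reconstruction, not the authors' evaluation code; the function names and the assumption that IoU matching of detections to ground truth has already been done upstream are ours.

```python
# Minimal sketch: average precision (AP) for one class, computed from
# detections that have already been matched to ground truth by IoU.
# Each detection is a (confidence, is_true_positive) pair.

def average_precision(scored_hits, num_ground_truth):
    # Rank detections by descending confidence.
    hits = sorted(scored_hits, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for _, is_tp in hits:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    # Make the precision envelope monotonically non-increasing
    # (all-point interpolation), sweeping right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap

def mean_average_precision(per_class_aps):
    # mAP is the unweighted mean of the per-class APs.
    return sum(per_class_aps) / len(per_class_aps)
```

For example, two true positives out of three ranked detections against two ground-truth boxes yield an AP of 5/6 under this scheme; averaging such per-class APs over all anatomical structure classes gives the mAP figure of the kind reported above.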


Keywords: Anatomical structure detection · Aggregated residual visual block (ARVB) network · Ultrasound cardiac four-chamber plane · Deep learning



This work was supported in part by the National Natural Science Foundation of China (No. 61871274, 61801305 and 81571758), the Natural Science Foundation of Guangdong Province (No. 2017A030313377), the Guangdong Pearl River Talents Plan (2016ZT06S220), the Shenzhen Peacock Plan (No. KQTD2016053112051497 and KQTD2015033016104926), and the Shenzhen Key Basic Research Project (No. JCYJ20170413152804728, JCYJ20180507184647636, JCYJ20170818142347251 and JCYJ20170818094109846).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China
