ARVBNet: Real-Time Detection of Anatomical Structures in Fetal Ultrasound Cardiac Four-Chamber Planes
Quality assessment of ultrasound images is essential for prenatal diagnosis, and detecting anatomical structures is the first and most important step in this assessment. In clinical practice, detection is usually performed manually, which is experience-dependent, labor-intensive, and time-consuming, and it suffers from high inter- and intra-observer variability. In this paper, we propose a novel real-time detection model, the aggregated residual visual block network (ARVBNet), for automatic detection of anatomical structures in the cardiac four-chamber plane (CFP) of fetal ultrasound images. Experiments on 1991 fetal ultrasound CFPs demonstrate that the proposed network achieves state-of-the-art performance of 93.52% mean average precision (mAP) at a test speed of 101 frames per second (FPS). In addition, an extended experiment on the Pascal VOC dataset also achieves a state-of-the-art 81.2% mAP, demonstrating the adaptability and generality of the proposed model.
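The mAP figures above follow the standard object-detection evaluation: per-class average precision (AP) over a precision-recall curve with an IoU match threshold, averaged across classes. A minimal sketch of that computation, assuming the common Pascal VOC convention (IoU ≥ 0.5, all-point interpolation); the paper's exact protocol may differ:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def average_precision(detections, gt_boxes, iou_thr=0.5):
    """AP for one class on one image set.

    detections: list of (confidence, box), gt_boxes: list of boxes.
    """
    detections = sorted(detections, key=lambda d: -d[0])
    matched = set()          # ground-truth boxes already claimed
    tp = fp = 0
    precisions, recalls = [], []
    for _score, box in detections:
        best, best_i = 0.0, -1
        for i, g in enumerate(gt_boxes):
            if i in matched:
                continue
            o = iou(box, g)
            if o > best:
                best, best_i = o, i
        if best >= iou_thr:
            matched.add(best_i)
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / len(gt_boxes))
    # make precision monotone non-increasing from the right (interpolation)
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # integrate the interpolated precision-recall curve
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_r)
        prev_r = r
    return ap
```

The mAP reported in the abstract would then be the mean of `average_precision` over all anatomical-structure classes.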
Keywords: Anatomical structure detection · Aggregated residual visual block (ARVB) network · Ultrasound cardiac four-chamber plane · Deep learning
This work is supported in part by the National Natural Science Foundation of China (Nos. 61871274, 61801305, and 81571758), the Natural Science Foundation of Guangdong Province (No. 2017A030313377), the Guangdong Pearl River Talents Plan (2016ZT06S220), the Shenzhen Peacock Plan (Nos. KQTD2016053112051497 and KQTD2015033016104926), and the Shenzhen Key Basic Research Project (Nos. JCYJ20170413152804728, JCYJ20180507184647636, JCYJ20170818142347251, and JCYJ20170818094109846).