Abstract
Real-time semantic segmentation is a challenging task in computer vision. Many studies emphasize real-time inference speed while neglecting segmentation quality. To address this problem, we propose a framework called DSMRSeg that achieves both high speed and high accuracy after training on only one GPU. It relies on three core components: (1) a Dual-Stage Feature Pyramid Network structure, designed to obtain richer multi-scale information and to enhance the entire feature hierarchy by bidirectionally propagating features with strong semantics and accurate localization; (2) a Multi-Range Context Module, developed to expand receptive fields by aggregating local dense features and multi-range context information; and (3) a Light-weight Feature Fusion Module, proposed to merge dual-stage features effectively. We evaluate DSMRSeg on the Cityscapes, CamVid and BDD100K datasets and obtain competitive results compared with state-of-the-art methods. Specifically, DSMRSeg achieves 75.5% mIoU on the Cityscapes test set at a speed of 40 FPS on one NVIDIA GTX1080 card for 1024 × 512 high-resolution images.
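The accuracy figure quoted in the abstract is a mean intersection-over-union (mIoU). As a minimal sketch of how such a score is commonly computed, the Python below accumulates a confusion matrix over flattened label maps and averages the per-class IoU. The function names, the `ignore_index=255` convention, and the choice to skip classes that never occur are illustrative assumptions, not details taken from the paper or from any benchmark's official evaluation code.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count (target, pred) label pairs, skipping pixels marked as ignored.

    `pred` and `target` are flat sequences of integer class labels.
    The ignore value 255 is an assumed convention, not taken from the paper.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        matrix[t][p] += 1  # rows index ground truth, columns index prediction
    return matrix


def mean_iou(matrix):
    """Average per-class IoU = TP / (TP + FP + FN).

    Classes that appear in neither prediction nor ground truth are skipped
    (an assumption made here to avoid division by zero).
    """
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                        # ground truth c, predicted otherwise
        fp = sum(matrix[r][c] for r in range(n)) - tp   # predicted c, ground truth otherwise
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes, last pixel ignored.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(confusion_matrix(pred, target, num_classes=3))
print(f"mIoU = {score:.4f}")
```

In a full evaluation the confusion matrix would be accumulated over every image in the split before the per-class ratios are taken, so that large and small images contribute in proportion to their pixel counts.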
Acknowledgments
This work was supported by a project grant of China (No. BE2016155).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Yang, M., Shi, Y. (2019). DSMRSeg: Dual-Stage Feature Pyramid and Multi-Range Context Aggregation for Real-Time Semantic Segmentation. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Communications in Computer and Information Science, vol 1142. Springer, Cham. https://doi.org/10.1007/978-3-030-36808-1_29
DOI: https://doi.org/10.1007/978-3-030-36808-1_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36807-4
Online ISBN: 978-3-030-36808-1
eBook Packages: Computer Science, Computer Science (R0)