DSMRSeg: Dual-Stage Feature Pyramid and Multi-Range Context Aggregation for Real-Time Semantic Segmentation

Yang, Mingdong; Shi, Ying

doi:10.1007/978-3-030-36808-1_29

Conference paper
First Online: 05 December 2019

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1142)

Abstract

Real-time semantic segmentation is a challenging task in computer vision. Many studies emphasize real-time inference speed while neglecting segmentation quality. To tackle this problem, we propose a framework called DSMRSeg that achieves both high speed and high accuracy after training on only one GPU. We accomplish this with three core components: (1) A Dual-Stage Feature Pyramid Network structure is designed to obtain richer multi-scale information and to enhance the entire feature hierarchy by bidirectionally propagating features with strong semantics and accurate localization. (2) A Multi-Range Context Module is developed to expand receptive fields by aggregating local dense features and multi-range context information. (3) A Light-weight Feature Fusion Module is proposed to merge the dual-stage features effectively. We evaluate DSMRSeg on the Cityscapes, CamVid and BDD100K datasets and obtain competitive results compared with state-of-the-art methods. Specifically, DSMRSeg achieves 75.5% mIoU on the Cityscapes test set at 40 FPS on one NVIDIA GTX1080 card for 1024 × 512 high-resolution images.
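
The accuracy figure in the abstract is reported as mIoU (mean intersection-over-union). The evaluation code used by the authors is not shown on this page, so the following is only a minimal sketch of the standard metric definition. The function name `mean_iou`, its parameters, and the default ignore label of 255 are assumptions made for illustration.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Mean IoU over flattened label sequences (illustrative sketch only)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes   # pixels where prediction and label agree
    pred_count = [0] * num_classes     # pixels predicted as each class
    target_count = [0] * num_classes   # pixels labelled as each class

    for p, t in zip(pred, target):
        if t == ignore_index:          # unlabelled pixels do not count
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:                  # skip classes absent from both inputs
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes; the last pixel is ignored.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.4f}")
```

In this toy example the per-class IoU values are 1/2, 2/3 and 1, so the mean is 13/18. Benchmark scores are typically computed from counts accumulated over the whole evaluation set rather than averaged per image; with this sketch that corresponds to concatenating all flattened label maps before calling the function.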

Acknowledgments

This work is supported by a project grant of China (No. BE2016155).

Author information

Authors and Affiliations

School of Automation, Wuhan University of Technology, Wuhan, China
Mingdong Yang & Ying Shi

Corresponding author

Correspondence to Ying Shi.

Editor information

Editors and Affiliations

Australian National University, Canberra, ACT, Australia
Tom Gedeon
Murdoch University, Murdoch, WA, Australia
Kok Wai Wong
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee

Cite this paper

Yang, M., Shi, Y. (2019). DSMRSeg: Dual-Stage Feature Pyramid and Multi-Range Context Aggregation for Real-Time Semantic Segmentation. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Communications in Computer and Information Science, vol 1142. Springer, Cham. https://doi.org/10.1007/978-3-030-36808-1_29

DOI: https://doi.org/10.1007/978-3-030-36808-1_29
Published: 05 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36807-4
Online ISBN: 978-3-030-36808-1
eBook Packages: Computer Science (R0)
