DSMRSeg: Dual-Stage Feature Pyramid and Multi-Range Context Aggregation for Real-Time Semantic Segmentation

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1142)

Abstract

Real-time semantic segmentation is a challenging task in computer vision. Many studies emphasize real-time inference speed while neglecting segmentation quality. To tackle this problem, we propose a framework called DSMRSeg that achieves high speed with high accuracy after training on only one GPU. We accomplish this with three core components: (1) a Dual-Stage Feature Pyramid Network structure is designed to obtain richer multi-scale information and enhance the entire feature hierarchy by bidirectionally propagating features with strong semantics and accurate localization; (2) a Multi-Range Context Module is developed to expand receptive fields by aggregating local dense features and multi-range context information; (3) a light-weight Feature Fusion Module is proposed to merge dual-stage features effectively. We evaluate DSMRSeg on the Cityscapes, CamVid and BDD100K datasets and obtain competitive results compared with state-of-the-art methods. Specifically, DSMRSeg achieves 75.5% mIoU on the Cityscapes test set at a speed of 40 FPS on one NVIDIA GTX 1080 card for 1024 × 512 high-resolution images.
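The abstract describes the Multi-Range Context Module only at a high level. As a rough illustration of the underlying idea of aggregating local dense features with context gathered over several ranges, the sketch below implements a generic multi-range context block in PyTorch; the class name, channel counts, and dilation rates are assumptions made for illustration, not the configuration used in the paper.

```python
# Minimal sketch of a multi-range context block, assuming only the general idea
# described in the abstract: aggregate local dense features with context from
# several dilated 3x3 branches. The class name, channel counts, and dilation
# rates below are illustrative placeholders, not the paper's configuration.
import torch
import torch.nn as nn


class MultiRangeContextSketch(nn.Module):
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4, 8)):
        super().__init__()
        # One 3x3 branch per range; padding = dilation preserves spatial size.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        # 1x1 projection fuses the concatenated multi-range features.
        self.fuse = nn.Sequential(
            nn.Conv2d(out_ch * len(dilations), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]  # local + wider ranges
        return self.fuse(torch.cat(feats, dim=1))        # aggregate contexts


if __name__ == "__main__":
    # Shape check on a dummy feature map (1024 x 512 input downsampled by 16).
    x = torch.randn(1, 128, 32, 64)
    print(MultiRangeContextSketch(128, 128)(x).shape)  # torch.Size([1, 128, 32, 64])
```

In the full architecture, a dual-stage feature pyramid would propagate such features both top-down and bottom-up before a light-weight fusion step; those stages are omitted from this sketch.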

Acknowledgments

This work is supported by a project grant of China (No. BE2016155).

Author information

Corresponding author

Correspondence to Ying Shi.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Yang, M., Shi, Y. (2019). DSMRSeg: Dual-Stage Feature Pyramid and Multi-Range Context Aggregation for Real-Time Semantic Segmentation. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Communications in Computer and Information Science, vol 1142. Springer, Cham. https://doi.org/10.1007/978-3-030-36808-1_29

  • DOI: https://doi.org/10.1007/978-3-030-36808-1_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-36807-4

  • Online ISBN: 978-3-030-36808-1

  • eBook Packages: Computer Science, Computer Science (R0)
