ESNet: An Efficient Symmetric Network for Real-Time Semantic Segmentation

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2019)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11858)

Abstract

Recent years have witnessed great advances in semantic segmentation using deep convolutional neural networks (DCNNs). However, the large number of convolutional layers and feature channels makes semantic segmentation a computationally heavy task, which is a disadvantage in scenarios with limited resources. In this paper, we design an efficient symmetric network, called ESNet, to address this problem. The whole network has a nearly symmetric architecture, which is mainly composed of a series of factorized convolution units (FCUs) and their parallel counterparts. On one hand, the FCU adopts a widely used 1D factorized convolution in residual layers. On the other hand, the parallel version employs a transform-split-transform-merge strategy in the design of the residual module, where the split branches adopt dilated convolutions with different rates to enlarge the receptive field. Our model has nearly 1.6M parameters and runs at over 62 FPS on a single GTX 1080Ti GPU. The experiments demonstrate that our approach achieves state-of-the-art results in terms of the speed and accuracy trade-off for real-time semantic segmentation on the CityScapes dataset.
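
The two ideas named in the abstract, 1D factorized convolution and dilated convolution, can be illustrated with simple arithmetic. The sketch below is not the authors' implementation. It assumes a square k x k convolution replaced by a k x 1 convolution followed by a 1 x k convolution, equal input and output channel counts, and no bias terms. The channel width of 64 and the dilation rates are hypothetical values chosen only for illustration.

```python
def conv2d_weights(channels: int, kernel_h: int, kernel_w: int) -> int:
    """Weight count of one 2D convolution with equal in/out channels, no bias."""
    return channels * channels * kernel_h * kernel_w


def factorized_weights(channels: int, k: int) -> int:
    """Weight count of a k x 1 convolution followed by a 1 x k convolution."""
    return conv2d_weights(channels, k, 1) + conv2d_weights(channels, 1, k)


def effective_kernel(k: int, dilation: int) -> int:
    """Span along one axis covered by a k-tap kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


if __name__ == "__main__":
    channels, k = 64, 3  # hypothetical width and kernel size, not from the paper

    full = conv2d_weights(channels, k, k)
    fact = factorized_weights(channels, k)
    print(f"3x3 convolution weights:       {full}")
    print(f"3x1 + 1x3 convolution weights: {fact}")
    print(f"ratio: {fact / full:.3f}")

    # Hypothetical dilation rates; larger rates widen the span without adding weights.
    for rate in (1, 2, 4):
        print(f"dilation {rate}: effective span {effective_kernel(k, rate)}")
```

Under these assumptions the factorized pair uses two thirds of the weights of the full 3 x 3 convolution, while each dilated branch covers a wider span with the same number of weights.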

The first author is a student.

Acknowledgments

The authors would like to thank all the anonymous reviewers for their valuable comments and suggestions. This work was partly supported by the National Natural Science Foundation of China (Grant Nos. 61876093, 61701258, 61701252, 61671253), the Natural Science Foundation of Jiangsu Province (Grant Nos. BK20181393, BK20170906), the Natural Science Foundation of Guizhou Province (Grant No. [2017] 1130), and the Huawei Innovation Research Program (HIRP2018).

Author information

Corresponding author

Correspondence to Quan Zhou.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, Y., Zhou, Q., Xiong, J., Wu, X., Jin, X. (2019). ESNet: An Efficient Symmetric Network for Real-Time Semantic Segmentation. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2019. Lecture Notes in Computer Science, vol 11858. Springer, Cham. https://doi.org/10.1007/978-3-030-31723-2_4

  • DOI: https://doi.org/10.1007/978-3-030-31723-2_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-31722-5

  • Online ISBN: 978-3-030-31723-2

  • eBook Packages: Computer Science, Computer Science (R0)