ESNet: An Efficient Symmetric Network for Real-Time Semantic Segmentation

Wang, Yu; Zhou, Quan; Xiong, Jian; Wu, Xiaofu; Jin, Xin

doi:10.1007/978-3-030-31723-2_4

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11858)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Recent years have witnessed great advances in semantic segmentation using deep convolutional neural networks (DCNNs). However, the large number of convolutional layers and feature channels makes semantic segmentation computationally heavy, which is a disadvantage in scenarios with limited resources. In this paper, we design an efficient symmetric network, called ESNet, to address this problem. The whole network has a nearly symmetric architecture, composed mainly of a series of factorized convolution units (FCUs) and their parallel counterparts. On the one hand, the FCU adopts a widely used 1D factorized convolution in residual layers. On the other hand, the parallel version employs a transform-split-transform-merge strategy in the design of the residual module, where the split branches adopt dilated convolutions with different rates to enlarge the receptive field. Our model has nearly 1.6M parameters and runs at over 62 FPS on a single GTX 1080Ti GPU. Experiments demonstrate that our approach achieves state-of-the-art results in terms of the speed and accuracy trade-off for real-time semantic segmentation on the Cityscapes dataset.
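The abstract does not give layer widths, kernel sizes, or dilation rates, so the following is only a back-of-the-envelope sketch of why the two ideas it names save computation. It assumes a 3x3 kernel, 64 channels, and dilation rates 1, 2, 4, 8. These values and the helper names are all hypothetical and are not taken from the paper. It counts the weights of a full k x k convolution against a k x 1 followed by 1 x k pair, and computes the spatial extent covered by a dilated kernel.

```python
def conv2d_params(c_in, c_out, kh, kw, bias=False):
    """Number of weights in a standard 2D convolution with a kh x kw kernel."""
    return c_in * c_out * kh * kw + (c_out if bias else 0)


def factorized_params(channels, k, bias=False):
    """Weights in a k x 1 convolution followed by a 1 x k convolution."""
    return (conv2d_params(channels, channels, k, 1, bias)
            + conv2d_params(channels, channels, 1, k, bias))


def effective_kernel(k, dilation):
    """Spatial extent covered by a k-tap kernel at the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


# Hypothetical configuration, chosen only for illustration.
channels, k = 64, 3

full = conv2d_params(channels, channels, k, k)
factored = factorized_params(channels, k)
print("full 3x3 weights:      ", full)
print("factorized 3x1 + 1x3:  ", factored)
print("ratio factored / full: ", factored / full)

# Hypothetical dilation rates for the split branches.
for rate in (1, 2, 4, 8):
    print("dilation", rate, "-> effective extent", effective_kernel(k, rate))

# Per-frame time budget implied by the reported 62 FPS, in milliseconds.
print("ms per frame at 62 FPS:", 1000.0 / 62)
```

Under these assumptions the factorized pair uses two thirds of the weights of the full 3x3 kernel, while dilation widens the covered extent without adding any weights.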

The first author is a student.

Acknowledgments

The authors would like to thank all the anonymous reviewers for their valuable comments and suggestions. This work was partly supported by the National Natural Science Foundation of China (Grant Nos. 61876093, 61701258, 61701252, 61671253), the Natural Science Foundation of Jiangsu Province (Grant Nos. BK20181393, BK20170906), the Natural Science Foundation of Guizhou Province (Grant No. [2017] 1130), and the Huawei Innovation Research Program (HIRP2018).

Author information

Authors and Affiliations

National Engineering Research Center of Communications and Networking, Nanjing University of Posts and Telecommunications, Nanjing, People’s Republic of China
Yu Wang, Quan Zhou, Jian Xiong & Xiaofu Wu
Beijing Electronic Science and Technology Institute, Beijing, People’s Republic of China
Xin Jin
State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, People’s Republic of China
Xin Jin

Corresponding author

Correspondence to Quan Zhou.

Editor information

Editors and Affiliations

School of EECS, Peking University, Beijing, China
Zhouchen Lin
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Liang Wang
Nanjing University of Science and Technology, Nanjing, China
Jian Yang
Xidian University, Xi'an, China
Guangming Shi
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Tieniu Tan
Institute of Artificial Intelligence, Xi'an Jiaotong University, Xi'an, China
Nanning Zheng
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Northwestern Polytechnical University, Xi'an, China
Yanning Zhang

About this paper

Cite this paper

Wang, Y., Zhou, Q., Xiong, J., Wu, X., Jin, X. (2019). ESNet: An Efficient Symmetric Network for Real-Time Semantic Segmentation. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2019. Lecture Notes in Computer Science, vol 11858. Springer, Cham. https://doi.org/10.1007/978-3-030-31723-2_4

DOI: https://doi.org/10.1007/978-3-030-31723-2_4
Published: 31 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31722-5
Online ISBN: 978-3-030-31723-2
eBook Packages: Computer Science, Computer Science (R0)
