Abstract
Recent years have witnessed great advances in semantic segmentation using deep convolutional neural networks (DCNNs). However, the large number of convolutional layers and feature channels makes semantic segmentation a computationally heavy task, which is a disadvantage in scenarios with limited resources. In this paper, we design an efficient symmetric network, called ESNet, to address this problem. The whole network has a nearly symmetric architecture, which is mainly composed of a series of factorized convolution units (FCUs) and their parallel counterparts. On one hand, the FCU adopts a widely used 1D factorized convolution in its residual layers. On the other hand, the parallel version employs a transform-split-transform-merge strategy in the design of the residual module, where the split branches adopt dilated convolutions with different rates to enlarge the receptive field. Our model has nearly 1.6M parameters and runs at over 62 FPS on a single GTX 1080Ti GPU. Experiments demonstrate that our approach achieves state-of-the-art results in terms of the speed and accuracy trade-off for real-time semantic segmentation on the CityScapes dataset.
The first author is a student.
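To make the two ideas in the abstract concrete, the following is a minimal arithmetic sketch rather than the authors' implementation. It compares the parameter count of a full 3x3 convolution with a 1D factorized pair (3x1 followed by 1x3), and computes the extent covered by a dilated 3-tap kernel. The channel width of 64 and the dilation rates 1, 2, and 4 are illustrative assumptions; the abstract does not state the values used in ESNet.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels, bias=True):
    """Learnable parameters in a standard 2D convolution layer."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def effective_kernel_size(kernel_size, dilation):
    """Extent covered by a dilated 1D kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Assumption: 64 input and output channels, chosen only for illustration.
channels = 64

# One full 3x3 convolution.
full_3x3 = conv_params(3, 3, channels, channels)

# 1D factorization: a 3x1 convolution followed by a 1x3 convolution.
factorized = (conv_params(3, 1, channels, channels)
              + conv_params(1, 3, channels, channels))

# Assumption: hypothetical dilation rates for the split branches.
dilation_rates = [1, 2, 4]
extents = [effective_kernel_size(3, d) for d in dilation_rates]

print("full 3x3 parameters:  ", full_3x3)
print("factorized parameters:", factorized)
print("parameter ratio:      ", round(factorized / full_3x3, 3))
print("dilated kernel extents:", extents)
```

Under these assumptions the factorized pair needs roughly two thirds of the parameters of the full 3x3 layer, while larger dilation rates widen the area each kernel covers without adding any parameters.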
Acknowledgments
The authors would like to thank all the anonymous reviewers for their valuable comments and suggestions. This work was partly supported by the National Natural Science Foundation of China (Grant Nos. 61876093, 61701258, 61701252, 61671253), the Natural Science Foundation of Jiangsu Province (Grant Nos. BK20181393, BK20170906), the Natural Science Foundation of Guizhou Province (Grant No. [2017] 1130), and the Huawei Innovation Research Program (HIRP2018).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, Y., Zhou, Q., Xiong, J., Wu, X., Jin, X. (2019). ESNet: An Efficient Symmetric Network for Real-Time Semantic Segmentation. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2019. Lecture Notes in Computer Science, vol 11858. Springer, Cham. https://doi.org/10.1007/978-3-030-31723-2_4
DOI: https://doi.org/10.1007/978-3-030-31723-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31722-5
Online ISBN: 978-3-030-31723-2
eBook Packages: Computer Science (R0)