
Constructing a Convolutional Neural Network with a Suitable Capacity for a Semantic Segmentation Task

  • Chapter
  • First Online:
Deep Learning: Concepts and Architectures

Part of the book series: Studies in Computational Intelligence (SCI, volume 866)


Abstract

Although state-of-the-art performance has been achieved in many computer vision tasks such as image classification, object detection, saliency prediction and depth estimation, Convolutional Neural Networks (CNNs) still perform unsatisfactorily on some difficult tasks such as human parsing, which is the focus of our research. Both the inappropriate capacity of a CNN model and insufficient training data contribute to the failure to perceive the semantic information of detailed regions. Due to overfitting, the feature representations learned by a high-capacity model cannot generalize to the variations in viewpoint, human pose and occlusion found in real-world scenarios. Conversely, under-fitting prevents a low-capacity model from developing sufficiently expressive representations. In this chapter, we propose an approach to estimate the complexity of a task and to match the capacity of a CNN model to that complexity while avoiding both under-fitting and overfitting. Firstly, a novel training scheme is proposed to fully explore the potential of low-capacity CNN models. The scheme outperforms existing end-to-end training schemes and enables low-capacity models to outperform models with higher capacity. Secondly, three methods are proposed to optimize the capacity of a CNN model for a task. The first method is based on improving the orthogonality among convolutional kernels, which contributes to higher computational efficiency and better performance. In the second method, the convolutional kernels within each layer are evaluated according to their semantic functions and their contributions to the training and test accuracy; kernels which only contribute to the training accuracy but have no effect on the test accuracy are removed to avoid overfitting. In the third method, the capacity of a CNN model is optimized by adjusting the dependency among convolutional kernels, and a novel structure of convolutional layers is proposed to reduce the number of parameters while maintaining similar performance. Besides capacity optimization, we further propose a method to evaluate the complexity of a human parsing task. An independent CNN model is trained for this purpose using the labels for pose estimation, and the complexity is evaluated based on the pose information estimated from the images. The proposed scheme for complexity evaluation was conducted on the Pascal Person-Part dataset and the Look into Person dataset, both of which are human parsing benchmarks. The schemes for capacity optimization were applied to our human parsing models trained on the same two datasets. Both quantitative and qualitative results demonstrate that the proposed algorithms can match the capacity of a CNN model well to the complexity of a task.
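To make the first capacity-optimization idea (encouraging orthogonality among convolutional kernels) more concrete, the sketch below shows one common way such a constraint can be imposed in practice: a soft penalty on the Gram matrix of the flattened kernels of each convolutional layer, added to the task loss. This is a minimal PyTorch sketch under our own assumptions; the function name `orthogonality_penalty`, the weight `lambda_orth`, and the toy two-layer model are illustrative and are not taken from the chapter, whose exact formulation may differ.

```python
import torch
import torch.nn as nn

def orthogonality_penalty(conv: nn.Conv2d) -> torch.Tensor:
    """Soft orthogonality penalty ||W W^T - I||_F^2 on the flattened kernels
    of one convolutional layer (a common way to encourage kernel orthogonality)."""
    w = conv.weight.reshape(conv.out_channels, -1)   # (out_channels, in_channels * kH * kW)
    gram = w @ w.t()                                 # pairwise correlations between kernels
    identity = torch.eye(conv.out_channels, device=w.device)
    return ((gram - identity) ** 2).sum()

# Toy usage: add the penalty to the segmentation loss for every conv layer.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 21, 3, padding=1),                 # e.g. 21 part/background classes
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
lambda_orth = 1e-4                                   # illustrative weighting, needs tuning

images = torch.randn(2, 3, 64, 64)                  # dummy batch
labels = torch.randint(0, 21, (2, 64, 64))          # dummy per-pixel labels

optimizer.zero_grad()
logits = model(images)
loss = criterion(logits, labels)
loss = loss + lambda_orth * sum(
    orthogonality_penalty(m) for m in model.modules() if isinstance(m, nn.Conv2d)
)
loss.backward()
optimizer.step()
```

In practice the penalty weight and the choice of which layers to regularize would be tuned per task; the chapter's further steps, pruning kernels that only improve training accuracy and restructuring layers to reduce inter-kernel dependency, are not shown here.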



Author information


Corresponding author

Correspondence to Yalong Jiang.



Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Jiang, Y., Chi, Z. (2020). Constructing a Convolutional Neural Network with a Suitable Capacity for a Semantic Segmentation Task. In: Pedrycz, W., Chen, SM. (eds) Deep Learning: Concepts and Architectures. Studies in Computational Intelligence, vol 866. Springer, Cham. https://doi.org/10.1007/978-3-030-31756-0_8
