Abstract
Although Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in many computer vision tasks such as image classification, object detection, saliency prediction and depth estimation, they still perform unsatisfactorily on some difficult tasks such as human parsing, which is the focus of our research. Both an inappropriate model capacity and insufficient training data contribute to the failure to perceive the semantic information of detailed regions. Due to overfitting, the feature representations learned by a high-capacity model cannot generalize to the variations in viewpoints, human poses and occlusions found in real-world scenarios. On the other hand, under-fitting prevents a low-capacity model from developing sufficiently expressive representations. In this chapter, we propose an approach to estimate the complexity of a task and to match the capacity of a CNN model to that complexity while avoiding both under-fitting and overfitting. Firstly, a novel training scheme is proposed to fully explore the potential of low-capacity CNN models. The scheme outperforms existing end-to-end training schemes and enables low-capacity models to outperform models with higher capacity. Secondly, three methods are proposed to optimize the capacity of a CNN model on a task. The first method improves the orthogonality among kernels, which contributes to higher computational efficiency and better performance. In the second method, the convolutional kernels within each layer are evaluated according to their semantic functions and their contributions to the training and test accuracy; kernels which contribute only to the training accuracy but have no effect on the test accuracy are removed to avoid overfitting. In the third method, the capacity of a CNN model is optimized by adjusting the dependency among convolutional kernels.
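As a minimal sketch of the kind of kernel-orthogonality measure the first method refers to (the exact formulation used in the chapter is not reproduced here; this illustrative version flattens each kernel, normalizes it, and penalizes the deviation of the Gram matrix from the identity):

```python
import numpy as np

def orthogonality_penalty(kernels):
    """Frobenius-norm penalty on the deviation of the kernels' Gram
    matrix from the identity; zero iff the flattened kernels are
    mutually orthogonal. `kernels` has shape (out_ch, in_ch, kh, kw)."""
    out_channels = kernels.shape[0]
    W = kernels.reshape(out_channels, -1)
    W = W / np.linalg.norm(W, axis=1, keepdims=True)  # unit-norm rows
    gram = W @ W.T
    return float(np.sum((gram - np.eye(out_channels)) ** 2))

# A set of mutually orthogonal kernels yields (numerically) zero penalty.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((27, 8)))  # 8 orthonormal columns
orthogonal_kernels = q.T.reshape(8, 3, 3, 3)       # 8 kernels of size 3x3x3
print(orthogonality_penalty(orthogonal_kernels) < 1e-9)  # → True
```

Such a penalty can be added to the training loss so that gradient descent pushes the kernels of a layer toward an orthogonal set, reducing redundancy among them.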
A novel structure of convolutional layers is proposed to reduce the number of parameters while maintaining similar performance. Besides capacity optimization, we further propose a method to evaluate the complexity of a human parsing task. An independent CNN model is trained for this purpose using the labels for pose estimation, and the complexity is evaluated from the pose information estimated in images. The proposed scheme for complexity evaluation was conducted on the Pascal Person Part dataset and the Look into Person dataset, both of which are for human parsing. The schemes for capacity optimization were applied to our human parsing models trained on the two datasets. Both quantitative and qualitative results demonstrate that our proposed algorithms can match the capacity of a CNN model well to the complexity of a task.
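To make the pose-based complexity evaluation concrete, the following is a hypothetical illustration only: the chapter trains a separate CNN on pose-estimation labels, and the function name and scoring rule below are assumptions, not the authors' method. Once per-image keypoints have been estimated, a simple complexity proxy can combine the number of people in an image with how many of each person's body joints are visible:

```python
def parsing_complexity(people_keypoints, total_joints=14):
    """Toy complexity proxy for a human parsing image.
    `people_keypoints` is a list of per-person keypoint lists,
    each keypoint an (x, y, visible) triple. More people and more
    visible joints are taken to indicate a harder parsing task."""
    if not people_keypoints:
        return 0.0
    visibility = [
        sum(1 for _, _, v in kp if v) / total_joints
        for kp in people_keypoints
    ]
    # Scale mean joint visibility by the person count.
    return len(people_keypoints) * sum(visibility) / len(visibility)

single = [[(0, 0, 1)] * 14]                       # one fully visible person
crowd = [[(0, 0, 1)] * 7 + [(0, 0, 0)] * 7] * 3   # three half-occluded people
print(parsing_complexity(single))  # → 1.0
print(parsing_complexity(crowd))   # → 1.5
```

A score of this kind could then be thresholded or binned to select a model capacity appropriate for the images at hand.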
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this chapter
Jiang, Y., Chi, Z. (2020). Constructing a Convolutional Neural Network with a Suitable Capacity for a Semantic Segmentation Task. In: Pedrycz, W., Chen, SM. (eds) Deep Learning: Concepts and Architectures. Studies in Computational Intelligence, vol 866. Springer, Cham. https://doi.org/10.1007/978-3-030-31756-0_8
Print ISBN: 978-3-030-31755-3
Online ISBN: 978-3-030-31756-0
eBook Packages: Intelligent Technologies and Robotics (R0)