Abstract
Deep convolutional neural networks (DCNNs) have recently been applied to human pose estimation (HPE). However, most conventional methods involve multiple models that are designed and optimized independently, which leads to sub-optimal performance. In addition, methods based on multiple DCNNs are computationally expensive and unsuitable for real-time applications. This paper proposes a novel end-to-end framework implemented with cascaded neural networks. The framework comprises three tasks: (1) detecting regions that contain parts of the human body, (2) predicting the coordinates of human body joints within those regions, and (3) selecting the optimum points as the final joint coordinates. These three tasks are jointly optimized. Experimental results show that the framework improves accuracy and runs 2.57 times faster than conventional methods.
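The abstract does not specify the loss functions or how the three tasks are combined, so the following is only a minimal sketch of one common way to jointly optimize several tasks: a weighted sum of per-task losses. Every name here (`detection_loss`, `regression_loss`, `select_joint`, `joint_loss`), the choice of cross-entropy and squared error, and the equal default weights are assumptions made for illustration, not the authors' formulation.

```python
import math

def detection_loss(probs, labels):
    """Binary cross-entropy over candidate regions (assumed stand-in for task 1)."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(probs)

def regression_loss(pred, target):
    """Mean squared error over (x, y) joint coordinates (assumed stand-in for task 2)."""
    total = 0.0
    for (px, py), (tx, ty) in zip(pred, target):
        total += (px - tx) ** 2 + (py - ty) ** 2
    return total / len(pred)

def select_joint(candidates, scores):
    """Return the candidate coordinate with the highest score (assumed stand-in for task 3)."""
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]

def joint_loss(losses, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of per-task losses; the weights are hypothetical hyperparameters."""
    return sum(w * l for w, l in zip(weights, losses))

if __name__ == "__main__":
    det = detection_loss([0.9, 0.2], [1, 0])
    reg = regression_loss([(10.0, 12.0)], [(11.0, 12.0)])
    chosen = select_joint([(10.0, 12.0), (30.0, 5.0)], [0.8, 0.1])
    sel = regression_loss([chosen], [(11.0, 12.0)])
    print(joint_loss((det, reg, sel)))
```

Under this assumption, minimizing the single summed value updates all three stages together, whereas independently optimized models would each minimize only their own term.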
Acknowledgments
The authors would like to thank Professor Hironobu Fujiyoshi at Chubu University for his forthright comments and valuable suggestions.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Tanabe, S., Yamanaka, R., Tomono, M., Ito, M., Ishihara, T. (2017). Real-Time Human Pose Estimation via Cascaded Neural Networks Embedded with Multi-task Learning. In: Felsberg, M., Heyden, A., Krüger, N. (eds) Computer Analysis of Images and Patterns. CAIP 2017. Lecture Notes in Computer Science, vol 10425. Springer, Cham. https://doi.org/10.1007/978-3-319-64698-5_21
DOI: https://doi.org/10.1007/978-3-319-64698-5_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64697-8
Online ISBN: 978-3-319-64698-5
eBook Packages: Computer Science, Computer Science (R0)