Abstract
Pose estimation is the task of predicting the pose of an object in an image or in a sequence of images. Here, we focus on articulated human pose estimation in scenes with a single person. We employ a series of residual auto-encoders to produce multiple predictions, which are then combined to provide a heatmap prediction of body joints. In this network topology, features are processed across all scales, which captures the various spatial relationships associated with the body. Repeated bottom-up and top-down processing is applied, with intermediate supervision for each auto-encoder network. We propose several improvements to this type of regression-based network to further increase performance, namely: (a) increase the number of parameters of the auto-encoder networks in the pipeline, (b) use stronger regularization along with heavy data augmentation, (c) use sub-pixel precision for more precise joint localization, and (d) combine the output heatmaps of all auto-encoders into a single prediction, which further increases body joint prediction accuracy. We demonstrate state-of-the-art results on the popular FLIC and LSP datasets.
Acknowledgments
This work was supported by the FCT project LARSyS (UID/EEA/50009/2013) and FCT PhD grant to author MF (SFRH/BD/79812/2011).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Farrajota, M., Rodrigues, J.M.F., du Buf, J.M.H. (2017). Human Pose Estimation by a Series of Residual Auto-Encoders. In: Alexandre, L., Salvador Sánchez, J., Rodrigues, J. (eds) Pattern Recognition and Image Analysis. IbPRIA 2017. Lecture Notes in Computer Science(), vol 10255. Springer, Cham. https://doi.org/10.1007/978-3-319-58838-4_15
DOI: https://doi.org/10.1007/978-3-319-58838-4_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-58837-7
Online ISBN: 978-3-319-58838-4
eBook Packages: Computer Science, Computer Science (R0)