Pixel Convolutional Networks for Skeleton-Based Human Action Recognition

  • Zhichao Chang
  • Jiangyun Wang
  • Liang Han
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 946)


Human action recognition is an important field in computer vision. Skeleton-based models of the human body have attracted growing attention in related research because of their strong robustness to external interference. In traditional approaches the features are usually hand-crafted, which makes it difficult to extract effective features from skeletons. In this paper a method called Pixel Convolutional Networks is proposed for human action recognition; it extracts skeleton features in a natural and intuitive way along two dimensions, space and time. It achieves good performance compared with mainstream methods of the past few years on the large-scale NTU RGB+D dataset.
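The abstract describes encoding a skeleton sequence along two dimensions, space and time, so that ordinary 2D convolutions can be applied. The paper's exact encoding is not given in this excerpt; the sketch below shows one plausible "skeleton pixel picture" construction, assuming an input of T frames with J joints and 3D coordinates, where rows index joints (space), columns index frames (time), and the (x, y, z) coordinates fill the colour channels. The function name and normalisation scheme are illustrative assumptions, not the authors' method.

```python
import numpy as np

def skeleton_to_pixel_picture(sequence):
    """Map a skeleton sequence of shape (T, J, 3) -- T frames, J joints,
    (x, y, z) coordinates -- onto an image of shape (J, T, 3).

    Rows index joints (space), columns index frames (time); each of the
    three coordinate channels is normalised to the 8-bit range [0, 255].
    This is a hypothetical sketch, not the encoding from the paper.
    """
    seq = np.asarray(sequence, dtype=np.float64)
    img = seq.transpose(1, 0, 2)            # (J, T, 3): space x time x channels
    lo = img.min(axis=(0, 1), keepdims=True)
    hi = img.max(axis=(0, 1), keepdims=True)
    scale = np.where(hi > lo, hi - lo, 1.0)  # guard against constant channels
    return np.round(255.0 * (img - lo) / scale).astype(np.uint8)

# Illustrative input: 4 frames of a 25-joint skeleton
# (NTU RGB+D skeletons have 25 joints per body).
rng = np.random.default_rng(0)
picture = skeleton_to_pixel_picture(rng.standard_normal((4, 25, 3)))
print(picture.shape)  # (25, 4, 3)
```

The resulting array can then be fed to any standard image CNN, which is what makes this style of encoding attractive: spatial kernels span neighbouring joints while the horizontal axis spans neighbouring frames.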


Human action recognition · Skeleton-based models · Skeleton pixel pictures · Pixel convolutional networks



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. School of Automation Science and Electrical Engineering, Beihang University, Beijing, China