Abstract
Human action recognition based on 3D skeleton data is a rapidly growing research area in computer vision, owing to the data's robustness to variations in viewpoint, human body scale, and motion speed. Recent studies suggest that recurrent neural networks (RNNs) and convolutional neural networks (CNNs) are very effective at learning discriminative features of temporal sequences for classification. However, prior RNN-based methods rely on complicated multi-layer hierarchical architectures, while CNN-based methods learn contextual features at fixed temporal scales. In this paper, we propose a simple framework that selects temporal scales automatically with a single-layer LSTM for skeleton-based action recognition. Experimental results on three benchmark datasets show that our approach achieves state-of-the-art performance compared to recent models.
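To make the setting concrete, the following is a minimal sketch of a single-layer LSTM classifier over skeleton sequences, where each frame is a flattened vector of 3D joint coordinates. This is a generic illustration, not the authors' implementation: the joint count, hidden size, class count, and weight initialisation are all illustrative assumptions, and the paper's automatic temporal-scale selection is not modelled here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SkeletonLSTM:
    """Single-layer LSTM classifier for skeleton sequences (illustrative only).

    A sequence has shape (frames, num_joints * 3): per-frame flattened
    3D joint coordinates. Sizes and initialisation are placeholders.
    """
    def __init__(self, num_joints=25, hidden=64, num_classes=60, seed=0):
        rng = np.random.default_rng(seed)
        d = num_joints * 3
        # One stacked weight matrix for the four gates
        # (input, forget, candidate, output), acting on [x_t; h_{t-1}].
        self.W = rng.standard_normal((4 * hidden, d + hidden)) * 0.01
        self.b = np.zeros(4 * hidden)
        self.W_out = rng.standard_normal((num_classes, hidden)) * 0.01
        self.hidden = hidden

    def forward(self, seq):
        H = self.hidden
        h = np.zeros(H)   # hidden state
        c = np.zeros(H)   # cell state
        for x_t in seq:   # iterate over frames
            z = self.W @ np.concatenate([x_t, h]) + self.b
            i = sigmoid(z[:H])          # input gate
            f = sigmoid(z[H:2 * H])     # forget gate
            g = np.tanh(z[2 * H:3 * H])  # candidate cell state
            o = sigmoid(z[3 * H:])      # output gate
            c = f * c + i * g
            h = o * np.tanh(c)
        # Class scores from the last hidden state.
        return self.W_out @ h

# A dummy 30-frame sequence of a 25-joint skeleton.
seq = np.random.default_rng(1).standard_normal((30, 25 * 3))
scores = SkeletonLSTM().forward(seq)
print(scores.shape)  # (60,)
```

In practice the classification layer would be trained with a softmax cross-entropy loss over the action classes; the sketch only shows the forward pass that maps a variable-length skeleton sequence to a fixed-size score vector.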
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Hu, L., Xu, J. (2017). Learning Discriminative Representation for Skeletal Action Recognition Using LSTM Networks. In: Felsberg, M., Heyden, A., Krüger, N. (eds) Computer Analysis of Images and Patterns. CAIP 2017. Lecture Notes in Computer Science(), vol 10425. Springer, Cham. https://doi.org/10.1007/978-3-319-64698-5_9
Print ISBN: 978-3-319-64697-8
Online ISBN: 978-3-319-64698-5