A Novel 3D Human Action Recognition Framework for Video Content Analysis

  • Conference paper
  • First Online:
MultiMedia Modeling (MMM 2018)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 10704)

Abstract

Understanding the meaning of human actions from videos with embedded 3D skeleton data is a new challenge in content-oriented video analysis. In this paper, we propose to incorporate temporal patterns of joint positions into the currently popular Long Short-Term Memory (LSTM) based learning to improve both accuracy and robustness. Since 3D actions are composed of sub-actions, we first propose the Wavelet Temporal Pattern (WTP), which extracts a representation of the temporal pattern of each sub-action via the wavelet transform. We then define a novel Relation-aware LSTM (R-LSTM) structure that extracts features by modeling the long-term spatio-temporal correlations between body parts. Regarding WTP and R-LSTM features as heterogeneous representations of a human action, we fuse them with an auto-encoder network to obtain a more effective action descriptor for classification. Experimental results on the large-scale, challenging NTU RGB+D dataset, as well as on the UT-Kinect and Florence 3D Actions datasets, demonstrate the effectiveness of the proposed method.
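To make the WTP idea concrete, below is a minimal, illustrative sketch of wavelet-based temporal features for a single joint trajectory. It assumes the NumPy and PyWavelets packages; the wavelet basis ("db2"), the decomposition level, and the coefficient statistics are illustrative assumptions, not the paper's actual WTP construction, which also involves segmenting actions into sub-actions.

    # Hedged sketch: wavelet temporal features for one (T, 3) joint trajectory.
    # Basis, level, and statistics are illustrative choices, not the WTP
    # definition from the paper.
    import numpy as np
    import pywt

    def wavelet_temporal_features(joint_xyz, wavelet="db2", level=3):
        """Decompose each coordinate channel of a (T, 3) trajectory and
        concatenate simple statistics of the coefficients at every scale."""
        feats = []
        for channel in joint_xyz.T:          # x, y, z sequences over time
            coeffs = pywt.wavedec(channel, wavelet, level=level)
            for c in coeffs:                 # 1 approximation + `level` detail bands
                feats.extend([c.mean(), c.std(), np.abs(c).max()])
        return np.asarray(feats, dtype=np.float32)

    # Example: a synthetic 60-frame trajectory for one skeleton joint.
    rng = np.random.default_rng(0)
    traj = rng.standard_normal((60, 3)).cumsum(axis=0)  # smooth-ish motion
    print(wavelet_temporal_features(traj).shape)  # 3 channels * 4 bands * 3 stats = (36,)

In the full framework, such per-joint temporal descriptors would sit alongside the R-LSTM features, with the auto-encoder fusing the two heterogeneous representations into the final action descriptor.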

References

  1. Chaaraoui, A.A., Padilla-López, J.R., Flórez-Revuelta, F.: Fusion of skeletal and silhouette-based features for human action recognition with RGB-D devices. In: Proceedings of the ICCVW, pp. 91–97 (2013)

  2. Anirudh, R., Turaga, P.K., Su, J., Srivastava, A.: Elastic functional coding of human actions: from vector-fields to latent variables. In: Proceedings of the CVPR, pp. 3147–3155 (2015). https://doi.org/10.1109/CVPR.2015.7298934

  3. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Proceedings of the CVPR, pp. 886–893 (2005). https://doi.org/10.1109/CVPR.2005.177

  4. Du, Y., Wang, W., Wang, L.: Hierarchical recurrent neural network for skeleton based action recognition. In: Proceedings of the CVPR, pp. 1110–1118 (2015). https://doi.org/10.1109/CVPR.2015.7298714

  5. Keogh, E., Ratanamahatana, C.A.: Exact indexing of dynamic time warping. Knowl. Inf. Syst. 7(3), 358–386 (2005)

  6. Ho, E.S.L., Chan, J.C.P., Chan, D.C.K., Shum, H.P.H., Cheung, Y., Yuen, P.C.: Improving posture classification accuracy for depth sensor-based human activity monitoring in smart environments. CVIU 148, 97–110 (2016). https://doi.org/10.1016/j.cviu.2015.12.011

  7. Hu, J., Zheng, W., Lai, J., Zhang, J.: Jointly learning heterogeneous features for RGB-D activity recognition. In: Proceedings of the CVPR, pp. 5344–5352 (2015). https://doi.org/10.1109/CVPR.2015.7299172

  8. Liu, J., Shahroudy, A., Xu, D., Wang, G.: Spatio-temporal LSTM with trust gates for 3D human action recognition. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 816–833. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_50

  9. Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., Ng, A.Y.: Multimodal deep learning. In: Proceedings of the ICML, pp. 689–696 (2011)

  10. Scovanner, P., Ali, S., Shah, M.: A 3-dimensional SIFT descriptor and its application to action recognition. In: Proceedings of the MM, pp. 357–360 (2007). http://doi.acm.org/10.1145/1291233.1291311

  11. Seidenari, L., Varano, V., Berretti, S., Bimbo, A.D., Pala, P.: Recognizing actions from depth cameras as weakly aligned multi-part bag-of-poses. In: Proceedings of the CVPRW, pp. 479–485 (2013). https://doi.org/10.1109/CVPRW.2013.77

  12. Shahroudy, A., Liu, J., Ng, T., Wang, G.: NTU RGB+D: a large scale dataset for 3D human activity analysis. In: Proceedings of the CVPR, pp. 1010–1019 (2016). http://doi.ieeecomputersociety.org/10.1109/CVPR.2016.115

  13. Veeriah, V., Zhuang, N., Qi, G.: Differential recurrent neural networks for action recognition. In: Proceedings of the ICCV, pp. 4041–4049 (2015). https://doi.org/10.1109/ICCV.2015.460

  14. Vemulapalli, R., Arrate, F., Chellappa, R.: Human action recognition by representing 3D skeletons as points in a Lie group. In: Proceedings of the CVPR, pp. 588–595 (2014). https://doi.org/10.1109/CVPR.2014.82

  15. Wang, H., Wang, L.: Modeling temporal dynamics and spatial configurations of actions using two-stream recurrent neural networks. CoRR abs/1704.02581 (2017). http://arxiv.org/abs/1704.02581

  16. Wang, J., Liu, Z., Wu, Y., Yuan, J.: Learning actionlet ensemble for 3D human action recognition. IEEE Trans. PAMI 36(5), 914–927 (2014). https://doi.org/10.1109/TPAMI.2013.198

  17. Wang, P., Yuan, C., Hu, W., Li, B., Zhang, Y.: Graph based skeleton motion representation and similarity measurement for action recognition. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9911, pp. 370–385. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46478-7_23

  18. Wu, D., Pigou, L., Kindermans, P., Le, N.D., Shao, L., Dambre, J., Odobez, J.: Deep dynamic neural networks for multimodal gesture segmentation and recognition. IEEE Trans. PAMI 38(8), 1583–1597 (2016). https://doi.org/10.1109/TPAMI.2016.2537340

  19. Xia, L., Chen, C., Aggarwal, J.K.: View invariant human action recognition using histograms of 3D joints. In: Proceedings of the CVPRW, pp. 20–27 (2012). https://doi.org/10.1109/CVPRW.2012.6239233

  20. Zhu, Y., Chen, W., Guo, G.: Fusing spatiotemporal features and joints for 3D action recognition. In: Proceedings of the CVPRW, pp. 486–491 (2013)

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant 61672273, Grant 61272218, and Grant 61321491, the Science Foundation for Distinguished Young Scholars of Jiangsu under Grant BK20160021, the Science Foundation of Jiangsu under Grant BK20170892, the Fundamental Research Funds for the Central Universities under Grant 2013/B16020141, and the Open Project of the National Key Lab for Novel Software Technology in NJU under Grant KFKT2017B05.

Author information

Corresponding author

Correspondence to Tong Lu.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Wei, L., Wu, Y., Wang, W., Lu, T. (2018). A Novel 3D Human Action Recognition Framework for Video Content Analysis. In: Schoeffmann, K., et al. (eds.) MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol. 10704. Springer, Cham. https://doi.org/10.1007/978-3-319-73603-7_4

  • DOI: https://doi.org/10.1007/978-3-319-73603-7_4

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73602-0

  • Online ISBN: 978-3-319-73603-7

  • eBook Packages: Computer Science, Computer Science (R0)
