A Novel 3D Human Action Recognition Framework for Video Content Analysis

  • Conference paper
  • First Online:
MultiMedia Modeling (MMM 2018)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 10704)

Abstract

Understanding the meaning of human actions from videos with embedded 3D skeleton data is a new challenge in content-oriented video analysis. In this paper, we propose to incorporate temporal patterns of joint positions into the currently popular Long Short-Term Memory (LSTM) based learning to improve both accuracy and robustness. Since 3D actions are composed of sub-actions, we first propose the Wavelet Temporal Pattern (WTP), which extracts a representation of the temporal pattern of each sub-action via the wavelet transform. We then define a novel Relation-aware LSTM (R-LSTM) structure that extracts features by modeling the long-term spatio-temporal correlations between body parts. Regarding WTP and R-LSTM features as heterogeneous representations of a human action, we fuse them with an auto-encoder network to obtain a more effective action descriptor for classification. Experimental results on the large-scale, challenging NTU RGB+D dataset, as well as on the UT-Kinect and Florence 3D Actions datasets, demonstrate the effectiveness of the proposed method.
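To make the WTP idea concrete, below is a minimal, illustrative sketch of wavelet-based temporal features for a single joint trajectory. It assumes the NumPy and PyWavelets packages; the wavelet basis ("db2"), the decomposition level, and the coefficient statistics are illustrative assumptions, not the paper's actual WTP construction, which also involves segmenting actions into sub-actions.

    # Hedged sketch: wavelet temporal features for one (T, 3) joint trajectory.
    # Basis, level, and statistics are illustrative choices, not the WTP
    # definition from the paper.
    import numpy as np
    import pywt

    def wavelet_temporal_features(joint_xyz, wavelet="db2", level=3):
        """Decompose each coordinate channel of a (T, 3) trajectory and
        concatenate simple statistics of the coefficients at every scale."""
        feats = []
        for channel in joint_xyz.T:          # x, y, z sequences over time
            coeffs = pywt.wavedec(channel, wavelet, level=level)
            for c in coeffs:                 # 1 approximation + `level` detail bands
                feats.extend([c.mean(), c.std(), np.abs(c).max()])
        return np.asarray(feats, dtype=np.float32)

    # Example: a synthetic 60-frame trajectory for one skeleton joint.
    rng = np.random.default_rng(0)
    traj = rng.standard_normal((60, 3)).cumsum(axis=0)  # smooth-ish motion
    print(wavelet_temporal_features(traj).shape)  # 3 channels * 4 bands * 3 stats = (36,)

In the full framework, such per-joint temporal descriptors would sit alongside the R-LSTM features, with the auto-encoder fusing the two heterogeneous representations into the final action descriptor.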

References

  1. Chaaraoui, A.A., Padilla-López, J.R., Flórez-Revuelta, F.: Fusion of skeletal and silhouette-based features for human action recognition with RGB-D devices. In: Proceedings of the ICCVW, pp. 91–97 (2013)

  2. Anirudh, R., Turaga, P.K., Su, J., Srivastava, A.: Elastic functional coding of human actions: from vector-fields to latent variables. In: Proceedings of the CVPR, pp. 3147–3155 (2015). https://doi.org/10.1109/CVPR.2015.7298934

  3. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Proceedings of the CVPR, pp. 886–893 (2005). https://doi.org/10.1109/CVPR.2005.177

  4. Du, Y., Wang, W., Wang, L.: Hierarchical recurrent neural network for skeleton based action recognition. In: Proceedings of the CVPR, pp. 1110–1118 (2015). https://doi.org/10.1109/CVPR.2015.7298714

  5. Keogh, E., Ratanamahatana, C.A.: Exact indexing of dynamic time warping. Knowl. Inf. Syst. 7(3), 358–386 (2005)

  6. Ho, E.S.L., Chan, J.C.P., Chan, D.C.K., Shum, H.P.H., Cheung, Y., Yuen, P.C.: Improving posture classification accuracy for depth sensor-based human activity monitoring in smart environments. CVIU 148, 97–110 (2016). https://doi.org/10.1016/j.cviu.2015.12.011

  7. Hu, J., Zheng, W., Lai, J., Zhang, J.: Jointly learning heterogeneous features for RGB-D activity recognition. In: Proceedings of the CVPR, pp. 5344–5352 (2015). https://doi.org/10.1109/CVPR.2015.7299172

  8. Liu, J., Shahroudy, A., Xu, D., Wang, G.: Spatio-temporal LSTM with trust gates for 3D human action recognition. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 816–833. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_50

  9. Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., Ng, A.Y.: Multimodal deep learning. In: Proceedings of the ICML, pp. 689–696 (2011)

  10. Scovanner, P., Ali, S., Shah, M.: A 3-dimensional SIFT descriptor and its application to action recognition. In: Proceedings of the MM, pp. 357–360 (2007). http://doi.acm.org/10.1145/1291233.1291311

  11. Seidenari, L., Varano, V., Berretti, S., Bimbo, A.D., Pala, P.: Recognizing actions from depth cameras as weakly aligned multi-part bag-of-poses. In: Proceedings of the CVPRW, pp. 479–485 (2013). https://doi.org/10.1109/CVPRW.2013.77

  12. Shahroudy, A., Liu, J., Ng, T., Wang, G.: NTU RGB+D: a large scale dataset for 3D human activity analysis. In: Proceedings of the CVPR, pp. 1010–1019 (2016). http://doi.ieeecomputersociety.org/10.1109/CVPR.2016.115

  13. Veeriah, V., Zhuang, N., Qi, G.: Differential recurrent neural networks for action recognition. In: Proceedings of the ICCV, pp. 4041–4049 (2015). https://doi.org/10.1109/ICCV.2015.460

  14. Vemulapalli, R., Arrate, F., Chellappa, R.: Human action recognition by representing 3D skeletons as points in a Lie group. In: Proceedings of the CVPR, pp. 588–595 (2014). https://doi.org/10.1109/CVPR.2014.82

  15. Wang, H., Wang, L.: Modeling temporal dynamics and spatial configurations of actions using two-stream recurrent neural networks. CoRR abs/1704.02581 (2017). http://arxiv.org/abs/1704.02581

  16. Wang, J., Liu, Z., Wu, Y., Yuan, J.: Learning actionlet ensemble for 3D human action recognition. IEEE Trans. PAMI 36(5), 914–927 (2014). https://doi.org/10.1109/TPAMI.2013.198

  17. Wang, P., Yuan, C., Hu, W., Li, B., Zhang, Y.: Graph based skeleton motion representation and similarity measurement for action recognition. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9911, pp. 370–385. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46478-7_23

  18. Wu, D., Pigou, L., Kindermans, P., Le, N.D., Shao, L., Dambre, J., Odobez, J.: Deep dynamic neural networks for multimodal gesture segmentation and recognition. IEEE Trans. PAMI 38(8), 1583–1597 (2016). https://doi.org/10.1109/TPAMI.2016.2537340

  19. Xia, L., Chen, C., Aggarwal, J.K.: View invariant human action recognition using histograms of 3D joints. In: Proceedings of the CVPRW, pp. 20–27 (2012). https://doi.org/10.1109/CVPRW.2012.6239233

  20. Zhu, Y., Chen, W., Guo, G.: Fusing spatiotemporal features and joints for 3D action recognition. In: Proceedings of the CVPRW, pp. 486–491 (2013)

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant 61672273, Grant 61272218, and Grant 61321491, the Science Foundation for Distinguished Young Scholars of Jiangsu under Grant BK20160021, the Science Foundation of Jiangsu under Grant BK20170892, the Fundamental Research Funds for the Central Universities under Grant 2013/B16020141, and the Open Project of the National Key Lab for Novel Software Technology in NJU under Grant KFKT2017B05.

Author information

Corresponding author

Correspondence to Tong Lu.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Wei, L., Wu, Y., Wang, W., Lu, T. (2018). A Novel 3D Human Action Recognition Framework for Video Content Analysis. In: Schoeffmann, K., et al. (eds.) MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol. 10704. Springer, Cham. https://doi.org/10.1007/978-3-319-73603-7_4

  • DOI: https://doi.org/10.1007/978-3-319-73603-7_4

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73602-0

  • Online ISBN: 978-3-319-73603-7

  • eBook Packages: Computer Science, Computer Science (R0)
