
Deep Stacked Bidirectional LSTM Neural Network for Skeleton-Based Action Recognition

  • Kai Zou
  • Ming Yin
  • Weitian Huang
  • Yiqiu Zeng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11901)

Abstract

Skeleton-based action recognition has made great progress recently. However, many problems remain unsolved. For example, the representations of skeleton sequences learned by most existing methods lack spatial structure information and detailed temporal dynamics. To this end, we propose a novel Deep Stacked Bidirectional LSTM Network (DSB-LSTM) for human action recognition from skeleton data. Specifically, we first exploit human body geometry to extract the skeletal modulus ratio features (MR) and the skeletal vector angle features (VA) from the skeleton data. Then, the DSB-LSTM is applied to learn both spatial and temporal representations from the MR and VA features. This network leads not only to more powerful representations but also to stronger generalization capability. We perform several experiments on the MSR Action3D, Florence 3D, and UTKinect-Action datasets. The results show that our approach outperforms the compared methods on all datasets, demonstrating the effectiveness of the DSB-LSTM.
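The geometric features described above can be illustrated with a minimal sketch. The paper's exact joint pairings and any normalization are not given in the abstract, so the helper names and the toy three-joint skeleton below are illustrative assumptions; only the basic geometry (ratios of bone-vector lengths for MR, inter-vector angles for VA) follows from the text.

```python
import math

def bone_vector(joints, a, b):
    """Vector from joint a to joint b; joints is a list of (x, y, z) tuples."""
    return tuple(joints[b][i] - joints[a][i] for i in range(3))

def modulus(v):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(c * c for c in v))

def modulus_ratio(joints, pair1, pair2):
    """MR-style feature: ratio of the lengths of two bone vectors
    (invariant to uniform scaling of the skeleton)."""
    m2 = modulus(bone_vector(joints, *pair2))
    return modulus(bone_vector(joints, *pair1)) / m2 if m2 else 0.0

def vector_angle(joints, pair1, pair2):
    """VA-style feature: angle in radians between two bone vectors."""
    v1 = bone_vector(joints, *pair1)
    v2 = bone_vector(joints, *pair2)
    denom = modulus(v1) * modulus(v2)
    if denom == 0:
        return 0.0
    cos = max(-1.0, min(1.0, sum(a * b for a, b in zip(v1, v2)) / denom))
    return math.acos(cos)

# Toy "skeleton" frame: origin, a point on the x-axis, a point on the y-axis.
frame = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
mr = modulus_ratio(frame, (0, 1), (0, 2))  # 2.0 / 1.0 = 2.0
va = vector_angle(frame, (0, 1), (0, 2))   # orthogonal vectors -> pi/2
```

In practice these per-frame scalars would be computed over many joint pairs and stacked into a feature sequence for the recurrent network; the choice of pairs here is purely illustrative.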

Keywords

Deep learning · Skeleton-based action recognition · Bidirectional LSTM


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of Automation, Guangdong University of Technology, Guangzhou, China
