Multimedia Tools and Applications, Volume 78, Issue 14, pp 19587–19601

Action recognition from depth sequence using depth motion maps-based local ternary patterns and CNN

  • Zhifei Li (corresponding author)
  • Zhonglong Zheng
  • Feilong Lin
  • Howard Leung
  • Qing Li


This paper presents a method for human action recognition from depth sequences captured by a depth camera. The main idea is to map each action sequence to images and classify those images with a convolutional neural network (CNN). First, we project the raw frames onto three orthogonal Cartesian planes and stack the results into three still images (corresponding to the front, side, and top views) to form the Depth Motion Maps (DMMs). Second, the Local Ternary Pattern (LTP) is introduced as an image filter for the DMMs to improve the distinguishability of similar actions. Finally, we apply the CNN to action recognition by classifying the corresponding LTP-encoded images. Experimental results on the popular and challenging MSR-Action 3D and MSR-Gesture benchmark datasets show the effectiveness of the presented method and indicate that it meets the requirements of real-time action recognition.
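The first two stages described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give formulas, so the sketch assumes the common DMM formulation (accumulated absolute differences between consecutive projected frames, with binary occupancy maps for the side and top views) and the standard 8-neighbour LTP with an upper and a lower binary code. The function names, the `z_bins` parameter, and the threshold `t` are hypothetical.

```python
import numpy as np

def project_frame(depth, z_bins):
    """Project one depth frame onto three orthogonal planes.

    Assumes `depth` is an H x W array of integer depth bins in [0, z_bins),
    with 0 meaning background (an assumption of this sketch).
    """
    h, w = depth.shape
    front = depth.astype(np.float64)        # x-y plane: the depth map itself
    side = np.zeros((h, z_bins))            # y-z plane: binary occupancy
    top = np.zeros((z_bins, w))             # x-z plane: binary occupancy
    ys, xs = np.nonzero(depth)              # foreground pixels only
    zs = depth[ys, xs].astype(np.intp)
    side[ys, zs] = 1.0
    top[zs, xs] = 1.0
    return front, side, top

def depth_motion_maps(frames, z_bins):
    """Accumulate absolute frame-to-frame differences for each view."""
    maps = [project_frame(f, z_bins) for f in frames]
    dmms = []
    for v in range(3):                      # 0 = front, 1 = side, 2 = top
        acc = np.zeros_like(maps[0][v])
        for prev, cur in zip(maps[:-1], maps[1:]):
            acc += np.abs(cur[v] - prev[v])
        dmms.append(acc)
    return tuple(dmms)

def local_ternary_pattern(img, t):
    """Encode interior pixels with 8-neighbour LTP (upper and lower codes).

    A neighbour contributes to the upper code if it is >= centre + t and
    to the lower code if it is <= centre - t; otherwise it contributes to
    neither (the ternary "0" state).
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    upper = np.zeros(center.shape, dtype=np.uint8)
    lower = np.zeros(center.shape, dtype=np.uint8)
    # Neighbours visited clockwise starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        weight = np.uint8(1 << bit)
        upper |= (nb >= center + t).astype(np.uint8) * weight
        lower |= (nb <= center - t).astype(np.uint8) * weight
    return upper, lower
```

Under these assumptions, calling `depth_motion_maps(frames, z_bins)` on a list of quantised depth frames yields the front, side, and top maps, and `local_ternary_pattern(dmm, t)` turns each map into two 8-bit code images that could then be fed to an image classifier such as a CNN. How the paper combines the upper and lower codes, and which threshold it uses, is not stated in the abstract.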


Keywords: Human action recognition; Depth motion maps; Convolutional neural network; Local ternary pattern



The authors thank the anonymous reviewers for their valuable comments. This work is mainly supported by grants from the Zhejiang Provincial Top Key Discipline of Computer Software and Theory, the National Natural Science Foundation of China (Nos. 61170109, 61672467), and the National Science Foundation of Zhejiang Province, China (No. 2015C31095).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science and Technology, Zhejiang Normal University, Jinhua, China
  2. Department of Computer Science, CITYU, Hong Kong, Hong Kong
