
Combining ConvNets with hand-crafted features for action recognition based on an HMM-SVM classifier

Published in Multimedia Tools and Applications

Abstract

This paper proposes a new framework for RGB-D-based action recognition that takes advantage of both hand-designed features from skeleton data and deeply learned features from depth maps, and effectively exploits both local and global temporal information. Specifically, depth and skeleton data are first augmented to support deep learning and to make recognition insensitive to viewpoint variation. Second, depth sequences are segmented using hand-crafted features based on skeleton joint motion histograms to exploit local temporal information. All training segments are clustered with an Infinite Gaussian Mixture Model (IGMM) through Bayesian estimation and labelled for training Convolutional Neural Networks (ConvNets) on the depth maps; a depth sequence can thus be reliably encoded into a sequence of segment labels. Finally, the label sequence is fed into a joint Hidden Markov Model and Support Vector Machine (HMM-SVM) classifier to exploit global temporal information for final recognition. The proposed framework was evaluated on the widely used MSRAction-Pairs, MSRDailyActivity3D and UTD-MHAD datasets and achieved promising results.
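The core encoding step of the pipeline, turning a segmented sequence into a string of cluster labels, can be sketched in a few lines. This is a minimal illustration only: the descriptor below is a toy histogram of 2-D joint displacement directions standing in for the paper's skeleton joint motion histogram, plain k-means stands in for the IGMM clustering of training segments, and all data are synthetic; none of these names or shapes come from the paper itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def motion_histogram(joints, bins=8):
    """Toy hand-crafted descriptor: histogram of per-frame joint
    displacement directions (a stand-in for the paper's skeleton
    joint motion histogram). `joints` has shape (frames, joints, 2)."""
    disp = np.diff(joints, axis=0).reshape(-1, 2)       # per-frame displacements
    angles = np.arctan2(disp[:, 1], disp[:, 0])         # displacement directions
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)                    # normalised histogram

def kmeans(X, k, iters=20):
    """Plain k-means, used here only as a simple stand-in for the
    Bayesian IGMM clustering of training segments."""
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return centers, labels

# Toy "segments": short synthetic 2-D skeleton clips (frames x joints x 2).
segments = [rng.normal(size=(10, 5, 2)).cumsum(0) for _ in range(30)]
feats = np.stack([motion_histogram(s) for s in segments])

# Cluster training segments; each cluster index plays the role of a segment label.
centers, seg_labels = kmeans(feats, k=4)

# A new sequence of segments is then encoded as a sequence of labels,
# which in the paper would be passed on to the HMM-SVM classifier.
new_feats = feats[:5]
label_seq = np.argmin(((new_feats[:, None] - centers) ** 2).sum(-1), axis=1)
print(label_seq.tolist())
```

In the full framework the label sequence would then feed the HMM-SVM stage, which models transitions between segment labels across the whole sequence.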



Acknowledgements

This work was funded by the National Natural Science Foundation of China (Grants 61571325 and 61502357) and the Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) (Grant CUG170654).

Author information


Corresponding author

Correspondence to Chang Tang.


About this article


Cite this article

Wang, S., Hou, Y., Li, Z. et al. Combining ConvNets with hand-crafted features for action recognition based on an HMM-SVM classifier. Multimed Tools Appl 77, 18983–18998 (2018). https://doi.org/10.1007/s11042-017-5335-0
