Video-based human action and hand gesture recognition by fusing factored matrices of dual tensors

Multimedia Tools and Applications

Abstract

In this paper, we present a novel approach to human action and hand gesture recognition based on two complementary tensors. Specifically, the proposed method builds a compact yet discriminative representation by normalizing the input video volume into dual tensors: one is obtained from the raw video volume data and the other from histogram of oriented gradients (HOG) features. Each tensor is converted into factored matrices, and the similarity between factored matrices is evaluated using canonical correlation analysis (CCA). We further propose an information fusion method to combine the resulting similarity scores. The proposed fusion strategy can effectively enhance the discriminability between action categories and leads to better recognition accuracy. Experiments on two publicly available databases (UCF Sports and Cambridge-Gesture) show that the proposed method achieves recognition accuracy comparable to that of state-of-the-art methods.
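The abstract does not specify how the factored matrices are formed, how many subspace dimensions are kept, or which fusion rule is used. The Python/NumPy sketch below illustrates one plausible reading under stated assumptions: the factored matrices are taken to be the mode-n unfoldings of a size-normalized tensor, the CCA similarity between two factored matrices is computed as the canonical correlations (cosines of the principal angles) between their leading column subspaces, and the raw-video and HOG similarity scores are combined with a min-max-normalized weighted sum rule as a stand-in for the paper's own fusion method. The function names (`unfold`, `canonical_correlations`, `tensor_similarity`, `fuse_scores`) and the parameters `dim` and `weight` are hypothetical, and the random tensors stand in for real video and HOG data.

```python
import numpy as np


def unfold(tensor: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n unfolding: rows are indexed by `mode`, columns by all other axes.

    The column order differs from some textbook conventions, but the column
    space (which is all the subspace comparison below uses) is the same.
    """
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def subspace_basis(matrix: np.ndarray, dim: int) -> np.ndarray:
    """Orthonormal basis of the leading `dim`-dimensional column space, via SVD."""
    u, _, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dim]


def canonical_correlations(a: np.ndarray, b: np.ndarray, dim: int) -> np.ndarray:
    """Canonical correlations between the column subspaces of `a` and `b`.

    For orthonormal bases Qa and Qb, these are the singular values of Qa^T Qb,
    i.e. the cosines of the principal angles between the two subspaces.
    """
    qa = subspace_basis(a, dim)
    qb = subspace_basis(b, dim)
    return np.linalg.svd(qa.T @ qb, compute_uv=False)


def tensor_similarity(x: np.ndarray, y: np.ndarray, dim: int = 3) -> float:
    """Hypothetical score: mean over modes of the summed canonical correlations.

    Assumes x and y were normalized to the same shape; the maximum is `dim`.
    """
    per_mode = [
        canonical_correlations(unfold(x, m), unfold(y, m), dim).sum()
        for m in range(x.ndim)
    ]
    return float(np.mean(per_mode))


def fuse_scores(raw_scores, hog_scores, weight: float = 0.5) -> np.ndarray:
    """Hypothetical sum-rule fusion after min-max normalization over the gallery."""
    def minmax(s):
        s = np.asarray(s, dtype=float)
        span = s.max() - s.min()
        return np.zeros_like(s) if span == 0 else (s - s.min()) / span

    return weight * minmax(raw_scores) + (1.0 - weight) * minmax(hog_scores)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins: a real system would use size-normalized video volumes
    # and tensors of per-frame HOG features, not random numbers.
    gallery_raw = [rng.standard_normal((8, 8, 6)) for _ in range(3)]
    gallery_hog = [rng.standard_normal((8, 8, 6)) for _ in range(3)]
    query_raw, query_hog = gallery_raw[1], gallery_hog[1]

    raw = [tensor_similarity(query_raw, g) for g in gallery_raw]
    hog = [tensor_similarity(query_hog, g) for g in gallery_hog]
    fused = fuse_scores(raw, hog)
    # Nearest-neighbour decision on the fused score; the query is a copy of
    # gallery item 1, so that item receives the highest fused score.
    print(int(np.argmax(fused)))
```

Min-max normalization before summing keeps one channel from dominating simply because its raw similarity values span a wider range; the weighted sum rule is only a common score-level baseline, and the fusion strategy proposed in the paper may differ.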

Author information

Correspondence to Wei-Yang Lin.

About this article

Cite this article

Hsieh, CY., Lin, WY. Video-based human action and hand gesture recognition by fusing factored matrices of dual tensors. Multimed Tools Appl 76, 7575–7594 (2017). https://doi.org/10.1007/s11042-016-3407-1

  • DOI: https://doi.org/10.1007/s11042-016-3407-1
