Video-based human action and hand gesture recognition by fusing factored matrices of dual tensors

Hsieh, Chung-Yang; Lin, Wei-Yang

doi:10.1007/s11042-016-3407-1

Published: 11 March 2016

Volume 76, pages 7575–7594, (2017)

Multimedia Tools and Applications

Chung-Yang Hsieh¹ &
Wei-Yang Lin¹

Abstract

In this paper, we present a novel approach to human action and gesture recognition using dual complementary tensors. In particular, the proposed method constructs a compact yet discriminative representation by normalizing the input video volume into dual tensors. One tensor is obtained from the raw video volume data, and the other is obtained from histogram of oriented gradients (HOG) features. Each tensor is converted to factored matrices, and the similarity between factored matrices is evaluated using canonical correlation analysis (CCA). We further propose an information fusion method to combine the resulting similarity scores. The proposed fusion strategy can effectively enhance discriminability between different action categories and lead to better recognition accuracy. We have conducted several experiments on two publicly available databases (UCF Sports and Cambridge-Gesture). The results show that the proposed method achieves recognition accuracy comparable to that of state-of-the-art methods.
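The pipeline outlined in the abstract (tensor, factored matrices, CCA similarity, score fusion) can be illustrated with a minimal NumPy sketch. The abstract does not specify the factorization, the subspace dimension, or the fusion rule, so everything below is an assumption made for illustration: factored matrices are taken to be the leading left singular vectors of each mode unfolding, canonical correlations are computed as cosines of principal angles between those subspaces, and fusion is a plain weighted sum. The function names, the `rank` and `weight` parameters, and the random placeholder tensors are all hypothetical and are not taken from the paper.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-k unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def factor_matrix(tensor, mode, rank):
    """Assumed factorization: leading left singular vectors of the unfolding.

    Returns an orthonormal basis of shape (tensor.shape[mode], rank).
    """
    u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
    return u[:, :rank]


def canonical_correlations(a, b):
    """Cosines of the principal angles between the column spaces of a and b.

    Both inputs must have orthonormal columns.
    """
    s = np.linalg.svd(a.T @ b, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against round-off slightly above 1


def tensor_similarity(x, y, rank=3):
    """Mean canonical correlation, averaged over all tensor modes."""
    per_mode = [
        canonical_correlations(
            factor_matrix(x, mode, rank), factor_matrix(y, mode, rank)
        ).mean()
        for mode in range(x.ndim)
    ]
    return float(np.mean(per_mode))


def fuse(score_raw, score_hog, weight=0.5):
    """Hypothetical fusion rule: weighted sum of the two similarity scores."""
    return weight * score_raw + (1.0 - weight) * score_hog


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder tensors (height x width x frames). A real system would use
    # a size-normalized video volume and a HOG feature volume instead.
    raw_a, raw_b = rng.random((20, 20, 16)), rng.random((20, 20, 16))
    hog_a, hog_b = rng.random((20, 20, 16)), rng.random((20, 20, 16))
    fused = fuse(tensor_similarity(raw_a, raw_b), tensor_similarity(hog_a, hog_b))
    print(f"fused similarity: {fused:.3f}")
```

In this sketch a query video would be assigned the label of the training video with the highest fused score. A nearest-neighbor rule of this kind is likewise an assumption and not a description of the authors' classifier.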

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan, Republic of China
Chung-Yang Hsieh & Wei-Yang Lin

Corresponding author

Correspondence to Wei-Yang Lin.

About this article

Cite this article

Hsieh, CY., Lin, WY. Video-based human action and hand gesture recognition by fusing factored matrices of dual tensors. Multimed Tools Appl 76, 7575–7594 (2017). https://doi.org/10.1007/s11042-016-3407-1

Received: 01 June 2015
Revised: 10 January 2016
Accepted: 26 February 2016
Published: 11 March 2016
Issue Date: March 2017
DOI: https://doi.org/10.1007/s11042-016-3407-1
