DeepGRU: Deep Gesture Recognition Utility

Maghoumi, Mehran; LaViola, Joseph J.

doi:10.1007/978-3-030-33720-9_2

Mehran Maghoumi & Joseph J. LaViola Jr.

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11844)

Included in the following conference series:

International Symposium on Visual Computing

The original version of this chapter was revised: the name of an author was tagged incorrectly. The correction to this chapter is available at https://doi.org/10.1007/978-3-030-33720-9_54

Abstract

We propose DeepGRU, a novel end-to-end deep network model informed by recent developments in deep learning for gesture and action recognition, that is streamlined and device-agnostic. DeepGRU, which uses only raw skeleton, pose, or vector data, is quickly understood, implemented, and trained, yet achieves state-of-the-art results on challenging datasets. At the heart of our method lies a set of stacked gated recurrent units (GRU), two fully-connected layers, and a novel global attention model. We evaluate our method on seven publicly available datasets, containing varying numbers of samples and spanning a broad range of interactions (full-body, multi-actor, hand gestures, etc.). In all but one case we outperform the state-of-the-art pose-based methods. For instance, we achieve recognition accuracies of 84.9% and 92.3% on the cross-subject and cross-view tests of the NTU RGB+D dataset, respectively, and 100% recognition accuracy on the UT-Kinect dataset. We show that even in the absence of powerful hardware or a large amount of training data, and with as few as four samples per class, DeepGRU can be trained in under 10 min while beating traditional methods specifically designed for small training sets, making it an enticing choice for rapid application prototyping and development.
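
To make the three components named in the abstract concrete (stacked GRUs, an attention step, and two fully-connected layers), the following is a minimal PyTorch sketch. It is not the authors' architecture: the class name, the layer sizes, the number of GRU layers, and the bilinear form of the attention scores are all assumptions chosen for illustration. The reference implementation linked in the notes is the authoritative source.

```python
import torch
import torch.nn as nn


class GruAttentionClassifier(nn.Module):
    """Illustrative sketch: stacked GRUs -> attention pooling -> two FC layers.

    All sizes and the attention form are assumptions, not the paper's values.
    """

    def __init__(self, num_features, num_classes, hidden_size=64, num_layers=2):
        super().__init__()
        # Stacked GRU encoder over the raw per-frame feature vectors.
        self.gru = nn.GRU(num_features, hidden_size,
                          num_layers=num_layers, batch_first=True)
        # Assumed attention form: bilinear score between each step and the final state.
        self.attn_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        # Two fully-connected layers acting as the classifier.
        self.fc1 = nn.Linear(2 * hidden_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, time, num_features)
        outputs, h_n = self.gru(x)            # outputs: (batch, time, hidden)
        last = h_n[-1]                        # top layer's final state: (batch, hidden)
        # One score per time step, normalized over the whole sequence.
        scores = torch.bmm(self.attn_proj(outputs), last.unsqueeze(2)).squeeze(2)
        weights = torch.softmax(scores, dim=1)                 # (batch, time)
        context = torch.bmm(weights.unsqueeze(1), outputs).squeeze(1)
        features = torch.cat([context, last], dim=1)           # (batch, 2 * hidden)
        return self.fc2(torch.relu(self.fc1(features)))        # class logits


# Hypothetical usage: 4 sequences of 30 frames, 75 values per frame, 10 classes.
model = GruAttentionClassifier(num_features=75, num_classes=10)
logits = model(torch.randn(4, 30, 75))
```

The input here is a batch of fixed-length sequences; handling variable-length gestures (for example by padding and masking) is left out to keep the sketch short.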

Change history

21 October 2019
The given name and family name of an author were not tagged correctly in the originally published article. The author’s given name is “Joseph J.” and his family name is “LaViola.” This was corrected.

Notes

1. Reference implementation is available at: https://github.com/Maghoumi/DeepGRU.
2. A factor of ±0.3 indicates that samples are randomly and non-uniformly scaled along all axes to [0.7, 1.3] of their original size.
3. Refer to our supplementary material for more details: https://arxiv.org/abs/1810.12514.
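
The scaling augmentation described in note 2 can be sketched in a few lines of plain Python. This is one reading of the note, not the authors' code: "non-uniformly" is assumed to mean that each axis receives its own independently drawn factor, the factor is assumed to be drawn uniformly from [1 - 0.3, 1 + 0.3], and the function name `random_scale` is hypothetical.

```python
import random


def random_scale(sample, factor=0.3, rng=random):
    """Scale a sequence of points by one random factor per axis.

    sample: list of points, each a list of coordinates of equal length.
    factor: 0.3 gives per-axis scales in [0.7, 1.3] (assumed uniform draw).
    rng:    object with a uniform(a, b) method, e.g. random.Random(seed).
    """
    dims = len(sample[0])
    # One independent scale per axis makes the scaling non-uniform across axes.
    scales = [rng.uniform(1.0 - factor, 1.0 + factor) for _ in range(dims)]
    return [[value * s for value, s in zip(point, scales)] for point in sample]


# Hypothetical usage on a two-frame, three-axis sample with a seeded generator.
augmented = random_scale([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], rng=random.Random(0))
```

The same per-axis factors are applied to every frame of a sample, so the gesture is stretched or squeezed as a whole rather than jittered frame by frame.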

Author information

Authors and Affiliations

NVIDIA, Santa Clara, CA, 95051, USA
Mehran Maghoumi
University of Central Florida, Orlando, FL, 32816, USA
Mehran Maghoumi & Joseph J. LaViola Jr.

Corresponding author

Correspondence to Mehran Maghoumi.

Editor information

Editors and Affiliations

University of Nevada, Reno, NV, USA
George Bebis
NASA Ames Research Center, Moffett Field, CA, USA
Richard Boyle
University of Nevada, Reno, NV, USA
Bahram Parvin
Desert Research Institute, Reno, NV, USA
Darko Koracin
Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Daniela Ushizima
Latent AI, Palo Alto, CA, USA
Sek Chai
Texas A&M University, College Station, TX, USA
Shinjiro Sueda
Louisiana State University, Baton Rouge, LA, USA
Xin Lin
University of North Carolina at Charlotte, Charlotte, NC, USA
Aidong Lu
École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
Daniel Thalmann
Notre Dame University, Notre Dame, IN, USA
Chaoli Wang
Bosch Research North America, Palo Alto, CA, USA
Panpan Xu

About this paper

Cite this paper

Maghoumi, M., LaViola, J.J. (2019). DeepGRU: Deep Gesture Recognition Utility. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2019. Lecture Notes in Computer Science, vol 11844. Springer, Cham. https://doi.org/10.1007/978-3-030-33720-9_2

DOI: https://doi.org/10.1007/978-3-030-33720-9_2
Published: 21 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33719-3
Online ISBN: 978-3-030-33720-9
eBook Packages: Computer Science, Computer Science (R0)