A New Hybrid Architecture for Human Activity Recognition from RGB-D Videos

Das, Srijan; Thonnat, Monique; Sakhalkar, Kaustubh; Koperski, Michal; Bremond, Francois; Francesca, Gianpiero

doi:10.1007/978-3-030-05716-9_40

Conference paper
First Online: 11 December 2018

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11296)

Abstract

Activity recognition from RGB-D videos remains an open problem because of the large variety of actions to be recognized. In this work, we propose a new architecture that combines a high-level handcrafted strategy with machine learning techniques. To address the large variety of actions, we propose a novel two-level fusion strategy that combines features from different cues. Because similar actions are common in daily living activities, we also propose a mechanism for discriminating between similar actions. We validate our approach on four public datasets (CAD-60, CAD-120, MSRDailyActivity3D, and NTU-RGB+D) and improve on the state-of-the-art results on each of them.

Author information

Authors and Affiliations

Inria, Sophia Antipolis, 2004 Rte des Lucioles, 06902, Valbonne, France
Srijan Das, Monique Thonnat, Kaustubh Sakhalkar, Michal Koperski & Francois Bremond
Toyota Motor Europe, Hoge Wei 33, 1930, Zaventem, Belgium
Gianpiero Francesca

Corresponding author

Correspondence to Srijan Das.

Editor information

Editors and Affiliations

Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Ioannis Kompatsiaris
EURECOM, Sophia Antipolis, France
Benoit Huet
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Vasileios Mezaris
Dublin City University, Dublin, Ireland
Cathal Gurrin
National Chiao Tung University, Hsinchu, Taiwan
Wen-Huang Cheng
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Stefanos Vrochidis

About this paper

Cite this paper

Das, S., Thonnat, M., Sakhalkar, K., Koperski, M., Bremond, F., Francesca, G. (2019). A New Hybrid Architecture for Human Activity Recognition from RGB-D Videos. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, WH., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science, vol 11296. Springer, Cham. https://doi.org/10.1007/978-3-030-05716-9_40

DOI: https://doi.org/10.1007/978-3-030-05716-9_40
Published: 11 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05715-2
Online ISBN: 978-3-030-05716-9
eBook Packages: Computer Science, Computer Science (R0)
