Abstract
This paper proposes an approach that improves the performance of activity recognition methods by analyzing the coherence of the frames in the input videos and then modeling the evolution of the coherent frames, which constitute a sub-sequence, to learn a representation for the videos. The proposed method consists of three steps: coherence analysis, representation learning and classification. Using two state-of-the-art datasets (Hollywood2 and HMDB51), we demonstrate that learning the evolution of sub-sequences instead of individual frames improves the recognition results and makes action classification faster.
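The coherence-analysis step described above can be illustrated with a minimal sketch: group consecutive frames into a sub-sequence as long as adjacent frame descriptors remain similar, and start a new sub-sequence when similarity drops. The paper's actual coherence measure and threshold are not given in this abstract; cosine similarity and the threshold value below are illustrative assumptions, not the authors' implementation.

```python
def cosine_similarity(a, b):
    """Cosine similarity between two frame descriptors (plain lists)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def segment_coherent_subsequences(frames, threshold=0.9):
    """Group consecutive frame descriptors into sub-sequences.

    A new sub-sequence starts whenever the similarity between a frame
    and its predecessor falls below the threshold, so each returned
    segment holds a run of mutually coherent frames.
    """
    if not frames:
        return []
    segments = [[frames[0]]]
    for prev, cur in zip(frames, frames[1:]):
        if cosine_similarity(prev, cur) >= threshold:
            segments[-1].append(cur)  # coherent: extend current sub-sequence
        else:
            segments.append([cur])    # coherence break: open a new one
    return segments
```

Each resulting sub-sequence, rather than each raw frame, would then feed the representation-learning step, which is what reduces the number of units whose evolution must be modeled and speeds up classification.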
Acknowledgment
This work was partly supported by Universitat Rovira i Virgili, Spain, and Hodeidah University, Yemen.
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Saleh, A., Abdel-Nasser, M., Akram, F., Garcia, M.A., Puig, D. (2016). Analysis of Temporal Coherence in Videos for Action Recognition. In: Campilho, A., Karray, F. (eds) Image Analysis and Recognition. ICIAR 2016. Lecture Notes in Computer Science(), vol 9730. Springer, Cham. https://doi.org/10.1007/978-3-319-41501-7_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41500-0
Online ISBN: 978-3-319-41501-7