
Analysis of Temporal Coherence in Videos for Action Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9730)

Abstract

This paper proposes an approach to improve the performance of activity recognition methods by analyzing the coherence of the frames in the input videos and then modeling the evolution of the coherent frames, which constitute a sub-sequence, to learn a representation for the videos. The proposed method consists of three steps: coherence analysis, representation learning, and classification. Using two state-of-the-art datasets (Hollywood2 and HMDB51), we demonstrate that learning the evolution of sub-sequences, rather than individual frames, improves the recognition results and makes action classification faster.
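The three steps are only named in the abstract, so as a rough illustration of the first one (coherence analysis), the sketch below segments a video into coherent sub-sequences using a simple normalized-correlation score between consecutive frames. The similarity measure, the threshold, and all function names are assumptions for illustration, not the paper's actual method.

```python
# Hypothetical sketch of a coherence-analysis step: frame similarity is
# approximated with mean-subtracted normalized correlation, and a new
# sub-sequence starts wherever the score drops below a threshold.
import numpy as np

def frame_coherence(a, b):
    """Normalized correlation of two equally sized frames (near 1 = coherent)."""
    a = a.ravel().astype(float)
    b = b.ravel().astype(float)
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 1.0

def segment_coherent_subsequences(frames, threshold=0.9):
    """Split a list of frames into sub-sequences of mutually coherent frames."""
    segments, current = [], [0]
    for i in range(1, len(frames)):
        if frame_coherence(frames[i - 1], frames[i]) >= threshold:
            current.append(i)  # frame i continues the current sub-sequence
        else:
            segments.append(current)  # coherence break: close the sub-sequence
            current = [i]
    segments.append(current)
    return segments

# Toy example: six synthetic "frames" with an abrupt scene change after frame 2.
rng = np.random.default_rng(0)
base = rng.random((8, 8))
frames = [base + 0.01 * rng.random((8, 8)) for _ in range(3)]   # coherent run
frames += [rng.random((8, 8))]                                   # scene change
frames += [frames[3] + 0.01 * rng.random((8, 8)) for _ in range(2)]
print(segment_coherent_subsequences(frames))  # → [[0, 1, 2], [3, 4, 5]]
```

Each returned sub-sequence could then be fed to the representation-learning step, which the paper models over sub-sequences instead of raw frames.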



Acknowledgment

This work was partly supported by Universitat Rovira i Virgili, Spain, and Hodeidah University, Yemen.

Author information

Corresponding author

Correspondence to Adel Saleh.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Saleh, A., Abdel-Nasser, M., Akram, F., Garcia, M.A., Puig, D. (2016). Analysis of Temporal Coherence in Videos for Action Recognition. In: Campilho, A., Karray, F. (eds) Image Analysis and Recognition. ICIAR 2016. Lecture Notes in Computer Science, vol. 9730. Springer, Cham. https://doi.org/10.1007/978-3-319-41501-7_37

  • DOI: https://doi.org/10.1007/978-3-319-41501-7_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41500-0

  • Online ISBN: 978-3-319-41501-7

  • eBook Packages: Computer Science
