Action Detection with Improved Dense Trajectories and Sliding Window

Shu, Zhixin; Yun, Kiwon; Samaras, Dimitris

doi:10.1007/978-3-319-16178-5_38


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8925)

Included in the following conference series:

European Conference on Computer Vision

Abstract

In this paper we describe an action/interaction detection system based on improved dense trajectories [19], multiple visual descriptors, and a bag-of-features representation. Because the actions/interactions are not mutually exclusive, we train a binary classifier for every predefined action/interaction. We rely on a non-overlapping temporal sliding window to enable temporal localization. We tested our system on the ChaLearn Looking at People Challenge 2014 Track 2 dataset [1, 2] and obtained an average overlap of 0.4226, which placed 3rd in that track of the challenge. Finally, we provide an extensive analysis of the system's performance on different actions and suggest possible ways to improve a general action detection system.
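
The detection scheme summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each action has an independent binary scorer (here a hypothetical callable standing in for a trained classifier over bag-of-features histograms), that the video is cut into consecutive non-overlapping windows of a fixed length, and that overlap is measured as intersection over union of frame sets. The names `non_overlapping_windows`, `detect_actions`, `frame_overlap`, and the toy scorers are all illustrative assumptions.

```python
from typing import Callable, Dict, List, Set, Tuple

Window = Tuple[int, int]  # half-open frame range [start, end)


def non_overlapping_windows(num_frames: int, window_len: int) -> List[Window]:
    """Split [0, num_frames) into consecutive windows that do not overlap."""
    return [(s, min(s + window_len, num_frames))
            for s in range(0, num_frames, window_len)]


def detect_actions(
    num_frames: int,
    window_len: int,
    scorers: Dict[str, Callable[[Window], float]],
    threshold: float = 0.0,
) -> Dict[str, Set[int]]:
    """Run one binary scorer per action on every window.

    The actions are not mutually exclusive, so each scorer decides
    independently and a window may receive several labels, or none.
    """
    detections: Dict[str, Set[int]] = {name: set() for name in scorers}
    for start, end in non_overlapping_windows(num_frames, window_len):
        for name, score in scorers.items():
            if score((start, end)) > threshold:
                detections[name].update(range(start, end))
    return detections


def frame_overlap(predicted: Set[int], truth: Set[int]) -> float:
    """Intersection over union of two frame sets (assumed overlap measure)."""
    union = predicted | truth
    return len(predicted & truth) / len(union) if union else 0.0


# Toy scorers standing in for trained classifiers (hypothetical).
toy_scorers = {
    "wave": lambda w: 1.0 if w[0] < 30 else -1.0,
    "point": lambda w: 1.0 if 15 <= w[0] < 45 else -1.0,
}
found = detect_actions(num_frames=60, window_len=15, scorers=toy_scorers)
wave_overlap = frame_overlap(found["wave"], set(range(10, 30)))
```

In this toy run the window starting at frame 15 is labelled with both actions, which is the reason for training one binary classifier per action rather than a single multi-class classifier. The window length also bounds how finely action boundaries can be localized, since every decision covers a whole window.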

References

1. Escalera, S., et al.: ChaLearn looking at people challenge 2014: dataset and results. In: Bronstein, M., Agapito, L., Rother, C. (eds.) Computer Vision - ECCV 2014 Workshops. LNCS, vol. 8925, pp. 459–473. Springer, Heidelberg (2015)
2. Sánchez, D., Bautista, M., Escalera, S.: HuPBA 8k+: Dataset and ECOC-GraphCut based Segmentation of Human Limbs. Neurocomputing (2014)
3. Poppe, R.: A survey on vision-based human action recognition. Image and Vision Computing 28(6), 976–990 (2010)
4. Aggarwal, J., Ryoo, M.: Human activity analysis: A review. ACM Comput. Surv. 43(3), 16:1–16:43 (2011)
5. Blank, M., Gorelick, L., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. In: Proceedings of the International Conference On Computer Vision, ICCV (2005)
6. Dollár, P., Rabaud, V., Cottrell, G., Belongie, S.: Behavior recognition via sparse spatio-temporal features. In: 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, PETS (2005)
7. Niebles, J., Wang, H., Fei-Fei, L.: Unsupervised learning of human action categories using spatial-temporal words. International Journal of Computer Vision 79(3), 299–318 (2008)
8. Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local svm approach. In: Proceedings of the 17th International Conference on Pattern Recognition, ICPR (2004)
9. Messing, R., Pal, C., Kautz, H.: Activity recognition using the velocity histories of tracked keypoints. In: Proceedings of the International Conference On Computer Vision, ICCV (2009)
10. Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., Serre, T.: Hmdb: a large video database for human motion recognition. In: Proceedings of the International Conference On Computer Vision, ICCV (2011)
11. Ayazoglu, M., Yilmaz, B., Sznaier, M., Camps, O.: Finding causal interactions in video sequences. In: Proceedings of the International Conference On Computer Vision, ICCV (2013)
12. Ryoo, M.S., Aggarwal, J.K.: Spatio-temporal relationship match: video structure comparison for recognition of complex human activities. In: Proceedings of the International Conference On Computer Vision, ICCV (2009)
13. Yun, K., Honorio, J., Chattopadhyay, D., Berg, T.L., Samaras, D.: Two-person interaction detection using body-pose features and multiple instance learning. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW (2012)
14. Yao, B., Fei-Fei, L.: Modeling mutual context of object and human pose in human-object interaction activities. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2010)
15. Wang, H., Ullah, M.M., Kläser, A., Laptev, I., Schmid, C.: Evaluation of local spatio-temporal features for action recognition. In: British Machine Vision Conference, BMVC (2009)
16. Laptev, I.: On space-time interest points. International Journal of Computer Vision 64(2-3), 107–123 (2005)
17. Ali, S., Basharat, A., Shah, M.: Chaotic invariants for human action recognition. In: Proceedings of the International Conference On Computer Vision, ICCV (2007)
18. Wang, H., Kläser, A., Schmid, C., Liu, C.L.: Action recognition by dense trajectories. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2011)
19. Wang, H., Schmid, C.: Action recognition with improved trajectories. In: Proceedings of the International Conference On Computer Vision, ICCV (2013)
20. Fathi, A., Mori, G.: Action recognition by learning mid-level motion features. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2008)
21. Raptis, M., Kokkinos, I., Soatto, S.: Discovering discriminative action parts from mid-level video representations. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2012)
22. Zhang, W., Zhu, M., Derpanis, K.: From actemes to action: a strongly-supervised representation for detailed action understanding. In: Proceedings of the International Conference On Computer Vision, ICCV (2013)
23. Oneata, D., Verbeek, J., Schmid, C.: Efficient action localization with approximately normalized fisher vectors. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2014)
24. Simonyan, K., Zisserman, A.: Two-Stream Convolutional Networks for Action Recognition in Videos. arXiv:1406.2199v1 (2014)
25. Harris, C., Stephens, M.: A combined corner and edge detector. In: Alvey Vision Conference, vol. 15, p. 50 (1988)
26. Jain, M., Jégou, H., Bouthemy, P.: Better exploiting motion for better action recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2013)
27. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2005)
28. Laptev, I., Marszałek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2008)
29. Dalal, N., Triggs, B., Schmid, C.: Human detection using oriented histograms of flow and appearance. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 428–441. Springer, Heidelberg (2006)
30. Raptis, M., Sigal, L.: Poselet key-framing: a model for human activity recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2013)

Author information

Authors and Affiliations

Stony Brook University, Stony Brook, NY, 11794, USA
Zhixin Shu, Kiwon Yun & Dimitris Samaras

Corresponding author

Correspondence to Zhixin Shu.

Editor information

Editors and Affiliations

University College London, London, United Kingdom
Lourdes Agapito
University of Lugano, Lugano, Switzerland
Michael M. Bronstein
Technische Universität Dresden, Dresden, Germany
Carsten Rother

Cite this paper

Shu, Z., Yun, K., Samaras, D. (2015). Action Detection with Improved Dense Trajectories and Sliding Window. In: Agapito, L., Bronstein, M., Rother, C. (eds) Computer Vision - ECCV 2014 Workshops. ECCV 2014. Lecture Notes in Computer Science, vol 8925. Springer, Cham. https://doi.org/10.1007/978-3-319-16178-5_38

DOI: https://doi.org/10.1007/978-3-319-16178-5_38
Published: 19 March 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16177-8
Online ISBN: 978-3-319-16178-5
eBook Packages: Computer Science, Computer Science (R0)
