Abstract
In this paper we describe an action/interaction detection system based on improved dense trajectories [19], multiple visual descriptors, and a bag-of-features representation. Because the actions/interactions are not mutually exclusive, we train a separate binary classifier for each predefined action/interaction. We rely on a non-overlapping temporal sliding window to enable temporal localization. We tested our system on the ChaLearn Looking at People Challenge 2014 Track 2 dataset [1, 2] and obtained an average overlap of 0.4226, which placed 3rd in that track of the challenge. Finally, we provide an extensive analysis of the system's performance on different actions and suggest possible ways to improve a general action detection system.
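The detection scheme summarized above (one binary classifier per action, applied to non-overlapping temporal windows) can be illustrated with a minimal sketch. It is not the authors' implementation: the names `non_overlapping_windows`, `detect`, and `scorers` are hypothetical, the scoring functions stand in for classifiers trained on bag-of-features histograms, and merging adjacent positive windows into one segment is an assumption made here for readability.

```python
from typing import Callable, Dict, List, Tuple

# A segment is a half-open frame range [start, end).
Segment = Tuple[int, int]


def non_overlapping_windows(num_frames: int, window: int) -> List[Segment]:
    """Split [0, num_frames) into consecutive windows that share no frames."""
    return [(s, min(s + window, num_frames))
            for s in range(0, num_frames, window)]


def detect(num_frames: int,
           window: int,
           scorers: Dict[str, Callable[[Segment], float]],
           threshold: float = 0.0) -> Dict[str, List[Segment]]:
    """Run every per-action binary scorer on every window.

    Each scorer is a stand-in (assumption) for a binary classifier that
    would score the window's bag-of-features representation. Actions are
    scored independently, so several can fire on the same window.
    """
    detections: Dict[str, List[Segment]] = {label: [] for label in scorers}
    for seg in non_overlapping_windows(num_frames, window):
        for label, score in scorers.items():
            if score(seg) > threshold:
                segs = detections[label]
                # Assumption: adjacent positive windows are merged into
                # one detected segment.
                if segs and segs[-1][1] == seg[0]:
                    segs[-1] = (segs[-1][0], seg[1])
                else:
                    segs.append(seg)
    return detections


if __name__ == "__main__":
    # Toy scorers: "wave" is active early, "walk" is active late.
    toy_scorers = {
        "wave": lambda seg: 1.0 if seg[0] < 8 else -1.0,
        "walk": lambda seg: 1.0 if seg[0] >= 4 else -1.0,
    }
    print(detect(10, 4, toy_scorers))
```

In the toy run both actions fire on the window covering frames 4 to 7, which is the situation that motivates independent binary classifiers rather than a single multi-class decision.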
References
Escalera, S., et al.: ChaLearn looking at people challenge 2014: dataset and results. In: Bronstein, M., Agapito, L., Rother, C. (eds.) Computer Vision - ECCV 2014 Workshops. LNCS, vol. 8925, pp. 459–473. Springer, Heidelberg (2015)
Sánchez, D., Bautista, M., Escalera, S.: HuPBA 8k+: Dataset and ECOC-GraphCut based Segmentation of Human Limbs. Neurocomputing (2014)
Poppe, R.: A survey on vision-based human action recognition. Image and Vision Computing 28(6), 976–990 (2010)
Aggarwal, J., Ryoo, M.: Human activity analysis: A review. ACM Comput. Surv. 43(3), 16:1–16:43 (2011)
Blank, M., Gorelick, L., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. In: Proceedings of the International Conference On Computer Vision, ICCV (2005)
Dollár, P., Rabaud, V., Cottrell, G., Belongie, S.: Behavior recognition via sparse spatio-temporal features. In: 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, PETS (2005)
Niebles, J., Wang, H., Fei-Fei, L.: Unsupervised learning of human action categories using spatial-temporal words. International Journal of Computer Vision 79(3), 299–318 (2008)
Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local SVM approach. In: Proceedings of the 17th International Conference on Pattern Recognition, ICPR (2004)
Messing, R., Pal, C., Kautz, H.: Activity recognition using the velocity histories of tracked keypoints. In: Proceedings of the International Conference On Computer Vision, ICCV (2009)
Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., Serre, T.: HMDB: a large video database for human motion recognition. In: Proceedings of the International Conference On Computer Vision, ICCV (2011)
Ayazoglu, M., Yilmaz, B., Sznaier, M., Camps, O.: Finding causal interactions in video sequences. In: Proceedings of the International Conference On Computer Vision, ICCV (2013)
Ryoo, M.S., Aggarwal, J.K.: Spatio-temporal relationship match: video structure comparison for recognition of complex human activities. In: Proceedings of the International Conference On Computer Vision, ICCV (2009)
Yun, K., Honorio, J., Chattopadhyay, D., Berg, T.L., Samaras, D.: Two-person interaction detection using body-pose features and multiple instance learning. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW (2012)
Yao, B., Fei-Fei, L.: Modeling mutual context of object and human pose in human-object interaction activities. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2010)
Wang, H., Ullah, M.M., Kläser, A., Laptev, I., Schmid, C.: Evaluation of local spatio-temporal features for action recognition. In: British Machine Vision Conference, BMVC (2009)
Laptev, I.: On space-time interest points. International Journal of Computer Vision 64(2-3), 107–123 (2005)
Ali, S., Basharat, A., Shah, M.: Chaotic invariants for human action recognition. In: Proceedings of the International Conference On Computer Vision, ICCV (2007)
Wang, H., Kläser, A., Schmid, C., Liu, C.L.: Action recognition by dense trajectories. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2011)
Wang, H., Schmid, C.: Action recognition with improved trajectories. In: Proceedings of the International Conference On Computer Vision, ICCV (2013)
Fathi, A., Mori, G.: Action recognition by learning mid-level motion features. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2008)
Raptis, M., Kokkinos, I., Soatto, S.: Discovering discriminative action parts from mid-level video representations. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2012)
Zhang, W., Zhu, M., Derpanis, K.: From actemes to action: a strongly-supervised representation for detailed action understanding. In: Proceedings of the International Conference On Computer Vision, ICCV (2013)
Oneata, D., Verbeek, J., Schmid, C.: Efficient action localization with approximately normalized fisher vectors. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2014)
Simonyan, K., Zisserman, A.: Two-Stream Convolutional Networks for Action Recognition in Videos. arXiv:1406.2199v1 (2014)
Harris, C., Stephens, M.: A combined corner and edge detector. In: Alvey Vision Conference, vol. 15, p. 50 (1988)
Jain, M., Jégou, H., Bouthemy, P.: Better exploiting motion for better action recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2013)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2005)
Laptev, I., Marszałek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2008)
Dalal, N., Triggs, B., Schmid, C.: Human detection using oriented histograms of flow and appearance. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 428–441. Springer, Heidelberg (2006)
Raptis, M., Sigal, L.: Poselet key-framing: a model for human activity recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2013)
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Shu, Z., Yun, K., Samaras, D. (2015). Action Detection with Improved Dense Trajectories and Sliding Window. In: Agapito, L., Bronstein, M., Rother, C. (eds) Computer Vision - ECCV 2014 Workshops. ECCV 2014. Lecture Notes in Computer Science, vol 8925. Springer, Cham. https://doi.org/10.1007/978-3-319-16178-5_38
DOI: https://doi.org/10.1007/978-3-319-16178-5_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16177-8
Online ISBN: 978-3-319-16178-5
eBook Packages: Computer Science, Computer Science (R0)