Abstract
Human action categories exhibit significant intra-class variation. Changes in viewpoint, human appearance, and the temporal evolution of an action confound recognition algorithms. In order to address this, we present an approach to discover action primitives, sub-categories of action classes, that allow us to model this intra-class variation. We learn action primitives and their interrelations in a multi-level spatio-temporal model for action recognition. Action primitives are discovered via a data-driven clustering approach that focuses on repeatable, discriminative sub-categories. Higher-level interactions between action primitives and the actions of a set of people present in a scene are learned. Empirical results demonstrate that these action primitives can be effectively localized, and using them to model action classes improves action recognition performance on challenging datasets.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Lan, T., Wang, Y., Yang, W., Mori, G.: Beyond actions: discriminative models for contextual group activities. In: NIPS (2010)
Lan, T., Sigal, L., Mori, G.: Social roles in hierarchical models for human activity recognition. In: CVPR (2012)
Choi, W., Savarese, S.: A unified framework for multi-target tracking and collective activity recognition. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part IV. LNCS, vol. 7575, pp. 215–230. Springer, Heidelberg (2012)
Amer, M.R., Xie, D., Zhao, M., Todorovic, S., Zhu, S.-C.: Cost-sensitive top-down/bottom-up inference for multiscale activity recognition. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part IV. LNCS, vol. 7575, pp. 187–200. Springer, Heidelberg (2012)
Ramanathan, V., Yao, B., Fei-Fei, L.: Social role discovery in human events. In: CVPR (2013)
Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part based models. T-PAMI 32, 1672–1645 (2010)
Turaga, P., Chellappa, R., Subrahmanian, V.S., Udrea, O.: Machine recognition of human activities: A survey. T-CSVT (2008)
Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local SVM approach. In: ICPR (2004)
Kovashka, A., Grauman, K.: Learning a hierarchy of discriminative space-time neighborhood features for human action recognition. In: CVPR (2010)
Maji, S., Bourdev, L., Malik, J.: Action recognition from a distributed representation of pose and appearance. In: CVPR (2011)
Jain, A., Gupta, A., Rodriguez, M., Davis, L.S.: Representing videos using mid-level discriminative patches. In: CVPR (2013)
Wang, H., Kläser, A., C.Schmid, Liu, C.L.: Action recognition by dense trajectories. In: CVPR (2011)
Raptis, M., Kokkinos, I., Soatto, S.: Discovering discriminative action parts from mid-level video representations. In: CVPR (2012)
Tian, Y., Sukthankar, R., Shah, M.: Spatiotemporal deformable part models for action detection. In: CVPR (2013)
Shugao Ma, Jianming Zhang, N.I.C., Sclaroff, S.: Action recognition and localization by hierarchical space-time segments. In: ICCV (2013)
Yamato, J., Ohya, J., Ishii, K.: Recognizing human action in time-sequential images using hidden markov model. In: CVPR (1992)
Moore, D., Essa, I.: Recognizing multitasked activities from video using stochastic context-free grammar. In: AAAI (2002)
Bobick, A., Wilson, A.: A state-based technique for the summarization and recognition of gesture. In: ICCV (1995)
Bregler, C.: Learning and recognizing human dynamics in video sequences. In: CVPR (1997)
Médioni, G., Cohen, I., Brémond, F., Hongeng, S., Nevatia, R.: Event detection and analysis from video streams. T-PAMI 23, 873–889 (2001)
Ke, Y., Sukthankar, R., Hebert, M.: Event detection in crowded videos. In: ICCV (2007)
Kläser, A., Marszałek, M., Schmid, C., Zisserman, A.: Human focused action localization in video. In: International Workshop on Sign, Gesture, Activity (2010)
Lan, T., Wang, Y., Mori, G.: Discriminative figure-centric models for joint action localization and recognition. In: ICCV (2011)
Tran, D., Yuan, J.: Max-margin structured output regression for spatio-temporal action localization. In: NIPS (2012)
Yao, B., Fei-Fei, L.: Modeling mutual context of object and human pose in human-object interaction activities. In: CVPR (2010)
Kitani, K.M., Okabe, T., Sato, Y., Sugimoto, A.: Discovering primitive action categories by leveraging relevant visual context. In: ECCV Workshop on Visual Surveillance (2008)
Hoai, M., Zisserman, A.: Discriminative sub-categorization. In: CVPR (2013)
Lan, T., Sigal, L., Raptis, M., Mori, G.: From subcategories to visual composites: a multi-level framework for object detection. In: ICCV (2013)
Gu, C., Ren, X.: Discriminative mixture-of-templates for viewpoint classification. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part V. LNCS, vol. 6315, pp. 408–421. Springer, Heidelberg (2010)
Todorovic, S., Ahuja, N.: Learning subcategory relevances for category recognition. In: CVPR (2008)
Gu, C., Arbeláez, P., Lin, Y., Yu, K., Malik, J.: Multi-component Models for object detection. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part IV. LNCS, vol. 7575, pp. 445–458. Springer, Heidelberg (2012)
Sheikh, Y.A., Khan, E.A., Kanade, T.: Mode-seeking via medoidshifts. In: ICCV (2007)
Frey, B.J., Dueck, D.: Clustering by passing messages between data points. Science (2007)
Crammer, K., Singer, Y.: On the algorithmic implementation of multiclass kernel-based vector machines. JMLR 2, 265–292 (2001)
Do, T.M.T., Artieres, T.: Large margin training for hidden markov models with partially observed states. In: ICML (2009)
Choi, W., Shahid, K., Savarese, S.: What are they doing?: collective activity classification using spatial-temporal relationship among people. In: International Workshop on Visual Surveillance (2009)
Sadanand, S., Corso, J.J.: Action Bank: a high-level representation of activity in video. In: CVPR (2012)
Rodriguez, M.D., Ahmed, J., Shah, M.: Action MACH: a spatial-temporal maximum average correlation height filter for action recognition. In: CVPR (2008)
Alexe, B., Deselares, T., Ferrari, V.: What is an object?. In: CVPR (2010)
Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL visual object classes (VOC) challenge. IJCV 88, 303–338 (2010)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Lan, T., Chen, L., Deng, Z., Zhou, GT., Mori, G. (2015). Learning Action Primitives for Multi-level Video Event Understanding. In: Agapito, L., Bronstein, M., Rother, C. (eds) Computer Vision - ECCV 2014 Workshops. ECCV 2014. Lecture Notes in Computer Science(), vol 8927. Springer, Cham. https://doi.org/10.1007/978-3-319-16199-0_7
Download citation
DOI: https://doi.org/10.1007/978-3-319-16199-0_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16198-3
Online ISBN: 978-3-319-16199-0
eBook Packages: Computer ScienceComputer Science (R0)