Learning Action Primitives for Multi-level Video Event Understanding

Lan, Tian; Chen, Lei; Deng, Zhiwei; Zhou, Guang-Tong; Mori, Greg

doi:10.1007/978-3-319-16199-0_7

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8927)

Included in the following conference series: European Conference on Computer Vision

Abstract

Human action categories exhibit significant intra-class variation. Changes in viewpoint, human appearance, and the temporal evolution of an action confound recognition algorithms. In order to address this, we present an approach to discover action primitives, sub-categories of action classes, that allow us to model this intra-class variation. We learn action primitives and their interrelations in a multi-level spatio-temporal model for action recognition. Action primitives are discovered via a data-driven clustering approach that focuses on repeatable, discriminative sub-categories. Higher-level interactions between action primitives and the actions of a set of people present in a scene are learned. Empirical results demonstrate that these action primitives can be effectively localized, and using them to model action classes improves action recognition performance on challenging datasets.

Author information

Authors and Affiliations

Stanford University, Stanford, USA
Tian Lan
Simon Fraser University, Burnaby, Canada
Lei Chen, Zhiwei Deng, Guang-Tong Zhou & Greg Mori

Corresponding author

Correspondence to Tian Lan.

Editor information

Editors and Affiliations

University College London, London, United Kingdom
Lourdes Agapito
University of Lugano, Lugano, Switzerland
Michael M. Bronstein
Technische Universität Dresden, Dresden, Germany
Carsten Rother

About this paper

Cite this paper

Lan, T., Chen, L., Deng, Z., Zhou, GT., Mori, G. (2015). Learning Action Primitives for Multi-level Video Event Understanding. In: Agapito, L., Bronstein, M., Rother, C. (eds) Computer Vision - ECCV 2014 Workshops. ECCV 2014. Lecture Notes in Computer Science, vol 8927. Springer, Cham. https://doi.org/10.1007/978-3-319-16199-0_7

DOI: https://doi.org/10.1007/978-3-319-16199-0_7
Published: 20 March 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16198-3
Online ISBN: 978-3-319-16199-0
eBook Packages: Computer Science, Computer Science (R0)
