Combining Per-frame and Per-track Cues for Multi-person Action Recognition

Khamis, Sameh; Morariu, Vlad I.; Davis, Larry S.

doi:10.1007/978-3-642-33718-5_9

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7572)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We propose a model to combine per-frame and per-track cues for action recognition. With multiple targets in a scene, our model simultaneously captures the natural harmony of an individual’s action in a scene and the flow of actions of an individual in a video sequence, inferring valid tracks in the process. Our motivation is based on the unlikely discordance of an action in a structured scene, both at the track level and the frame level (e.g., a person dancing in a crowd of joggers). While we can utilize sampling approaches for inference in our model, we instead devise a global inference algorithm by decomposing the problem and solving the subproblems exactly and efficiently, recovering a globally optimal joint solution in several cases. Finally, we improve on the state-of-the-art action recognition results for two publicly available datasets.
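The abstract does not name the decomposition it uses, so the following is only a minimal sketch of one standard way to "decompose the problem and solve the subproblems exactly": Lagrangian dual decomposition with subgradient updates, applied to a toy grid of people by frames. Everything in it is an assumption made for illustration, not the authors' model: the function names, the Potts-style `switch_cost` standing in for per-track cues, the pairwise `discord_cost` standing in for per-frame cues, and the step-size schedule. Each track is solved exactly with Viterbi and each frame exactly by enumeration; when the two subproblems agree, the labeling is provably optimal for this toy objective.

```python
import itertools
import numpy as np

def viterbi(unary, switch_cost):
    """Exact min-cost labeling of one track; unary is (T, K), Potts transitions."""
    T, K = unary.shape
    pair = switch_cost * (1.0 - np.eye(K))
    cost = unary[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        total = cost[:, None] + pair      # total[i, j]: previous label i, current j
        back[t] = total.argmin(axis=0)
        cost = total.min(axis=0) + unary[t]
    labels = np.zeros(T, dtype=int)
    labels[-1] = cost.argmin()
    for t in range(T - 1, 0, -1):
        labels[t - 1] = back[t, labels[t]]
    return labels, float(cost.min())

def frame_cost(labels, discord_cost):
    """Penalty for every pair of people in one frame whose labels differ."""
    return discord_cost * sum(int(a != b) for a, b in itertools.combinations(labels, 2))

def best_frame(unary, discord_cost):
    """Exact min-cost labeling of one frame by enumeration; unary is (P, K)."""
    P, K = unary.shape
    best, best_val = None, np.inf
    for labels in itertools.product(range(K), repeat=P):
        val = unary[np.arange(P), list(labels)].sum() + frame_cost(labels, discord_cost)
        if val < best_val:
            best, best_val = labels, val
    return np.array(best), float(best_val)

def energy(y, unary, switch_cost, discord_cost):
    """Full toy objective for a labeling y of shape (P, T)."""
    P, T = y.shape
    e = unary[np.arange(P)[:, None], np.arange(T)[None, :], y].sum()
    e += switch_cost * (y[:, 1:] != y[:, :-1]).sum()
    e += sum(frame_cost(y[:, t], discord_cost) for t in range(T))
    return float(e)

def dual_decomposition(unary, switch_cost, discord_cost, iters=200):
    """Returns (labeling, its energy, best lower bound, whether subproblems agreed)."""
    P, T, K = unary.shape
    lam = np.zeros(unary.shape)           # Lagrange multipliers, one per (p, t, k)
    best_bound, best_y, best_e = -np.inf, None, np.inf
    for it in range(iters):
        y_track = np.zeros((P, T), dtype=int)
        y_frame = np.zeros((P, T), dtype=int)
        bound = 0.0
        for p in range(P):                # per-track subproblems
            y_track[p], v = viterbi(unary[p] / 2 + lam[p], switch_cost)
            bound += v
        for t in range(T):                # per-frame subproblems
            y_frame[:, t], v = best_frame(unary[:, t] / 2 - lam[:, t], discord_cost)
            bound += v
        best_bound = max(best_bound, bound)
        for y in (y_track, y_frame):      # keep the best primal labeling seen
            e = energy(y, unary, switch_cost, discord_cost)
            if e < best_e:
                best_y, best_e = y.copy(), e
        if np.array_equal(y_track, y_frame):
            return best_y, best_e, best_bound, True
        # Subgradient ascent on the dual: penalize labels only one side chose.
        step = 1.0 / (1 + it)
        lam += step * (np.eye(K)[y_track] - np.eye(K)[y_frame])
    return best_y, best_e, best_bound, False

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    unary = rng.random((3, 5, 4))         # 3 people, 5 frames, 4 action labels
    y, e, bound, agreed = dual_decomposition(unary, switch_cost=0.3, discord_cost=0.2)
    print(y, e, bound, agreed)
```

The returned lower bound never exceeds the true optimum, so the gap between it and the returned energy certifies how close the labeling is even when the subproblems never fully agree.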

Author information

Authors and Affiliations

University of Maryland, College Park, USA
Sameh Khamis, Vlad I. Morariu & Larry S. Davis

Editor information

Editors and Affiliations

Microsoft Research Ltd., CB3 0FB, Cambridge, UK
Andrew Fitzgibbon
Dept. of Computer Science, University of North Carolina, 27599, Chapel Hill, NC, USA
Svetlana Lazebnik
California Institute of Technology, 91125, Pasadena, CA, USA
Pietro Perona
Institute of Industrial Science, The University of Tokyo, 153-8505, Tokyo, Japan
Yoichi Sato
INRIA, 38330, Montbonnot, France
Cordelia Schmid

About this paper

Cite this paper

Khamis, S., Morariu, V.I., Davis, L.S. (2012). Combining Per-frame and Per-track Cues for Multi-person Action Recognition. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds) Computer Vision – ECCV 2012. ECCV 2012. Lecture Notes in Computer Science, vol 7572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33718-5_9

DOI: https://doi.org/10.1007/978-3-642-33718-5_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33717-8
Online ISBN: 978-3-642-33718-5
eBook Packages: Computer Science (R0)
