Learning Human Interaction by Interactive Phrases

Kong, Yu; Jia, Yunde; Fu, Yun

doi:10.1007/978-3-642-33718-5_22

Yu Kong, Yunde Jia & Yun Fu

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7572)

Included in the following conference series:

European Conference on Computer Vision

Abstract

In this paper, we present a novel approach for recognizing human interactions in videos. We introduce high-level descriptions called interactive phrases to express binary semantic motion relationships between interacting people. Interactive phrases naturally exploit human knowledge to describe interactions and allow us to construct a more descriptive model for recognizing them. We propose a novel hierarchical model that encodes interactive phrases within the latent SVM framework, treating the phrases as latent variables. The model explicitly captures the interdependencies between interactive phrases to deal with motion ambiguity and partial occlusion in interactions. We evaluate our method on the newly collected BIT-Interaction dataset and on the UT-Interaction dataset. Promising results demonstrate the effectiveness of the proposed method.
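
To make the idea of treating phrases as latent variables concrete, here is a minimal Python sketch of generic latent-variable inference in the latent SVM style. It is not the model from the chapter: the three score terms (`unary`, `pairwise`, `class_compat`), the toy numbers, and the exhaustive search over phrase assignments are all assumptions chosen for illustration. The abstract does not specify the actual features, dependency structure, or inference procedure.

```python
from itertools import product

def score(unary, pairwise, class_compat, h, y):
    """Joint score of a binary phrase assignment h and an interaction class y."""
    # How well each phrase value agrees with the (hypothetical) video evidence.
    s = sum(unary[j][h[j]] for j in range(len(h)))
    # Dependencies between pairs of phrases.
    s += sum(w[h[j]][h[k]] for (j, k), w in pairwise.items())
    # Compatibility between the class label and each phrase value.
    s += sum(class_compat[y][j][h[j]] for j in range(len(h)))
    return s

def predict(unary, pairwise, class_compat):
    """Maximize the joint score over classes y and latent phrase assignments h.

    Exhaustive search is only feasible for a handful of phrases; it stands in
    here for whatever inference a real model would use.
    """
    best = None
    for y in range(len(class_compat)):
        for h in product((0, 1), repeat=len(unary)):
            s = score(unary, pairwise, class_compat, h, y)
            if best is None or s > best[2]:
                best = (y, h, s)
    return best

# Toy setup (all numbers invented): 2 phrases, 2 classes.
unary = [[0.2, 1.0],   # phrase 0: evidence clearly says "present"
         [0.6, 0.5]]   # phrase 1: evidence is ambiguous, slightly "absent"
pairwise = {(0, 1): [[0.0, -0.5],
                     [-0.5, 0.8]]}   # the two phrases tend to co-occur
class_compat = [
    [[0.0, 1.0], [0.0, 1.0]],  # class 0 expects both phrases present
    [[1.0, 0.0], [1.0, 0.0]],  # class 1 expects both phrases absent
]

best_class, best_phrases, best_score = predict(unary, pairwise, class_compat)
print(best_class, best_phrases)
```

In this toy setup the unary evidence for phrase 1 slightly favors "absent", yet the best joint assignment marks it present, because the co-occurrence and class terms outweigh the weak local evidence. That is the kind of effect the abstract attributes to modeling interdependencies between phrases, shown here only on invented numbers.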

Author information

Authors and Affiliations

Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, Beijing, 100081, P.R. China
Yu Kong & Yunde Jia
Department of ECE and College of CIS, Northeastern University, Boston, MA, USA
Yun Fu
Department of CSE, State University of New York, Buffalo, NY, USA
Yu Kong

Editor information

Editors and Affiliations

Microsoft Research Ltd., CB3 0FB, Cambridge, UK
Andrew Fitzgibbon
Dept. of Computer Science, University of North Carolina, 27599, Chapel Hill, NC, USA
Svetlana Lazebnik
California Institute of Technology, 91125, Pasadena, CA, USA
Pietro Perona
Institute of Industrial Science, The University of Tokyo, 153-8505, Tokyo, Japan
Yoichi Sato
INRIA, 38330, Montbonnot, France
Cordelia Schmid

About this paper

Cite this paper

Kong, Y., Jia, Y., Fu, Y. (2012). Learning Human Interaction by Interactive Phrases. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds) Computer Vision – ECCV 2012. ECCV 2012. Lecture Notes in Computer Science, vol 7572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33718-5_22

DOI: https://doi.org/10.1007/978-3-642-33718-5_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33717-8
Online ISBN: 978-3-642-33718-5
eBook Packages: Computer Science, Computer Science (R0)
