Human Interaction Recognition by Spatial Structure Models

  • Conference paper
Intelligence Science and Big Data Engineering (IScIDE 2013)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8261)

Abstract

In this paper, we focus on the recognition and localization of human interactions in real-world videos. This is challenging because of large variations in person appearance, camera viewpoint, and video length, as well as intra-class variability. To address these challenges, we present a spatial structure model. In our model, the crucial movement of each category is represented by a segment of the entire video. To capture the spatial configuration of the human interactions within that segment, a spatial structure model is built over it, and trajectory features are extracted within each cell. The proposed model is trained automatically from real-world videos that are annotated only with their classification labels. We evaluate our approach on the TVHI dataset, which contains four complex human interaction classes. The experimental results demonstrate the effectiveness of our model.
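The idea of extracting trajectory features within each cell of a spatial layout can be illustrated with a small sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the function name `cell_histograms`, the uniform 2×2 grid, the four-word visual vocabulary, the representation of a trajectory as an `(x, y, word)` triple, and the L1 normalisation. It should be read as one plausible way to pool quantised trajectory descriptors per cell, not as the authors' model.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: (x, y, word) is a trajectory position in
# pixels plus the index of its quantised descriptor (visual word).
Trajectory = Tuple[float, float, int]


def cell_histograms(
    trajectories: Sequence[Trajectory],
    box: Tuple[float, float, float, float],
    grid: Tuple[int, int] = (2, 2),
    vocab_size: int = 4,
) -> List[float]:
    """Concatenate one L1-normalised bag-of-words histogram per grid cell.

    `box` is (x0, y0, x1, y1), the region covered by the grid. The grid
    size and vocabulary size are illustrative defaults, not values taken
    from the paper.
    """
    x0, y0, x1, y1 = box
    rows, cols = grid
    counts = [[0.0] * vocab_size for _ in range(rows * cols)]

    for x, y, word in trajectories:
        if not (x0 <= x < x1 and y0 <= y < y1):
            continue  # trajectory falls outside the region of interest
        # Map the position to a cell index; clamp to guard against rounding.
        col = min(cols - 1, int((x - x0) / (x1 - x0) * cols))
        row = min(rows - 1, int((y - y0) / (y1 - y0) * rows))
        counts[row * cols + col][word] += 1.0

    feature: List[float] = []
    for hist in counts:
        total = sum(hist)
        # Empty cells contribute an all-zero histogram.
        feature.extend(h / total if total else 0.0 for h in hist)
    return feature


if __name__ == "__main__":
    toy = [(10, 10, 0), (15, 12, 0), (90, 20, 1), (30, 80, 2), (200, 50, 3)]
    print(cell_histograms(toy, box=(0, 0, 100, 100)))
```

Keeping one histogram per cell preserves where in the region each kind of motion occurs, which a single histogram over the whole segment would discard. A linear classifier over the concatenated vector could then weight cells differently, though how the paper scores or learns cells is not stated in the abstract.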

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, J., Chen, F., Hu, D. (2013). Human Interaction Recognition by Spatial Structure Models. In: Sun, C., Fang, F., Zhou, ZH., Yang, W., Liu, ZY. (eds) Intelligence Science and Big Data Engineering. IScIDE 2013. Lecture Notes in Computer Science, vol 8261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-42057-3_28

  • DOI: https://doi.org/10.1007/978-3-642-42057-3_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-42056-6

  • Online ISBN: 978-3-642-42057-3

  • eBook Packages: Computer Science, Computer Science (R0)
