Abstract
In this paper, we focus on the recognition and localization of human interactions in real-world videos. This is a difficult problem because of large variations in person appearance, camera viewpoint, video length, and intra-class variability. To address these challenges, we present a spatial structure model. In our model, the crucial movement of each category is represented by a segment of the entire video. To capture the spatial configuration of the human interaction within the video segment, a spatial structure is built over the segment, and trajectory features are extracted within each of its cells. The proposed model is trained automatically from real-world videos that are annotated only with class labels. We evaluate our approach on the TVHI dataset, which contains four complex human interaction classes. The experimental results demonstrate the effectiveness of our model.
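As a rough illustration of extracting trajectory features within each cell of a spatial layout, the sketch below pools quantized trajectory descriptors into one histogram per cell and concatenates them. It is not the authors' implementation: the abstract does not state the grid size, the descriptor, the codebook, or the normalization, so the function name `spatial_cell_histograms`, the 2x2 grid, the four-word vocabulary, and the L1 normalization are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def spatial_cell_histograms(
    points: Sequence[Tuple[float, float]],
    codewords: Sequence[int],
    frame_size: Tuple[int, int],
    grid: Tuple[int, int] = (2, 2),   # assumed (columns, rows); not from the paper
    vocab_size: int = 4,              # assumed codebook size; not from the paper
) -> List[float]:
    """Concatenate one L1-normalized codeword histogram per grid cell.

    points     -- (x, y) pixel position of each trajectory inside the segment
    codewords  -- codebook index assigned to each trajectory descriptor
    frame_size -- (width, height) of the region covered by the grid
    """
    width, height = frame_size
    cols, rows = grid
    hists = [[0.0] * vocab_size for _ in range(rows * cols)]

    for (x, y), word in zip(points, codewords):
        # Clamp so that points on the right or bottom border land in the last cell.
        col = min(int(x * cols / width), cols - 1)
        row = min(int(y * rows / height), rows - 1)
        hists[row * cols + col][word] += 1.0

    feature: List[float] = []
    for hist in hists:
        total = sum(hist)
        # Empty cells contribute an all-zero histogram.
        feature.extend(v / total if total > 0 else 0.0 for v in hist)
    return feature


# Hypothetical toy input: four trajectories in a 320x240 region.
points = [(10, 10), (20, 30), (300, 50), (310, 200)]
codewords = [0, 1, 1, 3]
feature = spatial_cell_histograms(points, codewords, frame_size=(320, 240))
print(len(feature))  # 4 cells x 4 codewords = 16 values
```

Keeping a separate histogram per cell preserves where in the frame each motion pattern occurs, which a single video-level bag of features would discard.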
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Wu, J., Chen, F., Hu, D. (2013). Human Interaction Recognition by Spatial Structure Models. In: Sun, C., Fang, F., Zhou, ZH., Yang, W., Liu, ZY. (eds) Intelligence Science and Big Data Engineering. IScIDE 2013. Lecture Notes in Computer Science, vol 8261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-42057-3_28
DOI: https://doi.org/10.1007/978-3-642-42057-3_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-42056-6
Online ISBN: 978-3-642-42057-3
eBook Packages: Computer Science, Computer Science (R0)