Abstract
Understanding the activities taking place in a video is a challenging problem in Artificial Intelligence. Complex video sequences contain many activities and involve a multitude of interacting objects. Determining which objects are relevant to a particular activity is the first step in understanding that activity, since many objects in a scene are irrelevant to the main activity taking place. In this work, we consider human-centric activities and aim to identify which objects in the scene are involved in the activity. We take an activity-agnostic approach and rank every moving object in the scene by how likely it is to be involved in the activity. We use a comprehensive spatio-temporal representation that captures the joint movement between humans and each object. We then use supervised machine learning techniques to recognize relevant objects based on these features. We evaluate our approach on the challenging Mind’s Eye dataset.
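The ranking idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the three numeric features, the function names (`joint_features`, `rank_objects`), the hand-set weights, and the toy tracks are all assumptions made for illustration. The paper uses a qualitative spatio-temporal representation and a learned classifier, for which the fixed linear score below is only a stand-in.

```python
from math import hypot

# Hypothetical sketch: a track is a list of per-frame bounding boxes
# (x1, y1, x2, y2). None of these names come from the paper.

def center(box):
    """Return the center point of a bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def overlaps(a, b):
    """True if two axis-aligned boxes intersect with positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def joint_features(human_track, object_track):
    """Summarize the joint movement of a human and one object.

    Returns [mean center distance, fraction of frames with overlap,
    net approach (first distance minus last distance)].
    """
    pairs = list(zip(human_track, object_track))
    dists = []
    for h, o in pairs:
        (hx, hy), (ox, oy) = center(h), center(o)
        dists.append(hypot(hx - ox, hy - oy))
    n = len(dists)
    mean_dist = sum(dists) / n
    overlap_frac = sum(overlaps(h, o) for h, o in pairs) / n
    approach = dists[0] - dists[-1]
    return [mean_dist, overlap_frac, approach]

def rank_objects(human_track, object_tracks, weights=(-0.01, 1.0, 0.01)):
    """Rank object names from most to least likely to be involved.

    The weights are hand-set placeholders for a trained classifier.
    """
    scores = {
        name: sum(w * f for w, f in zip(weights, joint_features(human_track, track)))
        for name, track in object_tracks.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: a stationary human, a ball moving toward them, a distant car.
human = [(0, 0, 10, 20)] * 3
tracks = {
    "car": [(100, 0, 120, 10), (110, 0, 130, 10), (120, 0, 140, 10)],
    "ball": [(30, 5, 34, 9), (20, 5, 24, 9), (8, 5, 12, 9)],
}
print(rank_objects(human, tracks))
```

In a supervised setting, the feature vectors returned by `joint_features` would be paired with relevance labels and passed to a classifier, and the classifier's confidence would replace the fixed weights when ranking.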
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Sokeh, H.S., Gould, S., Renz, J. (2015). Determining Interacting Objects in Human-Centric Activities via Qualitative Spatio-Temporal Reasoning. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision -- ACCV 2014. Lecture Notes in Computer Science, vol 9007. Springer, Cham. https://doi.org/10.1007/978-3-319-16814-2_36
DOI: https://doi.org/10.1007/978-3-319-16814-2_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16813-5
Online ISBN: 978-3-319-16814-2
eBook Packages: Computer Science, Computer Science (R0)