Abstract
Human behavior is contextualized, and understanding the scene in which an action takes place is crucial for assigning proper semantics to that behavior. In this chapter we present a novel approach to scene understanding, with an emphasis on the particular case of Human Event Understanding. We introduce a new taxonomy to organize the different semantic levels of the proposed Human Event Understanding framework. This framework contributes to the scene understanding domain by (i) extracting behavioral patterns from the integrative analysis of spatial, temporal, and contextual evidence, and (ii) integrating bottom-up and top-down approaches to Human Event Understanding. We explore how information about the interactions between humans and their environment influences the performance of activity recognition, and how this can be extrapolated to the temporal domain in order to draw higher-level inferences from human events observed in sequences of images.
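The fusion of bottom-up and top-down evidence described above can be illustrated with a minimal sketch: a log-linear combination of an appearance-based action classifier (bottom-up) with a scene-context prior (top-down). All class names, probabilities, and the fusion weight below are hypothetical, chosen only to show how context can disambiguate appearance.

```python
import math

# Hypothetical action classes for illustration only.
ACTIONS = ["running", "swimming", "riding-bike"]

def fuse_scores(action_probs, context_prior, alpha=0.5):
    """Late fusion: weighted log-linear combination of a bottom-up
    action classifier and a top-down scene-context prior, followed
    by renormalization into a posterior over actions."""
    fused = {a: alpha * math.log(action_probs[a] + 1e-9)
                + (1 - alpha) * math.log(context_prior[a] + 1e-9)
             for a in ACTIONS}
    z = sum(math.exp(v) for v in fused.values())  # normalizing constant
    return {a: math.exp(v) / z for a, v in fused.items()}

# Appearance alone is ambiguous between running and swimming...
action_probs = {"running": 0.45, "swimming": 0.40, "riding-bike": 0.15}
# ...but a (hypothetical) scene classifier detects "beach",
# shifting the prior toward swimming.
context_prior = {"running": 0.20, "swimming": 0.70, "riding-bike": 0.10}

posterior = fuse_scores(action_probs, context_prior)
best = max(posterior, key=posterior.get)
```

Here the context prior overturns the slim appearance-based preference for "running", which is the effect the chapter studies: interactions between humans and their environment reshaping the interpretation of observed motion.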
Acknowledgements
We gratefully acknowledge Marco Pedersoli for providing the detection module. This work was initially supported by the EU Projects FP6 HERMES IST-027110 and VIDI-Video IST-045547. The authors also acknowledge the support of the Spanish Research Programs Consolider-Ingenio 2010: MIPRCV (CSD200700018); Avanza I+D ViCoMo (TSI-020400-2009-133); CENIT-IMAGENIO 2010 SEGUR@; along with the Spanish projects TIN2009-14501-C02-01 and TIN2009-14501-C02-02.
Copyright information
© 2011 Springer-Verlag London Limited
About this chapter
Cite this chapter
Shapovalova, N., Fernández, C., Roca, F.X., Gonzàlez, J. (2011). Semantics of Human Behavior in Image Sequences. In: Salah, A., Gevers, T. (eds) Computer Analysis of Human Behavior. Springer, London. https://doi.org/10.1007/978-0-85729-994-9_7
Print ISBN: 978-0-85729-993-2
Online ISBN: 978-0-85729-994-9