Abstract
Human behavior is contextualized, and understanding the scene in which an action takes place is crucial for assigning proper semantics to that behavior. In this chapter we present a novel approach to scene understanding, with an emphasis on the particular case of Human Event Understanding. We introduce a new taxonomy to organize the different semantic levels of the proposed Human Event Understanding framework. This framework contributes to the scene understanding domain by (i) extracting behavioral patterns from the integrative analysis of spatial, temporal, and contextual evidence, and (ii) integrating bottom-up and top-down approaches to Human Event Understanding. We explore how information about the interactions between humans and their environment influences the performance of activity recognition, and how this can be extrapolated to the temporal domain in order to draw higher-level inferences from human events observed in sequences of images.
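The fusion of bottom-up and top-down evidence described above can be illustrated with a minimal sketch: a log-linear combination of an appearance-based action classifier (bottom-up) with a scene-context prior (top-down). All class names, probabilities, and the fusion weight below are hypothetical, chosen only to show how context can disambiguate appearance.

```python
import math

# Hypothetical action classes for illustration only.
ACTIONS = ["running", "swimming", "riding-bike"]

def fuse_scores(action_probs, context_prior, alpha=0.5):
    """Late fusion: weighted log-linear combination of a bottom-up
    action classifier and a top-down scene-context prior, followed
    by renormalization into a posterior over actions."""
    fused = {a: alpha * math.log(action_probs[a] + 1e-9)
                + (1 - alpha) * math.log(context_prior[a] + 1e-9)
             for a in ACTIONS}
    z = sum(math.exp(v) for v in fused.values())  # normalizing constant
    return {a: math.exp(v) / z for a, v in fused.items()}

# Appearance alone is ambiguous between running and swimming...
action_probs = {"running": 0.45, "swimming": 0.40, "riding-bike": 0.15}
# ...but a (hypothetical) scene classifier detects "beach",
# shifting the prior toward swimming.
context_prior = {"running": 0.20, "swimming": 0.70, "riding-bike": 0.10}

posterior = fuse_scores(action_probs, context_prior)
best = max(posterior, key=posterior.get)
```

Here the context prior overturns the slim appearance-based preference for "running", which is the effect the chapter studies: interactions between humans and their environment reshaping the interpretation of observed motion.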
Acknowledgements
We gratefully acknowledge Marco Pedersoli for providing the detection module. This work was initially supported by the EU Projects FP6 HERMES IST-027110 and VIDI-Video IST-045547. The authors also acknowledge the support of the Spanish Research Programs Consolider-Ingenio 2010: MIPRCV (CSD200700018); Avanza I+D ViCoMo (TSI-020400-2009-133); CENIT-IMAGENIO 2010 SEGUR@; along with the Spanish projects TIN2009-14501-C02-01 and TIN2009-14501-C02-02.
Copyright information
© 2011 Springer-Verlag London Limited
About this chapter
Cite this chapter
Shapovalova, N., Fernández, C., Roca, F.X., Gonzàlez, J. (2011). Semantics of Human Behavior in Image Sequences. In: Salah, A., Gevers, T. (eds) Computer Analysis of Human Behavior. Springer, London. https://doi.org/10.1007/978-0-85729-994-9_7
Print ISBN: 978-0-85729-993-2
Online ISBN: 978-0-85729-994-9