Semantics of Human Behavior in Image Sequences

Chapter in Computer Analysis of Human Behavior

Abstract

Human behavior is contextualized: understanding the scene in which an action takes place is crucial for assigning proper semantics to that behavior. In this chapter we present a novel approach to scene understanding, with emphasis on the particular case of Human Event Understanding. We introduce a new taxonomy that organizes the different semantic levels of the proposed Human Event Understanding framework. This framework contributes to the scene understanding domain by (i) extracting behavioral patterns from the integrative analysis of spatial, temporal, and contextual evidence, and (ii) integrating bottom-up and top-down approaches to Human Event Understanding. We explore how information about interactions between humans and their environment influences the performance of activity recognition, and how this can be extrapolated to the temporal domain in order to draw higher-level inferences from human events observed in image sequences.
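
Although the chapter itself provides no code, the sketch below (Python) illustrates, in a purely hypothetical way, how bottom-up per-frame action scores might be fused with top-down scene and human-object interaction context over an image sequence. All names (fuse_evidence, frame_scores, scene_prior, interaction_prior), the linear weighting, and the example values are assumptions made for illustration, not the chapter's actual method.

# Illustrative only: not the chapter's method. A minimal sketch of fusing
# bottom-up per-frame action scores (appearance) with top-down contextual
# priors (scene category and detected human-object interactions).
from collections import defaultdict

def fuse_evidence(frame_scores, scene_prior, interaction_prior, w=(0.6, 0.2, 0.2)):
    """Linearly combine appearance scores with contextual priors per frame.

    frame_scores: list of dicts, one per frame, mapping action -> bottom-up score
    scene_prior: dict mapping action -> plausibility given the scene category
    interaction_prior: dict mapping action -> plausibility given detected
        human-object interactions
    w: weights for appearance, scene, and interaction evidence (hypothetical)
    """
    totals = defaultdict(float)
    for scores in frame_scores:  # temporal accumulation over the sequence
        for action, s in scores.items():
            totals[action] += (w[0] * s
                               + w[1] * scene_prior.get(action, 0.0)
                               + w[2] * interaction_prior.get(action, 0.0))
    n = max(len(frame_scores), 1)
    # return the action with the highest average fused score
    return max(((a, v / n) for a, v in totals.items()), key=lambda x: x[1])

# Example: "reading" is ambiguous from appearance alone, but a library-like
# scene and a detected person-book interaction shift the decision.
frames = [{"reading": 0.4, "phoning": 0.5}, {"reading": 0.5, "phoning": 0.4}]
scene = {"reading": 0.9, "phoning": 0.3}
interaction = {"reading": 0.8, "phoning": 0.2}
print(fuse_evidence(frames, scene, interaction))  # -> ('reading', ~0.61)

The point of the example is only the structure: per-frame (spatial) evidence, sequence-level (temporal) accumulation, and contextual priors all enter a single decision, which mirrors the integrative analysis described above.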



Acknowledgements

We gratefully acknowledge Marco Pedersoli for providing the detection module. This work was initially supported by the EU projects FP6 HERMES (IST-027110) and VIDI-Video (IST-045547). The authors also acknowledge the support of the Spanish research programs Consolider-Ingenio 2010 MIPRCV (CSD200700018), Avanza I+D ViCoMo (TSI-020400-2009-133), and CENIT-IMAGENIO 2010 SEGUR@, along with the Spanish projects TIN2009-14501-C02-01 and TIN2009-14501-C02-02.

Author information


Corresponding author

Correspondence to Nataliya Shapovalova.


Copyright information

© 2011 Springer-Verlag London Limited

About this chapter

Cite this chapter

Shapovalova, N., Fernández, C., Roca, F.X., Gonzàlez, J. (2011). Semantics of Human Behavior in Image Sequences. In: Salah, A., Gevers, T. (eds) Computer Analysis of Human Behavior. Springer, London. https://doi.org/10.1007/978-0-85729-994-9_7


  • DOI: https://doi.org/10.1007/978-0-85729-994-9_7

  • Publisher Name: Springer, London

  • Print ISBN: 978-0-85729-993-2

  • Online ISBN: 978-0-85729-994-9

  • eBook Packages: Computer Science (R0)
