Contextual Action Recognition

  • Hedvig Kjellström (Sidenbladh)

Abstract

This chapter addresses the use of contextual information in the analysis of human actions. We first discuss the definition of context in visual action recognition, which is here divided into four categories: object context, scene context, semantic context, and photogrammetric context. The value of all these types of context is twofold. First, context improves action recognition, provided that it offers information complementary to the human pose data on which the action recognition is based. Second, context makes semi-supervised learning easier, since it provides additional views of the action that are to some degree independent of the human pose view. A number of methods for contextual action recognition are then reviewed, followed by a method-level description of one contextual object–action recognition method. Finally, we discuss future directions for the field of contextual action recognition.
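
The first point, that complementary context can resolve what pose alone leaves ambiguous, can be illustrated with a minimal sketch. It is not the method described in the chapter: it assumes the pose view and the object-context view are conditionally independent given the action and fuses them with a simple product rule. The function name `fuse_views`, the action labels, and all probabilities are hypothetical values chosen for illustration.

```python
def fuse_views(prior, pose_likelihood, context_likelihood):
    """Fuse two views of an action, assuming (for illustration only) that
    they are conditionally independent given the action:
        p(a | pose, ctx) is proportional to p(a) * p(pose | a) * p(ctx | a)
    """
    unnormalized = {
        action: prior[action] * pose_likelihood[action] * context_likelihood[action]
        for action in prior
    }
    total = sum(unnormalized.values())
    return {action: value / total for action, value in unnormalized.items()}


# Hypothetical numbers: the arm pose looks the same for both actions,
# but a detected cup makes "drink" far more plausible than "answer_phone".
prior = {"drink": 0.5, "answer_phone": 0.5}
pose_likelihood = {"drink": 0.5, "answer_phone": 0.5}
object_likelihood = {"drink": 0.9, "answer_phone": 0.1}

# A flat context likelihood is equivalent to using the pose view alone.
uninformative = {action: 1.0 for action in prior}

pose_only = fuse_views(prior, pose_likelihood, uninformative)
fused = fuse_views(prior, pose_likelihood, object_likelihood)

print("pose only:", pose_only)
print("pose + object context:", fused)
```

With these made-up inputs the pose-only posterior stays undecided, while the fused posterior favors "drink". If the object view merely repeated what the pose view already says, the fused result would add little, which is the sense in which the context has to be complementary.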

Keywords

Contextual Information; Action Recognition; Semantic Context; Human Action Recognition; Action Combination
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  1. CSC/CVAP, KTH, Stockholm, Sweden
