A Framework for Combined Recognition of Actions and Objects

  • Ilktan Ar
  • Yusuf Sinan Akgul
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7594)


This paper proposes a novel approach to recognizing actions and objects in the context of each other. Assuming that different actions involve different objects in image sequences and that there is a one-to-one relation between object types and action types, we present a Bayesian network based framework that combines motion patterns and object usage information to recognize both actions and objects. More specifically, our approach recognizes high-level actions and the related objects without any body-part segmentation, hand tracking, or temporal segmentation. Additionally, we present a novel motion representation, based on 3D Haar-like features, which can be computed from depth images, color images, or both. Our approach is also suitable for object and action recognition when the involved object is partially or fully occluded. Finally, experiments show that our approach significantly improves the accuracy of both action and object recognition.
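The abstract describes the fusion of motion and object evidence only at a high level. As a rough illustration, assuming a naive-Bayes factorization (a simplification of ours, not necessarily the structure of the paper's Bayesian network), the posterior over actions is P(a | m, u) ∝ P(a) P(m | a) P(u | o(a)), where m is the motion evidence, u is the object appearance evidence, and o(a) is the object tied to action a by the one-to-one relation. The function name `fuse_action_object`, the action and object labels, and all numbers are hypothetical.

```python
# Minimal sketch, assuming a naive-Bayes factorization (not necessarily the
# paper's network): P(a | m, u) is proportional to P(a) * P(m | a) * P(u | o(a)).
# All names, labels, and numbers here are hypothetical.

def fuse_action_object(prior, motion_lik, object_lik, action_to_object):
    """Return the normalized posterior over actions.

    prior[a]            : P(a), prior probability of action a
    motion_lik[a]       : P(motion evidence | a), e.g. from a motion classifier
    object_lik[o]       : P(object evidence | o), e.g. from an appearance
                          classifier; pass a uniform list if the object is occluded
    action_to_object[a] : index of the object paired with action a
    """
    # The one-to-one assumption: each action maps to a distinct object.
    if len(set(action_to_object)) != len(action_to_object):
        raise ValueError("action_to_object must be one-to-one")
    scores = [
        p * m * object_lik[action_to_object[a]]
        for a, (p, m) in enumerate(zip(prior, motion_lik))
    ]
    total = sum(scores)
    if total <= 0:
        raise ValueError("all hypotheses have zero probability")
    return [s / total for s in scores]


actions = ["drink", "pour", "call"]
objects = ["bottle", "phone", "cup"]
action_to_object = [2, 0, 1]  # drink->cup, pour->bottle, call->phone

prior = [1 / 3, 1 / 3, 1 / 3]
motion_lik = [0.5, 0.4, 0.1]   # motion alone slightly prefers "drink"
object_lik = [0.8, 0.1, 0.1]   # appearance clearly indicates a bottle

post = fuse_action_object(prior, motion_lik, object_lik, action_to_object)
best = max(range(len(actions)), key=post.__getitem__)
print(actions[best], objects[action_to_object[best]])  # pour bottle
```

Here the object evidence overturns the weak motion preference, and because of the one-to-one map the winning action also names the object. If the object is occluded, passing a uniform `object_lik` leaves the motion term to decide on its own, which loosely mirrors the occlusion case mentioned in the abstract.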


Keywords: Action and object recognition · Bayesian network · Motion pattern
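The abstract does not spell out how the 3D Haar-like motion features are computed. One common way to evaluate box-shaped Haar-like features over a spatio-temporal volume is a 3D integral image (summed-volume table), which returns the sum over any axis-aligned box with eight lookups. The sketch below assumes a single scalar value per voxel, such as a depth value or a grayscale intensity; for color plus depth, one could compute features on each channel and concatenate them. The function names and the particular temporal feature are hypothetical illustrations, not the paper's feature set.

```python
# Minimal sketch of 3D Haar-like features on a spatio-temporal volume,
# assuming the volume is a list of T frames, each H rows by W columns of
# scalar values (e.g. depth or grayscale intensity). The specific feature
# below (later half minus earlier half of a box) is a hypothetical example.

def integral_volume(vol):
    """Summed-volume table with a zero border.

    S[t][y][x] is the sum of vol over [0, t) x [0, y) x [0, x).
    """
    T, H, W = len(vol), len(vol[0]), len(vol[0][0])
    S = [[[0.0] * (W + 1) for _ in range(H + 1)] for _ in range(T + 1)]
    for t in range(1, T + 1):
        for y in range(1, H + 1):
            for x in range(1, W + 1):
                # 3D inclusion-exclusion over the seven already-computed entries.
                S[t][y][x] = (
                    vol[t - 1][y - 1][x - 1]
                    + S[t - 1][y][x] + S[t][y - 1][x] + S[t][y][x - 1]
                    - S[t - 1][y - 1][x] - S[t - 1][y][x - 1] - S[t][y - 1][x - 1]
                    + S[t - 1][y - 1][x - 1]
                )
    return S


def box_sum(S, t0, t1, y0, y1, x0, x1):
    """Sum of the volume over [t0, t1) x [y0, y1) x [x0, x1) via 8 lookups."""
    return (
        S[t1][y1][x1]
        - S[t0][y1][x1] - S[t1][y0][x1] - S[t1][y1][x0]
        + S[t0][y0][x1] + S[t0][y1][x0] + S[t1][y0][x0]
        - S[t0][y0][x0]
    )


def temporal_haar(S, t0, y0, x0, depth, height, width):
    """Two-box temporal feature: later half minus earlier half of a box.

    A large magnitude indicates that the region changed over time (motion).
    """
    half = depth // 2
    early = box_sum(S, t0, t0 + half, y0, y0 + height, x0, x0 + width)
    late = box_sum(S, t0 + half, t0 + 2 * half, y0, y0 + height, x0, x0 + width)
    return late - early


# Usage: a 4-frame, 2x2 clip whose values switch from 0 to 1 halfway through.
clip = [[[0, 0], [0, 0]]] * 2 + [[[1, 1], [1, 1]]] * 2
S = integral_volume(clip)
print(temporal_haar(S, 0, 0, 0, 4, 2, 2))  # 8.0
```

Once the table is built, each feature costs a constant number of lookups regardless of box size, which is what makes scanning many box positions and shapes over a video volume affordable. A static region yields a feature value of zero.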




Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Ilktan Ar (1, 2)
  • Yusuf Sinan Akgul (2)
  1. Kadir Has University, Cibali, Turkey
  2. GIT Vision Lab., Gebze Institute of Technology, Gebze, Turkey
