Action Recognition with Exemplar Based 2.5D Graph Matching

  • Bangpeng Yao
  • Li Fei-Fei
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7575)


This paper deals with recognizing human actions in still images. We make two key contributions. (1) We propose a novel 2.5D representation of action images that captures both view-independent pose information and rich appearance information. The 2.5D graph of an action image consists of a set of nodes, which are key-points of the human body, and a set of edges, which are spatial relationships between the nodes. Each key-point is represented by its view-independent 3D position and its local 2D appearance features. The similarity between two action images can then be measured by matching their corresponding 2.5D graphs. (2) We use an exemplar-based action classification approach, in which a set of representative images is selected for each action class. The selected images cover large within-action variations and carry information that discriminates their class from the other classes. This exemplar-based representation of action classes makes our approach more robust to pose variations and occlusions. We test our method on two publicly available datasets and show that it achieves promising performance.
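The representation and the exemplar-based classification described above can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not give the matching function, so the distance below (Euclidean appearance distance on nodes plus Euclidean distance between 3D offsets on edges) is an assumption, as are the names `Graph25D`, `graph_distance`, `classify`, and the `edge_weight` parameter. The sketch also assumes that both graphs list the same key-points in the same order and share the same edges, and it classifies by nearest exemplar.

```python
from dataclasses import dataclass
from typing import List, Tuple
import math

Vec = Tuple[float, ...]


@dataclass
class Graph25D:
    """Hypothetical container for a 2.5D graph of one action image."""
    positions_3d: List[Vec]        # one view-independent 3D position per key-point
    appearance: List[Vec]          # one local 2D appearance descriptor per key-point
    edges: List[Tuple[int, int]]   # pairs of key-point indices (spatial relationships)


def _dist(a: Vec, b: Vec) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _offset(g: Graph25D, i: int, j: int) -> Vec:
    """3D offset from key-point i to key-point j."""
    return tuple(q - p for p, q in zip(g.positions_3d[i], g.positions_3d[j]))


def graph_distance(g1: Graph25D, g2: Graph25D, edge_weight: float = 1.0) -> float:
    """Assumed dissimilarity: node appearance term plus weighted edge offset term."""
    node_term = sum(_dist(a, b) for a, b in zip(g1.appearance, g2.appearance))
    edge_term = sum(
        _dist(_offset(g1, i, j), _offset(g2, i, j)) for i, j in g1.edges
    )
    return node_term + edge_weight * edge_term


def classify(query: Graph25D, exemplars: List[Tuple[str, Graph25D]]) -> str:
    """Return the label of the exemplar whose graph is closest to the query."""
    return min(exemplars, key=lambda e: graph_distance(query, e[1]))[0]


# Toy usage with two key-points and one edge (all values are made up).
edges = [(0, 1)]
arm_up = Graph25D([(0, 0, 0), (0, 1, 0)], [(1.0, 0.0), (1.0, 0.0)], edges)
arm_out = Graph25D([(0, 0, 0), (1, 0, 0)], [(0.0, 1.0), (0.0, 1.0)], edges)
query = Graph25D([(0, 0, 0), (0, 0.9, 0)], [(0.9, 0.1), (1.0, 0.0)], edges)
label = classify(query, [("wave", arm_up), ("point", arm_out)])
```

Keeping several exemplars per class, rather than one prototype, lets a query match whichever stored pose is closest, which is one way to read the abstract's claim about robustness to pose variation.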


Keywords (added by machine, not by the authors): Action Class, Training Image, Action Recognition, Action Image, Human Action Recognition



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Bangpeng Yao (1)
  • Li Fei-Fei (1)
  1. Department of Computer Science, Stanford University, USA
