Representing Pairwise Spatial and Temporal Relations for Action Recognition

  • Pyry Matikainen
  • Martial Hebert
  • Rahul Sukthankar
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6311)


The popular bag-of-words paradigm for action recognition tasks is based on building histograms of quantized features, typically at the cost of discarding all information about relationships between them. However, although the beneficial nature of including these relationships seems obvious, in practice finding good representations for feature relationships in video is difficult. We propose a simple and computationally efficient method for expressing pairwise relationships between quantized features that combines the power of discriminative representations with key aspects of Naïve Bayes. We demonstrate how our technique can augment both appearance- and motion-based features, and that it significantly improves performance on both types of features.
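As a rough illustration of the contrast drawn above, the following Python sketch builds a plain bag-of-words histogram from quantized feature labels and, next to it, a histogram over pairs of labels binned by their temporal offset. The function names, the `(label, t)` tuple layout, and the offset binning are hypothetical choices made for illustration only. This is not the representation proposed in the paper, which also draws on aspects of Naïve Bayes that are not described here.

```python
from collections import Counter
from itertools import combinations


def bag_of_words(features, vocab_size):
    """Histogram of quantized labels; relations between features are discarded."""
    hist = [0] * vocab_size
    for label, _t in features:
        hist[label] += 1
    return hist


def pairwise_temporal_histogram(features, max_offset, n_bins):
    """Count (label_a, label_b, offset_bin) triples for pairs close in time.

    Assumption: each feature is a (label, t) tuple with an integer label and a
    numeric time stamp; pairs further apart than max_offset are ignored.
    """
    counts = Counter()
    ordered = sorted(features, key=lambda f: f[1])  # earlier feature comes first
    for (label_a, t_a), (label_b, t_b) in combinations(ordered, 2):
        dt = t_b - t_a  # non-negative because of the sort
        if dt > max_offset:
            continue
        # Map the offset in [0, max_offset] onto n_bins equal-width bins.
        offset_bin = min(int(dt * n_bins / (max_offset + 1)), n_bins - 1)
        counts[(label_a, label_b, offset_bin)] += 1
    return counts


# Hypothetical toy input: four quantized features as (label, frame index).
features = [(0, 0), (1, 2), (0, 5), (2, 5)]
bow = bag_of_words(features, vocab_size=3)
pairs = pairwise_temporal_histogram(features, max_offset=3, n_bins=2)
```

The bag-of-words histogram records only how often each label occurs, while the pairwise histogram also keeps which labels occur near each other in time and in what order. The cost is a much larger and sparser descriptor, which is one reason why finding a good representation for such relationships is not straightforward.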


Action Recognition, Temporal Relation, Multiple Kernel Learning, Pairwise Relationship, Feature Label
These keywords were added by machine and not by the authors.



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Pyry Matikainen (1)
  • Martial Hebert (1)
  • Rahul Sukthankar (2, 1)
  1. The Robotics Institute, Carnegie Mellon University
  2. Intel Labs Pittsburgh
