Statistics of Pairwise Co-occurring Local Spatio-temporal Features for Human Action Recognition

  • Piotr Bilinski
  • Francois Bremond
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7583)


The bag-of-words approach with local spatio-temporal features has become a popular video representation for action recognition. Together, these techniques have demonstrated high recognition results for a number of action classes. Recent approaches have typically focused on capturing global statistics of features. However, existing methods ignore relations between the features and thus may not be discriminative enough. We therefore propose a novel feature representation that captures statistics of pairwise co-occurring local spatio-temporal features. Our representation captures not only the global distribution of features but also the geometric and appearance (both visual and motion) relations among them. By calculating a set of bag-of-words representations, each for a different geometrical arrangement among the features, we preserve the important association between appearance and geometric information. Using two benchmark datasets for human action recognition, we demonstrate that our representation enhances the discriminative power of features and improves action recognition performance.
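To make the idea of pairwise co-occurrence statistics concrete, the following is a minimal Python sketch and not the authors' implementation. The abstract does not specify how geometrical arrangements are defined, so this sketch assumes they are bins of Euclidean distance between feature positions in (x, y, t). The function name `pairwise_cooccurrence_histograms` and its parameters are hypothetical, and each feature is assumed to be already quantized to a codeword index.

```python
from itertools import combinations
from math import dist


def pairwise_cooccurrence_histograms(features, vocab_size, radius_bins):
    """Build one bag-of-words histogram of codeword pairs per distance bin.

    features    : list of ((x, y, t), codeword) tuples, codeword in [0, vocab_size)
    vocab_size  : number of codewords in the (assumed) visual vocabulary
    radius_bins : increasing upper bounds on spatio-temporal distance; this
                  binning is an assumption standing in for the "geometrical
                  arrangements" mentioned in the abstract
    Returns a list of L1-normalized histograms, one per distance bin.
    """
    # Number of unordered codeword pairs, including pairs of equal codewords.
    n_pairs = vocab_size * (vocab_size + 1) // 2
    hists = [[0.0] * n_pairs for _ in radius_bins]

    for (p, a), (q, b) in combinations(features, 2):
        d = dist(p, q)
        for k, upper in enumerate(radius_bins):
            if d <= upper:
                lo, hi = min(a, b), max(a, b)
                # Index of the unordered pair (lo, hi) in an
                # upper-triangular, row-major layout.
                idx = lo * vocab_size - lo * (lo - 1) // 2 + (hi - lo)
                hists[k][idx] += 1.0
                break  # each pair is counted in the first bin that fits

    # Normalize each histogram so that it sums to 1 (empty bins stay zero).
    for h in hists:
        total = sum(h)
        if total > 0:
            for i in range(len(h)):
                h[i] /= total
    return hists


# Toy example: three features, a two-word vocabulary, two distance bins.
toy_features = [((0, 0, 0), 0), ((1, 0, 0), 1), ((10, 0, 0), 1)]
toy_hists = pairwise_cooccurrence_histograms(toy_features, 2, [2.0, 20.0])
```

Keeping a separate histogram per distance bin, rather than pooling all pairs into one, is what ties the appearance information (which codewords co-occur) to the geometric information (how far apart they occur). In a full pipeline the per-bin histograms would presumably be concatenated or combined before classification; that step is outside what the abstract describes.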



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Piotr Bilinski (1)
  • Francois Bremond (1)
  1. STARS Team, INRIA Sophia Antipolis, Sophia Antipolis, France
