Retrieving Actions in Group Contexts

  • Tian Lan
  • Yang Wang
  • Greg Mori
  • Stephen N. Robinovitch
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6553)


We develop methods for action retrieval from surveillance video using contextual feature representations. The novelty of our proposed approach is two-fold. First, we introduce a new feature representation called the action context (AC) descriptor. The AC descriptor encodes information about not only the action of an individual person in the video, but also the behaviour of other people nearby. This feature representation is inspired by the fact that the context of what other people are doing provides very useful cues for recognizing the actions of each individual. Second, we formulate our problem as a retrieval/ranking task, which is different from previous work on action classification. We develop an action retrieval technique based on rank-SVM, a state-of-the-art approach for solving ranking problems. We apply our proposed approach to two real-world datasets. The first dataset consists of videos of multiple people performing several group activities. The second dataset consists of surveillance videos from a nursing home environment. Our experimental results show the advantage of using contextual information for disambiguating different actions and the benefit of using rank-SVMs instead of regular SVMs for video retrieval problems.
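The two ideas in the abstract can be sketched in a few lines of code. The sketch below is an illustration under our own assumptions, not the authors' implementation: we assume each detected person comes with a per-class action score vector and an image position, the AC-style descriptor concatenates a person's own scores with a max-pool over the scores of nearby people, and the rank-SVM step is shown only as Joachims-style reduction of ranking to classification on pairwise difference vectors. The `radius` parameter and both function names are hypothetical.

```python
import numpy as np

def action_context_descriptor(scores, positions, person_idx, radius=50.0):
    """Sketch of an action-context (AC) style descriptor.

    scores:     (N, K) per-class action scores, one row per person
    positions:  (N, 2) image positions of the N people
    person_idx: index of the focal person
    radius:     neighbourhood radius (hypothetical parameter)
    """
    own = scores[person_idx]
    # Distance from the focal person to every person in the frame
    d = np.linalg.norm(positions - positions[person_idx], axis=1)
    nearby = (d > 0) & (d <= radius)
    if nearby.any():
        # Max-pool the action evidence of the surrounding group
        context = scores[nearby].max(axis=0)
    else:
        context = np.zeros_like(own)
    # Own action evidence concatenated with pooled group context
    return np.concatenate([own, context])

def pairwise_ranking_examples(X, relevance):
    """Rank-SVM style reduction: build pairwise difference vectors so a
    linear SVM trained on them learns a ranking function.

    X:         (N, D) feature vectors (e.g. AC descriptors)
    relevance: length-N labels, higher = more relevant to the query action
    """
    diffs, labels = [], []
    for i in range(len(X)):
        for j in range(len(X)):
            if relevance[i] > relevance[j]:
                # Relevant-minus-irrelevant pair should score positive,
                # and its mirror image negative
                diffs.append(X[i] - X[j]); labels.append(1)
                diffs.append(X[j] - X[i]); labels.append(-1)
    return np.asarray(diffs), np.asarray(labels)
```

Training an off-the-shelf linear SVM on the output of `pairwise_ranking_examples` yields a weight vector whose dot product with a descriptor serves as the ranking score; the paper's actual descriptor and learning setup are richer than this sketch.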


Keywords: Action Recognition · Surveillance Video · Retrieval Task · Foreground Pixel · Group Context





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Tian Lan (1)
  • Yang Wang (1)
  • Greg Mori (1)
  • Stephen N. Robinovitch (2)
  1. School of Computing Science, Simon Fraser University, Canada
  2. School of Kinesiology, Simon Fraser University, Canada
