Combining Per-frame and Per-track Cues for Multi-person Action Recognition

  • Sameh Khamis
  • Vlad I. Morariu
  • Larry S. Davis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7572)


We propose a model that combines per-frame and per-track cues for action recognition. With multiple targets in a scene, our model simultaneously captures the natural harmony of an individual’s action within a scene and the flow of an individual’s actions across a video sequence, inferring valid tracks in the process. Our motivation is that a discordant action is unlikely in a structured scene, at both the track level and the frame level (e.g., a person dancing in a crowd of joggers). Although sampling approaches could be used for inference in our model, we instead devise a global inference algorithm that decomposes the problem and solves the subproblems exactly and efficiently, recovering a globally optimal joint solution in several cases. Finally, we improve on the state-of-the-art action recognition results on two publicly available datasets.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Sameh Khamis (1)
  • Vlad I. Morariu (1)
  • Larry S. Davis (1)
  1. University of Maryland, College Park, USA
