Abstract
The collective activity of a group of persons is more than a mere sum of individual person actions, since interactions and the context of the overall group behavior have crucial influence. Consequently, the current standard paradigm for group activity recognition is to model the spatiotemporal pattern of individual person bounding boxes and their interactions. Despite this trend towards increasingly global representations, activities are often defined by semi-local characteristics and their interrelation between different persons. For capturing the large visual variability with small semi-local parts, a large number of them are required, thus rendering manual annotation infeasible. To automatically learn activity constituents that are meaningful for the collective activity, we sample local parts and group related ones not merely based on visual similarity but based on the function they fulfill on a set of validation images. Then max-margin multiple instance learning is employed to jointly i) remove clutter from these groups and focus on only the relevant samples, ii) learn the activity constituents, and iii) train the multi-class activity classifier. Experiments on standard activity benchmark sets show the advantage of this joint procedure and demonstrate the benefit of functionally grouped latent activity constituents for group activity recognition.
Chapter PDF
Similar content being viewed by others
Keywords
References
Amer, M.R., Todorovic, S.: A chains model for localizing participants of group activities in videos. In: ICCV, pp. 786–793 (2011)
Amer, M.R., Xie, D., Zhao, M., Todorovic, S., Zhu, S.-C.: Cost-sensitive top-down/bottom-up inference for multiscale activity recognition. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part IV. LNCS, vol. 7575, pp. 187–200. Springer, Heidelberg (2012)
Antic, B., Ommer, B.: Video parsing for abnormality detection. In: ICCV, pp. 2415–2422 (2011)
Antić, B., Ommer, B.: Robust multiple-instance learning with superbags. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) ACCV 2012, Part II. LNCS, vol. 7725, pp. 242–255. Springer, Heidelberg (2013)
Bourdev, L., Maji, S., Brox, T., Malik, J.: Detecting people using mutually consistent poselet activations. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part VI. LNCS, vol. 6316, pp. 168–181. Springer, Heidelberg (2010)
Chapelle, O., Keerthi, S.S.: Multi-class feature selection with support vector machines. In: Proc. of the American Statistical Assoc. (2008)
Choi, W., Savarese, S.: A unified framework for multi-target tracking and collective activity recognition. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part IV. LNCS, vol. 7575, pp. 215–230. Springer, Heidelberg (2012)
Choi, W., Savarese, S.: Understanding collective activities of people from videos. Pattern Analysis and Machine Intelligence (99), 1 (2013)
Choi, W., Shahid, K., Savarese, S.: What are they doing?: Collective activity classification using spatio-temporal relationship among people. In: Proc. of 9th International Workshop on Visual Surveillance (VSWS 2009) in Conjuction with ICCV (2009)
Choi, W., Shahid, K., Savarese, S.: Learning context for collective activity recognition. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (2011)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Schmid, C., Soatto, S., Tomasi, C. (eds.)International Conference on Computer Vision & Pattern Recognition, vol. 2, pp. 886–893 (2005)
Dollár, P., Rabaud, V., Cottrell, G., Belongie, S.: Behavior recognition via sparse spatio-temporal features. In: 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp. 65–72. IEEE (2005)
Eigenstetter, A., Takami, M., Ommer, B.: Randomized Max-Margin Compositions for Visual Recognition. In: CVPR - International Conference on Computer Vision and Pattern Recognition, Columbus, USA (2014)
Felzenszwalb, P.F., Girshick, R.B., McAllester, D.A., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010)
Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 264–271 (2003)
Gehler, P.V., Chapelle, O.: Deterministic annealing for multiple-instance learning. In: International Conference on Artificial Intelligence and Statistics, pp. 123–130 (2007)
Khamis, S., Morariu, V.I., Davis, L.S.: Combining per-frame and per-track cues for multi-person action recognition. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part I. LNCS, vol. 7572, pp. 116–129. Springer, Heidelberg (2012)
Khamis, S., Morariu, V.I., Davis, L.S.: A flow model for joint action recognition and identity maintenance. In: IEEE Conference on Computer Vision and Pattern Recognition (2012)
Lan, T., Wang, Y., Mori, G., Robinovitch, S.: Retrieving actions in group contexts. In: International Workshop on Sign Gesture Activity (2010)
Lan, T., Wang, Y., Yang, W., Mori, G.: Beyond actions: Discriminative models for contextual group activities. In: Advances in Neural Information Processing Systems, NIPS (2010)
Lan, T., Wang, Y., Yang, W., Robinovitch, S., Mori, G.: Discriminative latent models for recognizing contextual group activities. IEEE Transactions on Pattern Analysis and Machine Intelligence (2012)
Laptev, I., Marszałek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: Conference on Computer Vision & Pattern Recognition (2008)
Maji, S., Bourdev, L., Malik, J.: Action recognition from a distributed representation of pose and appearance. In: IEEE International Conference on Computer Vision and Pattern Recognition, CVPR (2011)
Marszałek, M., Laptev, I., Schmid, C.: Actions in context. In: Conference on Computer Vision & Pattern Recognition (2009)
Muja, M., Lowe, D.G.: Fast approximate nearest neighbors with automatic algorithm configuration. In: International Conference on Computer Vision Theory and Application (VISSAPP 2009), pp. 331–340. INSTICC Press (2009)
Niebles, J.C., Wang, H., Fei-Fei, L.: Unsupervised learning of human action categories using spatial-temporal words. Int. J. Comput. Vision 79(3), 299–318 (2008)
Ommer, B., Mader, T., Buhmann, J.M.: Seeing the objects behind the dots: Recognition in videos from a moving camera. International Journal of Computer Vision 83(1), 57–71 (2009)
Ryoo, M.S., Aggarwal, J.K.: Stochastic representation and recognition of high-level group activities. International Journal of Computer Vision 93(2), 183–200 (2011)
Schüldt, C., Laptev, I., Caputo, B.: Recognizing human actions: A local svm approach. In: ICPR (3), pp. 32–36 (2004)
Singh, S., Gupta, A., Efros, A.A.: Unsupervised discovery of mid-level discriminative patches. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part II. LNCS, vol. 7573, pp. 73–86. Springer, Heidelberg (2012)
Turaga, P.K., Chellappa, R., Subrahmanian, V.S., Udrea, O.: Machine recognition of human activities: A survey. IEEE Trans. Circuits Syst. Video Techn. 18(11), 1473–1488 (2008)
Wang, H., Kläser, A., Schmid, C., Liu, C.L.: Action Recognition by Dense Trajectories. In: IEEE Conference on Computer Vision & Pattern Recognition, Colorado Springs, United States, pp. 3169–3176 (2011)
Xiang, T., Gong, S.: Beyond tracking: Modelling activity and understanding behaviour. International Journal of Computer Vision 67(1), 21–51 (2006)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
1 Electronic Supplementary Material
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Antic, B., Ommer, B. (2014). Learning Latent Constituents for Recognition of Group Activities in Video. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8689. Springer, Cham. https://doi.org/10.1007/978-3-319-10590-1_3
Download citation
DOI: https://doi.org/10.1007/978-3-319-10590-1_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10589-5
Online ISBN: 978-3-319-10590-1
eBook Packages: Computer ScienceComputer Science (R0)