This paper describes a stochastic methodology for recognizing various types of high-level group activities. Our system maintains a probabilistic representation of each group activity, describing how the individual activities of its group members must be organized temporally, spatially, and logically. To recognize each represented group activity, the system searches for the set of group members that has the maximum posterior probability of satisfying its representation. We designed a hierarchical recognition algorithm that uses Markov chain Monte Carlo (MCMC)-based sampling of probability distributions to detect group activities and find the acting groups simultaneously. The system was tested on complex activities such as ‘a group of thieves stealing an object from another group’ and ‘a group assaulting a person’, using both videos downloaded from YouTube and videos that we recorded ourselves. Experimental results show that our system recognizes a wide range of group activities more reliably and accurately than previous approaches.
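The idea of searching for the group with the maximum posterior probability by MCMC sampling can be illustrated with a minimal sketch. This is not the hierarchical algorithm of the paper: it is a plain Metropolis sampler over subsets of candidate people, and the names `mcmc_group_search`, `toy_log_posterior`, and `TRUE_GROUP`, as well as the scoring rule, are hypothetical stand-ins for a real activity model.

```python
import math
import random

# Hypothetical ground truth, used only to build a toy scoring function.
TRUE_GROUP = frozenset({"A", "B", "C"})


def toy_log_posterior(group):
    """Toy stand-in for log P(group | observations).

    Each wrongly included or wrongly excluded person costs 2 log-units.
    A real system would score temporal, spatial, and logical constraints.
    """
    return -2.0 * len(group ^ TRUE_GROUP)


def mcmc_group_search(candidates, log_posterior, n_iters=2000, seed=0):
    """Metropolis search over subsets of `candidates`.

    Proposal: toggle the membership of one uniformly chosen candidate.
    The proposal is symmetric, so the acceptance ratio reduces to the
    ratio of posteriors. Returns the best subset visited and its score.
    """
    rng = random.Random(seed)
    current = frozenset()
    current_lp = log_posterior(current)
    best, best_lp = current, current_lp

    for _ in range(n_iters):
        person = rng.choice(candidates)
        proposal = current ^ {person}  # add or remove one person
        proposal_lp = log_posterior(proposal)

        # Accept with probability min(1, p(proposal) / p(current)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
            if current_lp > best_lp:
                best, best_lp = current, current_lp

    return best, best_lp


if __name__ == "__main__":
    people = list("ABCDEF")  # hypothetical tracked persons
    group, score = mcmc_group_search(people, toy_log_posterior)
    print(sorted(group), score)
```

Tracking the best state visited, rather than averaging samples, matches the goal of finding a single maximum-posterior group; the sampler's occasional acceptance of worse proposals is what lets it escape poor initial guesses.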
About this article
Cite this article
Ryoo, M.S., Aggarwal, J.K. Stochastic Representation and Recognition of High-Level Group Activities. Int J Comput Vis 93, 183–200 (2011). https://doi.org/10.1007/s11263-010-0355-5
Keywords
- Human activity recognition
- Group activity recognition
- Description-based event detection
- Stochastic grammar