Abstract
Violence detection is an active topic in surveillance research, yet it has received far less attention than action recognition. Existing vision-based methods concentrate on detecting violence and make little effort to localize it. In this paper, we propose a fast and robust framework for detecting and localizing violence in surveillance scenes. For this purpose, a Gaussian Model of Optical Flow (GMOF) is proposed to extract candidate violence regions, which are adaptively modeled as deviations from the normal crowd behavior observed in the scene. Violence detection is then performed on each video volume constructed by densely sampling the candidate violence regions. To distinguish violent from nonviolent events, we also propose a novel descriptor, the Orientation Histogram of Optical Flow (OHOF), which is fed into a linear SVM for classification. Experimental results on several benchmark datasets demonstrate that the proposed method outperforms state-of-the-art methods in both detection accuracy and processing speed, even in crowded scenes.
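The abstract gives no formulas, so the following is only a minimal sketch of the two ideas it names, not the authors' GMOF or OHOF. It assumes that "deviation from normal behavior" can be approximated by a threshold of k standard deviations on a single Gaussian over flow magnitude, and that an orientation histogram can be built by magnitude-weighted voting into equal angular bins. The function names, the parameter `k`, and the bin count are hypothetical.

```python
import math


def is_candidate(magnitude, mean, std, k=3.0):
    """Flag a flow magnitude as a candidate region.

    Assumption: normal motion is modeled by one Gaussian (mean, std),
    and a deviation of more than k standard deviations is abnormal.
    """
    return abs(magnitude - mean) > k * std


def orientation_histogram(flow_vectors, n_bins=8):
    """Build a magnitude-weighted, L1-normalized orientation histogram.

    flow_vectors is an iterable of (dx, dy) optical flow displacements.
    Assumption: equal-width bins over [0, 2*pi); zero vectors are skipped.
    """
    hist = [0.0] * n_bins
    bin_width = 2.0 * math.pi / n_bins
    for dx, dy in flow_vectors:
        mag = math.hypot(dx, dy)
        if mag == 0.0:
            continue
        # Map the angle from (-pi, pi] into [0, 2*pi).
        angle = math.atan2(dy, dx) % (2.0 * math.pi)
        # Guard against rounding that lands exactly on 2*pi.
        idx = min(int(angle / bin_width), n_bins - 1)
        hist[idx] += mag
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist
```

In a full pipeline, the histogram computed for each sampled video volume would serve as the feature vector passed to a linear classifier; the optical flow estimator and the classifier themselves are outside this sketch.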
Acknowledgments
This research was partly supported by NSFC, China (Nos. 61273258 and 61105001).
Cite this article
Zhang, T., Yang, Z., Jia, W. et al. A new method for violence detection in surveillance scenes. Multimed Tools Appl 75, 7327–7349 (2016). https://doi.org/10.1007/s11042-015-2648-8
DOI: https://doi.org/10.1007/s11042-015-2648-8