A new method for violence detection in surveillance scenes

Multimedia Tools and Applications

Abstract

Violence detection is an active research topic for surveillance systems, yet it has received far less attention than action recognition. Existing vision-based methods mainly decide whether violence occurs and make little effort to determine where it occurs. In this paper, we propose a fast and robust framework for detecting and localizing violence in surveillance scenes. A Gaussian Model of Optical Flow (GMOF) is proposed to extract candidate violence regions, which are adaptively modeled as deviations from the normal crowd behavior observed in the scene. Violence detection is then performed on each video volume constructed by densely sampling the candidate violence regions. To distinguish violent events from nonviolent ones, we also propose a novel descriptor, the Orientation Histogram of Optical Flow (OHOF), which is fed into a linear SVM for classification. Experimental results on several benchmark datasets demonstrate that the proposed method outperforms state-of-the-art methods in both detection accuracy and processing speed, even in crowded scenes.
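The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the GMOF update rule, the number of orientation bins, or the histogram weighting, so everything below is an assumption. The sketch uses a single running Gaussian per pixel over flow magnitude as a stand-in for GMOF, and a magnitude-weighted orientation histogram as a stand-in for OHOF. The function names `update_gaussian`, `candidate_mask`, and `orientation_histogram` and the parameters `lr`, `k`, and `n_bins` are hypothetical. The optical flow field is assumed to be precomputed as an array of shape (H, W, 2), and the linear SVM step is omitted.

```python
import numpy as np


def update_gaussian(mean, var, magnitude, lr=0.05):
    """Update a running per-pixel Gaussian of flow magnitude.

    The exponential update with learning rate `lr` is an assumed rule,
    chosen only to illustrate adaptive modeling of normal motion.
    """
    diff = magnitude - mean
    new_mean = mean + lr * diff
    new_var = (1.0 - lr) * var + lr * diff ** 2
    return new_mean, new_var


def candidate_mask(mean, var, magnitude, k=3.0, eps=1e-6):
    """Flag pixels whose flow magnitude deviates by more than k standard
    deviations from the modeled normal behavior (k is an assumed threshold)."""
    return np.abs(magnitude - mean) > k * np.sqrt(var + eps)


def orientation_histogram(flow, n_bins=8):
    """Magnitude-weighted histogram of flow orientations, L1-normalized.

    `flow` has shape (..., 2) holding (dx, dy) per pixel. The bin count
    and the magnitude weighting are assumptions, not taken from the paper.
    """
    dx = flow[..., 0].ravel()
    dy = flow[..., 1].ravel()
    magnitude = np.hypot(dx, dy)
    # Map angles from (-pi, pi] to [0, 2*pi) before binning.
    angle = np.mod(np.arctan2(dy, dx), 2.0 * np.pi)
    bins = (angle / (2.0 * np.pi) * n_bins).astype(int)
    # Guard against an angle that rounds up to exactly 2*pi.
    bins = np.minimum(bins, n_bins - 1)
    hist = np.bincount(bins, weights=magnitude, minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

In a pipeline of this kind, `candidate_mask` would select regions to sample, and `orientation_histogram` would be computed per sampled video volume, with the resulting vectors passed to a linear classifier.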

Acknowledgments

This research is partly supported by NSFC, China (No: 61273258, 61105001).

Author information

Corresponding authors

Correspondence to Tao Zhang or Jie Yang.

Cite this article

Zhang, T., Yang, Z., Jia, W. et al. A new method for violence detection in surveillance scenes. Multimed Tools Appl 75, 7327–7349 (2016). https://doi.org/10.1007/s11042-015-2648-8

  • DOI: https://doi.org/10.1007/s11042-015-2648-8
