Detecting Violent Content in Hollywood Movies and User-Generated Videos

Chapter in: Smart Information Systems

Abstract

Detecting violent scenes in videos is an important content-understanding functionality, e.g., for providing automated youth protection services. The key issues in designing violence detection algorithms are the choice of discriminative features and the learning of effective models. We employ low- and mid-level audio-visual features and evaluate their discriminative power within the context of the MediaEval Violent Scenes Detection (VSD) task. The audio-visual cues are fused at the decision level. As audio features we use Mel-Frequency Cepstral Coefficients (MFCC); as visual features we use dense Histograms of Oriented Gradients (HoG), Histograms of Oriented Optical Flow (HoF), Violent Flows (ViF), and affect-related color descriptors. We partition the feature space of the violent training samples through k-means clustering and train a separate two-class support vector machine (SVM) for each cluster; these models are then used to predict the violence level of videos. Experimental results on Hollywood movies and short web videos show that mid-level audio features are more discriminative than the visual features, and that performance is further enhanced by fusing the audio-visual cues at the decision level.
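
To make the pipeline described above concrete, the sketch below illustrates the general scheme with scikit-learn: the violent training samples are partitioned by k-means, a two-class SVM is trained per cluster against the non-violent samples, each segment is scored by the maximum violence probability over the cluster models, and the audio and visual scores are combined by a weighted average at the decision level. This is a minimal illustration only, not the authors' implementation (the chapter relies on its own feature extraction and LIBSVM); the feature matrices, number of clusters, and fusion weight are placeholders.

```python
# A minimal sketch (assumptions: scikit-learn and NumPy; feature matrices and
# all parameter values are illustrative placeholders, not the chapter's settings).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC


def train_cluster_models(X, y, n_clusters=4, random_state=0):
    """Partition the violent training samples (y == 1) with k-means and train
    one two-class SVM per cluster: cluster members vs. all non-violent samples."""
    X_pos, X_neg = X[y == 1], X[y == 0]
    kmeans = KMeans(n_clusters=n_clusters, random_state=random_state).fit(X_pos)
    models = []
    for c in range(n_clusters):
        X_c = X_pos[kmeans.labels_ == c]
        X_train = np.vstack([X_c, X_neg])
        y_train = np.concatenate([np.ones(len(X_c)), np.zeros(len(X_neg))])
        models.append(SVC(kernel="rbf", probability=True).fit(X_train, y_train))
    return models


def violence_score(models, X):
    """Score each test sample by the maximum violence probability over the
    cluster-specific SVMs."""
    probs = np.stack([m.predict_proba(X)[:, 1] for m in models], axis=1)
    return probs.max(axis=1)


def fuse_decisions(audio_scores, visual_scores, w_audio=0.6):
    """Decision-level (late) fusion as a weighted average of modality scores;
    the weight is a free parameter, not a value reported in the chapter."""
    return w_audio * audio_scores + (1.0 - w_audio) * visual_scores


# Hypothetical usage, assuming precomputed per-segment MFCC-based (audio) and
# HoG/HoF/ViF-based (visual) feature matrices with shared binary labels:
# audio_models = train_cluster_models(X_audio_train, y_train)
# visual_models = train_cluster_models(X_visual_train, y_train)
# fused = fuse_decisions(violence_score(audio_models, X_audio_test),
#                        violence_score(visual_models, X_visual_test))
```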

Notes

  1. https://research.technicolor.com/rennes/.
  2. http://fsf.de/jugendmedienschutz/international/filmfreigaben/.
  3. http://www.fsk.de/.
  4. Annotations were made available by Fudan University, Vietnam University of Science, and Technicolor.
  5. https://www.jyu.fi/hum/laitokset/musiikki/en/research/coe/.
  6. http://homepages.inf.ed.ac.uk/juijling/index.php#page=software/.
  7. http://spams-devel.gforge.inria.fr/.
  8. http://www.vlfeat.org/.
  9. http://www.csie.ntu.edu.tw/~cjlin/libsvm/.

Acknowledgments

The research leading to these results has received funding from the European Community FP7 under grant agreement number 261743 (NoE VideoSense). We would like to thank Technicolor (http://www.technicolor.com/) for providing the ground truth, video shot boundaries, and corresponding keyframes used in this work. Our thanks also go to Fudan University and Vietnam University of Science for providing the ground truth of the Web video dataset.

Author information

Corresponding author

Correspondence to Esra Acar.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Acar, E., Irrgang, M., Maniry, D., Hopfgartner, F. (2015). Detecting Violent Content in Hollywood Movies and User-Generated Videos. In: Hopfgartner, F. (ed.) Smart Information Systems. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-14178-7_11

  • DOI: https://doi.org/10.1007/978-3-319-14178-7_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14177-0

  • Online ISBN: 978-3-319-14178-7

  • eBook Packages: Computer Science, Computer Science (R0)
