Abstract
Detecting violent scenes in videos is an important content understanding functionality, e.g., for providing automated youth protection services. The key issues in designing violence detection algorithms are the choice of discriminative features and the learning of effective models. We employ low- and mid-level audio-visual features and evaluate their discriminative power within the context of the MediaEval Violent Scenes Detection (VSD) task. The audio-visual cues are fused at the decision level. As audio features we use Mel-Frequency Cepstral Coefficients (MFCCs); as visual features we use dense histograms of oriented gradients (HoG), histograms of oriented optical flow (HoF), Violent Flows (ViF), and affect-related color descriptors. We partition the feature space of the violent training samples through k-means clustering and train a separate model for each cluster. These models are then used to predict the violence level of videos by means of two-class support vector machines (SVMs). Experimental results on Hollywood movies and short Web videos show that mid-level audio features are more discriminative than the visual features, and that performance is further enhanced by fusing the audio-visual cues at the decision level.
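The pipeline sketched in the abstract can be illustrated as follows. This is a minimal sketch, not the authors' code: the feature values, cluster count, SVM settings, and the averaging fusion rule are all assumptions chosen for illustration; it partitions violent training samples with k-means, trains one two-class SVM per cluster, and fuses hypothetical per-modality scores at the decision level.

```python
# Illustrative sketch (assumptions, not the authors' implementation):
# k-means partitioning of violent samples, one two-class SVM per cluster,
# and decision-level fusion of audio and visual scores by averaging.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy stand-ins for mid-level audio (e.g., MFCC-based) and visual features.
X_violent = rng.normal(1.0, 1.0, size=(60, 8))
X_nonviolent = rng.normal(-1.0, 1.0, size=(60, 8))

# 1) Partition the violent training samples into k clusters.
k = 3
clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_violent)

# 2) Train one two-class SVM per cluster (cluster members vs. non-violent).
models = []
for c in range(k):
    X = np.vstack([X_violent[clusters == c], X_nonviolent])
    y = np.r_[np.ones((clusters == c).sum()), np.zeros(len(X_nonviolent))]
    models.append(SVC(probability=True, random_state=0).fit(X, y))

def violence_score(x):
    """Violence probability of a sample: max over the per-cluster models."""
    return max(m.predict_proba(x.reshape(1, -1))[0, 1] for m in models)

# 3) Decision-level fusion: average the per-modality scores.
audio_score, visual_score = 0.9, 0.6   # hypothetical modality scores
fused = (audio_score + visual_score) / 2
print(f"fused score: {fused:.2f}")   # prints "fused score: 0.75"
```

Training one model per cluster lets each SVM specialize in one "mode" of violence (e.g., fights vs. explosions) instead of forcing a single decision boundary over a heterogeneous positive class.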
Notes
- 4. Annotations were made available by Fudan University, Vietnam University of Science, and Technicolor.
Acknowledgments
The research leading to these results has received funding from the European Community FP7 under grant agreement number 261743 (NoE VideoSense). We would like to thank Technicolor (http://www.technicolor.com/) for providing the ground truth, video shot boundaries, and the corresponding keyframes which have been used in this work. Our thanks also go to Fudan University and Vietnam University of Science for providing the ground truth of the Web video dataset.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Acar, E., Irrgang, M., Maniry, D., Hopfgartner, F. (2015). Detecting Violent Content in Hollywood Movies and User-Generated Videos. In: Hopfgartner, F. (eds) Smart Information Systems. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-14178-7_11
DOI: https://doi.org/10.1007/978-3-319-14178-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14177-0
Online ISBN: 978-3-319-14178-7
eBook Packages: Computer Science (R0)