Abstract
Detecting violent scenes in videos is an important content understanding functionality, e.g., for providing automated youth protection services. The key issues in designing violence detection algorithms are the choice of discriminative features and the learning of effective models. We employ low- and mid-level audio-visual features and evaluate their discriminative power within the context of the MediaEval Violent Scenes Detection (VSD) task. The audio-visual cues are fused at the decision level. As audio features we use Mel-Frequency Cepstral Coefficients (MFCCs); as visual features we use dense histograms of oriented gradients (HoG), histograms of oriented optical flow (HoF), Violent Flows (ViF), and affect-related color descriptors. We partition the feature space of the violent training samples through k-means clustering and train a separate model for each cluster. These models are then used to predict the violence level of videos by means of two-class support vector machines (SVMs). Experimental results on Hollywood movies and short Web videos show that mid-level audio features are more discriminative than the visual features, and that performance is further enhanced by fusing the audio-visual cues at the decision level.
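The pipeline sketched in the abstract can be illustrated as follows. This is a minimal sketch, not the authors' code: the feature values, cluster count, SVM settings, and the averaging fusion rule are all assumptions chosen for illustration; it partitions violent training samples with k-means, trains one two-class SVM per cluster, and fuses hypothetical per-modality scores at the decision level.

```python
# Illustrative sketch (assumptions, not the authors' implementation):
# k-means partitioning of violent samples, one two-class SVM per cluster,
# and decision-level fusion of audio and visual scores by averaging.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy stand-ins for mid-level audio (e.g., MFCC-based) and visual features.
X_violent = rng.normal(1.0, 1.0, size=(60, 8))
X_nonviolent = rng.normal(-1.0, 1.0, size=(60, 8))

# 1) Partition the violent training samples into k clusters.
k = 3
clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_violent)

# 2) Train one two-class SVM per cluster (cluster members vs. non-violent).
models = []
for c in range(k):
    X = np.vstack([X_violent[clusters == c], X_nonviolent])
    y = np.r_[np.ones((clusters == c).sum()), np.zeros(len(X_nonviolent))]
    models.append(SVC(probability=True, random_state=0).fit(X, y))

def violence_score(x):
    """Violence probability of a sample: max over the per-cluster models."""
    return max(m.predict_proba(x.reshape(1, -1))[0, 1] for m in models)

# 3) Decision-level fusion: average the per-modality scores.
audio_score, visual_score = 0.9, 0.6   # hypothetical modality scores
fused = (audio_score + visual_score) / 2
print(f"fused score: {fused:.2f}")   # prints "fused score: 0.75"
```

Training one model per cluster lets each SVM specialize in one "mode" of violence (e.g., fights vs. explosions) instead of forcing a single decision boundary over a heterogeneous positive class.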
Notes
- 4. Annotations were made available by Fudan University, Vietnam University of Science, and Technicolor.
Acknowledgments
The research leading to these results has received funding from the European Community FP7 under grant agreement number 261743 (NoE VideoSense). We would like to thank Technicolor (http://www.technicolor.com/) for providing the ground truth, video shot boundaries, and the corresponding keyframes which have been used in this work. Our thanks also go to Fudan University and Vietnam University of Science for providing the ground truth of the Web video dataset.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Acar, E., Irrgang, M., Maniry, D., Hopfgartner, F. (2015). Detecting Violent Content in Hollywood Movies and User-Generated Videos. In: Hopfgartner, F. (eds) Smart Information Systems. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-14178-7_11
DOI: https://doi.org/10.1007/978-3-319-14178-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14177-0
Online ISBN: 978-3-319-14178-7
eBook Packages: Computer Science (R0)