Abstract
The automatic classification of violent actions performed by two or more persons is an important task for both societal and scientific purposes. In this paper, we propose a machine learning approach, based on a Support Vector Machine (SVM), to detect whether a human action captured on video is violent or not. Using a pose estimation algorithm, we focus mostly on feature engineering to generate the SVM inputs. In particular, we hand-engineered a set of input features based on keypoints (angles, velocity and contact detection) and used them, under distinct combinations, to study their effect on violent behavior recognition from video. Overall, the best performing SVM model, which used keypoints, angles and contact features computed over a 60-frame input range, achieved an excellent classification performance.
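The abstract lists joint angles among the keypoint-based features without defining them. A minimal sketch, assuming 2D (x, y) keypoints as produced by a multi-person pose estimator, computes the angle at a middle joint (for example, the elbow between shoulder and wrist). The function name `joint_angle` and the use of degrees are illustrative assumptions, not the authors' definition.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at keypoint b, formed by the segments b->a and b->c.

    a, b, c: (x, y) image coordinates, e.g. shoulder, elbow, wrist.
    Returns None if either segment has zero length (degenerate pose).
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return None
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))
```

For example, three collinear keypoints such as `(0, 0)`, `(1, 0)`, `(2, 0)` describe a fully extended limb and yield an angle of 180 degrees, while a bent elbow yields a smaller value.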
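Velocity is another of the hand-engineered features. As a sketch, assuming velocity is taken as the frame-to-frame displacement of each keypoint scaled by the video frame rate, it can be computed per keypoint as below. The `fps` default and the zero fill for undetected keypoints are hypothetical choices, not details given in the source.

```python
def keypoint_velocity(prev, curr, fps=30.0):
    """Per-keypoint velocity (vx, vy) in pixels per second between two
    consecutive frames.

    prev, curr: equal-length lists of (x, y) tuples, or None where the pose
    estimator did not detect a keypoint.
    fps: assumed frame rate of the video (hypothetical default).
    """
    if len(prev) != len(curr):
        raise ValueError("frames must have the same number of keypoints")
    velocities = []
    for p, c in zip(prev, curr):
        if p is None or c is None:
            # Hypothetical choice: treat an undetected keypoint as stationary.
            velocities.append((0.0, 0.0))
        else:
            velocities.append(((c[0] - p[0]) * fps, (c[1] - p[1]) * fps))
    return velocities
```

Keeping the signed components (rather than only the speed) preserves the direction of motion, for instance a hand moving toward another person.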
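The abstract does not state how contact between persons is detected. One simple approximation, used here purely as an assumption, flags contact when any keypoint of one person falls within a pixel distance threshold of any keypoint of another person. The 20-pixel threshold is hypothetical and would in practice depend on image resolution and camera distance.

```python
import math
from itertools import product

def contact_detected(person_a, person_b, threshold=20.0):
    """Return True if any keypoint of person_a lies within `threshold` pixels
    of any keypoint of person_b.

    person_a, person_b: lists of (x, y) tuples, or None for undetected keypoints.
    threshold: hypothetical contact distance in pixels.
    """
    pts_a = [p for p in person_a if p is not None]
    pts_b = [q for q in person_b if q is not None]
    # Check every cross-person keypoint pair; any() stops at the first hit.
    return any(math.dist(p, q) <= threshold for p, q in product(pts_a, pts_b))
```

A per-frame boolean like this can be encoded as 0/1 and appended to the angle and keypoint features of that frame.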
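The best model computed its features over a 60-frame input range. An SVM needs fixed-length inputs, so one possible scheme, assumed here for illustration, concatenates the per-frame feature vectors of 60 consecutive frames into a single sample using a sliding window. The stride of 30 frames and the concatenation itself are hypothetical; the source does not describe how frames were combined.

```python
def sliding_windows(frame_features, window=60, stride=30):
    """Split a sequence of per-frame feature vectors into fixed-length samples.

    Each sample concatenates `window` consecutive frames into one flat vector,
    so every SVM input has the same size (window * features per frame).
    Trailing frames that do not fill a complete window are dropped.
    """
    samples = []
    for start in range(0, len(frame_features) - window + 1, stride):
        flat = []
        for frame in frame_features[start:start + window]:
            flat.extend(frame)
        samples.append(flat)
    return samples
```

With 120 frames of two features each, this yields three samples (starting at frames 0, 30 and 60), each of length 120; a clip shorter than 60 frames yields none.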
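The classifier itself is an SVM. The abstract does not name the kernel or solver, so the sketch below swaps in a linear SVM trained with the Pegasos stochastic sub-gradient method, written in pure Python to make the hinge-loss objective explicit. Labels are assumed to be +1 (violent) and -1 (non-violent); `train_linear_svm`, `predict` and the hyperparameter defaults are illustrative, not the authors' setup.

```python
import random

def _augment(x):
    # Append a constant 1.0 so the bias is learned as an ordinary weight.
    # Simplification: this means the bias is also L2-regularized.
    return list(x) + [1.0]

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style training of a linear SVM.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w.x_i)) by stochastic
    sub-gradient descent with step size 1 / (lam * t).
    X: list of feature vectors; y: labels in {+1 (violent), -1 (non-violent)}.
    Returns the weight vector, whose last entry acts as the bias.
    """
    rng = random.Random(seed)
    data = [_augment(x) for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            # Sub-gradient step: always shrink w (regularization term) ...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ... and move toward the example if it violates the margin.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w

def predict(w, x):
    """Return +1 (violent) or -1 (non-violent) for feature vector x."""
    score = sum(wj * xj for wj, xj in zip(w, _augment(x)))
    return 1 if score >= 0 else -1
```

In practice a library solver such as scikit-learn's `sklearn.svm.SVC`, which also supports non-linear kernels like RBF, would normally be used; the linear version above only illustrates the margin-based training idea on the windowed feature vectors.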
Acknowledgments
The work of P. Cortez was supported by Fundação para a Ciência e Tecnologia (FCT) within the Project Scope: UID/CEC/00319/2013.
Copyright information
© 2019 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Nova, D., Ferreira, A., Cortez, P. (2019). A Machine Learning Approach to Detect Violent Behaviour from Video. In: Cortez, P., Magalhães, L., Branco, P., Portela, C., Adão, T. (eds) Intelligent Technologies for Interactive Entertainment. INTETAIN 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 273. Springer, Cham. https://doi.org/10.1007/978-3-030-16447-8_9
DOI: https://doi.org/10.1007/978-3-030-16447-8_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16446-1
Online ISBN: 978-3-030-16447-8
eBook Packages: Computer Science, Computer Science (R0)