A Machine Learning Approach to Detect Violent Behaviour from Video

  • David Nova
  • André Ferreira
  • Paulo Cortez
Conference paper
Part of the Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering book series (LNICST, volume 273)

Abstract

The automatic classification of violent actions performed by two or more persons is an important task for both societal and scientific purposes. In this paper, we propose a machine learning approach, based on a Support Vector Machine (SVM), to detect whether a human action captured on video is violent. Using a pose estimation algorithm, we focus mostly on feature engineering to generate the SVM inputs. In particular, we hand-engineered a set of input features based on keypoints (angles, velocity and contact detection) and used them, under distinct combinations, to study their effect on violent behavior recognition from video. Overall, an excellent classification performance was achieved by the best performing SVM model, which used keypoints, angles and contact features computed over a 60-frame input range.
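
To make the three feature families named above concrete, the following is a minimal Python sketch of how an angle, a velocity and a contact feature might be derived from 2D pose keypoints. It is an illustration under stated assumptions, not the feature definitions used in the paper: the keypoint names (shoulder, elbow, wrist, nose), the 30 frames-per-second rate, the 20-pixel contact threshold and all function names are hypothetical.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at keypoint b, between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 0.0  # degenerate case: two keypoints coincide
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos = max(-1.0, min(1.0, cos))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos))

def velocity(prev, curr, fps=30.0):
    """Speed of one keypoint in pixels per second between consecutive frames.
    The frame rate is an assumed value."""
    return math.hypot(curr[0] - prev[0], curr[1] - prev[1]) * fps

def in_contact(kp_a, kp_b, threshold=20.0):
    """Contact flag: True when two keypoints are within `threshold` pixels.
    The threshold is an assumed value."""
    return math.hypot(kp_a[0] - kp_b[0], kp_a[1] - kp_b[1]) <= threshold

def frame_features(p1, p2, p1_prev, fps=30.0):
    """Per-frame feature vector for person p1 interacting with person p2.
    Each person is a dict mapping a (hypothetical) keypoint name to (x, y)."""
    angle = joint_angle(p1["shoulder"], p1["elbow"], p1["wrist"])
    speed = velocity(p1_prev["wrist"], p1["wrist"], fps)
    contact = 1.0 if in_contact(p1["wrist"], p2["nose"]) else 0.0
    return [angle, speed, contact]

# Toy example with made-up coordinates.
p1_prev = {"shoulder": (0, 0), "elbow": (10, 0), "wrist": (10, 7)}
p1 = {"shoulder": (0, 0), "elbow": (10, 0), "wrist": (10, 10)}
p2 = {"nose": (15, 10)}
features = frame_features(p1, p2, p1_prev)
```

One plausible way to obtain a fixed-length SVM input from such per-frame vectors is to concatenate them over a window of consecutive frames (for instance, the 60-frame range mentioned above); this windowing step is likewise an assumption of the sketch.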

Keywords

Machine learning, Support Vector Machine, Action recognition, Pose estimation, Video analysis

Notes

Acknowledgments

The work of P. Cortez was supported by Fundação para a Ciência e Tecnologia (FCT) within the Project Scope: UID/CEC/00319/2013.

Copyright information

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2019

Authors and Affiliations

  1. ALGORITMI Centre, Department of Information Systems, University of Minho, Guimarães, Portugal
  2. Department of Informatics, University of Minho, Braga, Portugal
