Video Surveillance for Violence Detection Using Deep Learning

  • Manan Sharma
  • Rishabh Baghel
Conference paper
Part of the Lecture Notes on Data Engineering and Communications Technologies book series (LNDECT, volume 37)


To detect violence through surveillance cameras, we propose a neural architecture that recognizes violent activity and can serve as a measure to prevent chaos. The architecture uses a pre-trained ResNet-50 model to extract features from the video frames and feeds them into a ConvLSTM block. We use short-term differences between video frames as input, which makes the model more robust to occlusions and discrepancies. The convolutional network yields more concentrated spatio-temporal features from the frames, which suits the sequential nature of video when those features are fed into LSTMs. The model thus combines a pre-trained convolutional neural network with a convolutional LSTM layer. It takes raw video as input, converts it into frames, and outputs a binary label: violence or non-violence. We pre-process the video frames with cropping, dark-edge removal, and other data augmentation techniques to strip unnecessary detail from the data. We evaluate the proposed method on three standard public datasets, using accuracy as the evaluation metric.
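The pre-processing steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names, the darkness threshold of 10, and the reading of "short-term difference" as a signed per-pixel difference of adjacent frames are all assumptions made for illustration. Frames are small grayscale grids held as nested lists so the example needs no external libraries; the ResNet-50 and ConvLSTM stages are not shown.

```python
# Illustrative sketch only. Helper names and the threshold are assumptions,
# not details taken from the paper.

def dark_edge_bounds(frame, threshold=10):
    """Find the bounding box of rows/columns that contain a pixel above threshold."""
    rows = [i for i, row in enumerate(frame) if max(row) > threshold]
    cols = [j for j in range(len(frame[0]))
            if max(row[j] for row in frame) > threshold]
    if not rows or not cols:
        # Entirely dark frame: keep the full extent.
        return 0, len(frame) - 1, 0, len(frame[0]) - 1
    return rows[0], rows[-1], cols[0], cols[-1]


def crop(frame, bounds):
    """Crop a frame to the inclusive (top, bottom, left, right) bounds."""
    top, bottom, left, right = bounds
    return [row[left:right + 1] for row in frame[top:bottom + 1]]


def frame_differences(frames):
    """Signed per-pixel difference between each pair of adjacent frames."""
    return [
        [[b - a for a, b in zip(row_prev, row_next)]
         for row_prev, row_next in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]


# Two 4x4 frames with a one-pixel dark border around a 2x2 bright region.
f0 = [[0, 0, 0, 0], [0, 50, 60, 0], [0, 70, 80, 0], [0, 0, 0, 0]]
f1 = [[0, 0, 0, 0], [0, 55, 60, 0], [0, 70, 90, 0], [0, 0, 0, 0]]

# Use one set of bounds for the whole clip so every frame keeps the same shape.
bounds = dark_edge_bounds(f0)
cropped = [crop(f, bounds) for f in (f0, f1)]
diffs = frame_differences(cropped)
print(diffs)  # [[[5, 0], [0, 10]]]
```

In a pipeline like the one described, each difference frame rather than each raw frame would be passed to the feature extractor, so static background content largely cancels out and only the changes between frames remain.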


Violence detection; Residual networks (ResNets); Convolutional long short-term memory cells (ConvLSTM); Deep learning



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Indian Institute of Information Technology Guwahati, Guwahati, India
