Abnormal behavior recognition using 3D-CNN combined with LSTM


Research on abnormal behavior recognition is critical to personal and property security. This paper proposes an abnormal behavior recognition method based on a 3D-CNN and Long Short-Term Memory (LSTM). A feature image composed of optical flow (OF) and a motion history image (MHI) replaces the RGB image as the input to the 3D-CNN. To cope with illumination changes and background jitter in complex scenes, a structural similarity background modeling method is developed to suppress illumination variations; it is used to dynamically update both the optical flow and the motion history image. A new sample expansion method is developed to address class imbalance among abnormal behaviors: the OF and MHI feature image clips are first randomly cropped, then a clustering method is applied and the cluster centers are collected as new samples in quantity. An LSTM with spatial-temporal attention is developed to extract long-term spatial-temporal features for abnormal behavior recognition. Compared with state-of-the-art methods, the proposed method achieves excellent recognition performance on several challenging datasets.
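The sample expansion step (random cropping followed by clustering, with cluster centers kept as new samples) can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, crop size, or clip layout, so everything below is an assumption made for illustration: plain k-means is swapped in as the clustering method, clips are assumed to be arrays of shape (T, H, W, C) with OF and MHI stacked as channels, and the function names `random_crops`, `kmeans_centers`, and `expand_minority_class` are hypothetical.

```python
import numpy as np

def random_crops(clip, crop_hw, n_crops, rng):
    """Cut n_crops random spatial windows out of one (T, H, W, C) clip."""
    t, h, w, c = clip.shape
    ch, cw = crop_hw
    crops = np.empty((n_crops, t, ch, cw, c), dtype=np.float64)
    for i in range(n_crops):
        y = rng.integers(0, h - ch + 1)  # top-left corner of the window
        x = rng.integers(0, w - cw + 1)
        crops[i] = clip[:, y:y + ch, x:x + cw, :]
    return crops

def kmeans_centers(samples, k, n_iter, rng):
    """Plain Lloyd's k-means (an assumed stand-in); returns k cluster centers."""
    flat = samples.reshape(len(samples), -1)
    centers = flat[rng.choice(len(flat), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every center.
        d2 = ((flat[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = flat[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers.reshape((k,) + samples.shape[1:])

def expand_minority_class(clips, crop_hw, crops_per_clip, n_new, seed=0):
    """Crop each clip at random, cluster the crops, return centers as new clips."""
    rng = np.random.default_rng(seed)
    crops = np.concatenate(
        [random_crops(c, crop_hw, crops_per_clip, rng) for c in clips]
    )
    return kmeans_centers(crops, n_new, n_iter=10, rng=rng)

# Toy usage: two synthetic 4-frame clips with 3 channels (assumed OF x, OF y, MHI).
clips = [np.random.default_rng(i).random((4, 16, 16, 3)) for i in range(2)]
new_samples = expand_minority_class(clips, (12, 12), crops_per_clip=6, n_new=3)
```

Each returned center is an average of the crops assigned to it, so it has the same shape as a cropped clip and could be appended to the under-represented class. How the authors actually form and select the new samples may differ.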






This work is supported in part by the National Natural Science Foundation of China (Grant no. 11176016, 60872117), National Key R&D Program of China (Grant no. 2019YFC15 2050, 2020YFC1523004), and Specialized Research Fund for the Doctoral Program of Higher Education (Grant no. 20123108110014).

Author information



Corresponding author

Correspondence to Yepeng Guan.



About this article


Cite this article

Guan, Y., Hu, W. & Hu, X. Abnormal behavior recognition using 3D-CNN combined with LSTM. Multimed Tools Appl (2021). https://doi.org/10.1007/s11042-021-10667-9



  • Abnormal behavior recognition
  • Optical flow
  • Motion history image
  • 3D convolutional neural networks
  • Long short-term memory
  • Spatial temporal attention