Abnormal behavior recognition is critical to personal and property security. This paper proposes an abnormal behavior recognition method based on a 3D convolutional neural network (3D-CNN) and long short-term memory (LSTM). A feature image composed of optical flow (OF) and a motion history image (MHI) replaces the RGB image as the input to the 3D-CNN. To cope with illumination changes and background jitter in complex scenes, a structural similarity background modeling method is developed to suppress illumination variations; it is used to dynamically update both the optical flow and the motion history image. A new sample expansion method is developed to address class imbalance among abnormal behaviors: the OF and MHI feature image clips are first randomly cropped, then a clustering method is applied and the cluster centers are collected to obtain a large number of new samples. An LSTM with spatial-temporal attention is developed to extract long-term spatial-temporal features for abnormal behavior recognition. Compared with state-of-the-art methods, the proposed method performs well on several challenging datasets.
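The sample expansion step described above (random cropping followed by clustering, with the cluster centers kept as new samples) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm, so plain k-means is swapped in here, and the clip layout `(T, H, W, C)`, the crop size, and the function names `random_crops`, `kmeans_centers`, and `expand_minority_class` are all hypothetical.

```python
import numpy as np


def random_crops(clip, crop_hw, n_crops, rng):
    """Cut n_crops random spatial windows out of one clip shaped (T, H, W, C)."""
    _, h, w, _ = clip.shape
    ch, cw = crop_hw
    crops = []
    for _ in range(n_crops):
        top = rng.integers(0, h - ch + 1)   # upper bound is exclusive
        left = rng.integers(0, w - cw + 1)
        crops.append(clip[:, top:top + ch, left:left + cw, :])
    return np.stack(crops)


def kmeans_centers(x, k, n_iter, rng):
    """Plain Lloyd's k-means on the rows of x; returns the k cluster centers.

    k-means is an assumption: the abstract only says "clustering method".
    The full distance tensor is built in memory, so this suits small inputs.
    """
    centers = x[rng.choice(len(x), size=k, replace=False)].astype(np.float64)
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:            # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def expand_minority_class(clips, crop_hw, crops_per_clip, n_new, seed=0):
    """Turn a few minority-class OF+MHI clips into n_new synthetic clips."""
    rng = np.random.default_rng(seed)
    crops = np.concatenate(
        [random_crops(c, crop_hw, crops_per_clip, rng) for c in clips]
    )
    flat = crops.reshape(len(crops), -1)    # one row per cropped clip
    centers = kmeans_centers(flat, n_new, n_iter=10, rng=rng)
    return centers.reshape((n_new,) + crops.shape[1:])
```

For example, three clips of shape `(4, 16, 16, 3)` (here assuming two optical-flow channels plus one MHI channel) with `crop_hw=(8, 8)`, `crops_per_clip=5`, and `n_new=4` yield an array of shape `(4, 4, 8, 8, 3)`. Each synthetic clip is the mean of a group of similar crops, so it stays close to the observed minority-class data rather than being an exact copy of any one crop.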
This work was supported in part by the National Natural Science Foundation of China (Grant nos. 11176016, 60872117), the National Key R&D Program of China (Grant nos. 2019YFC15 2050, 2020YFC1523004), and the Specialized Research Fund for the Doctoral Program of Higher Education (Grant no. 20123108110014).
Cite this article
Guan, Y., Hu, W. & Hu, X. Abnormal behavior recognition using 3D-CNN combined with LSTM. Multimed Tools Appl (2021). https://doi.org/10.1007/s11042-021-10667-9
Keywords
- Abnormal behavior recognition
- Optical flow
- Motion history image
- 3D convolutional neural networks
- Long short-term memory
- Spatial temporal attention