Surveillance videos classification based on multilayer long short-term memory networks



Image classification and video recognition are long-standing problems in computer vision. Video recognition still performs poorly in some application fields, such as the recognition of surveillance videos. To improve recognition results, this paper proposes a new algorithm that recognizes a video from five consecutive frames. First, the features of the video frames are extracted by ResNet; the features are then passed to a 2-layer LSTM for processing; finally, the result is classified through a fully connected layer. We evaluate the model on a dataset built from collected shipping data. The experimental results show that the proposed algorithm recognizes videos more accurately than the other methods compared, with total accuracy increasing from 0.967 to 0.981.
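The pipeline described in the abstract (per-frame features, a 2-layer LSTM, then a fully connected classifier) can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation. It assumes the ResNet features are already computed and replaces them with small random vectors. The weights are random and untrained. The dimensions (8-dimensional features, 6 hidden units, 3 classes) and all names are hypothetical. Feeding the last time step of the second LSTM layer into the fully connected layer is also an assumption, since the abstract does not say how the LSTM outputs are gathered.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Matrix-vector product for a matrix stored as a list of rows.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LSTMLayer:
    """One LSTM layer with input (i), forget (f), output (o) and candidate (g) gates."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One weight matrix per gate, applied to the concatenation [x_t ; h_{t-1}].
        self.W = {k: rand_matrix(hidden_size, input_size + hidden_size, rng) for k in "ifog"}
        self.b = {k: [0.0] * hidden_size for k in "ifog"}

    def forward(self, sequence):
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in sequence:
            z = x + h  # list concatenation: [x_t ; h_{t-1}]
            pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])] for k in "ifog"}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            g = [math.tanh(v) for v in pre["g"]]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
            outputs.append(h)
        return outputs  # one hidden state per frame


def classify_clip(frame_features, layers, W_fc, b_fc):
    seq = frame_features
    for layer in layers:  # stacked LSTM: each layer consumes the previous layer's outputs
        seq = layer.forward(seq)
    # Assumption: classify from the hidden state of the last frame.
    logits = [a + b for a, b in zip(matvec(W_fc, seq[-1]), b_fc)]
    return softmax(logits)


rng = random.Random(0)
FEATURE_DIM, HIDDEN, NUM_CLASSES, NUM_FRAMES = 8, 6, 3, 5  # hypothetical sizes

layers = [LSTMLayer(FEATURE_DIM, HIDDEN, rng), LSTMLayer(HIDDEN, HIDDEN, rng)]
W_fc = rand_matrix(NUM_CLASSES, HIDDEN, rng)
b_fc = [0.0] * NUM_CLASSES

# Stand-in for ResNet features of five consecutive frames.
clip = [[rng.uniform(-1.0, 1.0) for _ in range(FEATURE_DIM)] for _ in range(NUM_FRAMES)]
probs = classify_clip(clip, layers, W_fc, b_fc)
print(probs)
```

In a real system the random `clip` vectors would be replaced by features from a pretrained ResNet, and the LSTM and classifier weights would be learned from labelled clips rather than sampled at random.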






This research is supported by the National Natural Science Foundation of China (No. 61373109, No. 61602349) and the Educational Research Project from the Educational Commission of Hubei Province (2016234).

Author information

Correspondence to Hong Zhang.



About this article


Cite this article

Zhang, H., Zhao, L. & Dai, G. Surveillance videos classification based on multilayer long short-term memory networks. Multimed Tools Appl (2020).



  • Video recognition
  • Deep learning
  • Resnet
  • LSTM