
Spatiotemporal wavelet correlogram for human action recognition

  • Hamid Abrishami Moghaddam
  • Amin Zare
Regular Paper

Abstract

In this paper, we present a spatiotemporal wavelet correlogram (STWC) as a new feature for human action recognition (HAR) in videos. The proposed feature takes a different approach from methods based on bag of visual words, interest point detection and descriptor representation. It requires neither motion estimation (tracking) nor background/foreground subtraction. STWC is generated more efficiently than state-of-the-art HAR methods while achieving comparable results. STWC exploits the multi-scale, multi-resolution property of the wavelet transform and takes the correlation of wavelet coefficients into account. It is generated by computing the spatiotemporal correlogram of quantized wavelet coefficients, which are obtained using a 3D wavelet decomposition followed by a simple quantization method. Based on the present findings, recommendations are made for selecting the richest wavelet subbands from which to compute STWC.
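
The pipeline summarized above (3D wavelet decomposition, quantization, spatiotemporal correlogram) can be illustrated with a minimal sketch. This is not the authors' implementation: the one-level Haar wavelet, the uniform magnitude quantizer, the axis-aligned neighbor directions, the distance set and all function names (haar_dwt3, quantize, autocorrelogram) are assumptions chosen only to keep the example short and self-contained.

```python
import numpy as np


def haar_step(x, axis):
    """One-level orthonormal Haar analysis along one axis (even length assumed)."""
    x = np.moveaxis(x, axis, 0)
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)
    high = (even - odd) / np.sqrt(2.0)
    return np.moveaxis(low, 0, axis), np.moveaxis(high, 0, axis)


def haar_dwt3(video):
    """One-level separable 3D Haar transform of a (T, H, W) array.

    Returns a dict of 8 subbands keyed by strings such as 'LHL',
    where the letters refer to the time, row and column axes in that order.
    """
    bands = {"": np.asarray(video, dtype=float)}
    for axis in range(3):
        split = {}
        for key, arr in bands.items():
            low, high = haar_step(arr, axis)
            split[key + "L"] = low
            split[key + "H"] = high
        bands = split
    return bands


def quantize(coeffs, levels=4):
    """Uniformly quantize coefficient magnitudes into `levels` bins (assumed scheme)."""
    mag = np.abs(coeffs)
    top = mag.max()
    if top == 0:
        return np.zeros(mag.shape, dtype=int)
    return np.minimum((mag / top * levels).astype(int), levels - 1)


def autocorrelogram(q, levels, distances=(1, 2)):
    """Estimate, for each distance d and level c, the probability that a voxel
    at offset d along the time, row or column axis also has level c,
    given that the reference voxel has level c."""
    feats = []
    for d in distances:
        for c in range(levels):
            same, total = 0, 0
            for axis in range(3):
                if q.shape[axis] <= d:
                    continue
                qm = np.moveaxis(q, axis, 0)
                ref, shifted = qm[:-d], qm[d:]
                mask = ref == c
                total += int(mask.sum())
                same += int((shifted[mask] == c).sum())
            feats.append(same / total if total else 0.0)
    return np.array(feats)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((16, 32, 32))        # synthetic stand-in for a video clip
    subbands = haar_dwt3(clip)
    q = quantize(subbands["HLH"], levels=4)  # subband choice is arbitrary here
    print(autocorrelogram(q, levels=4).shape)
```

In this sketch each subband yields one subvector of length len(distances) × levels; concatenating the subvectors of several subbands would give a single descriptor per clip. Which subbands to keep is exactly the selection question the abstract refers to, and the sketch does not attempt to answer it.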

Keywords

Spatiotemporal wavelet correlogram, Autocorrelogram subvector, Quantized coefficients, Wavelet subbands, Human action recognition, 3D discrete wavelet transform

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Faculty of Electrical and Computer Engineering, K.N. Toosi University of Technology, Tehran, Iran
  2. Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
