Application of 3D-wavelet statistics to video analysis

Multimedia Tools and Applications

Abstract

Video activity analysis is used in various video applications such as human action recognition, video retrieval, and video archiving. In this paper, we propose to apply 3D wavelet transform statistics to natural video signals and employ the resulting statistical attributes for video modeling and analysis. From the 3D wavelet transform, we investigate the marginal and joint statistics as well as the Mutual Information (MI) estimates. We show that marginal histograms are approximated quite well by Generalized Gaussian Density (GGD) functions, and that the MI between coefficients decreases as the activity level in videos increases. Joint statistics attributes are applied to scene activity grouping, leading to 87.3% accurate grouping of videos. In addition, marginal and joint statistics features extracted from the video are used for human action classification with Support Vector Machine (SVM) classifiers, and 93.4% of the human activities are properly classified.
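
To make the GGD approximation of marginal histograms concrete, the following minimal sketch estimates the scale and shape of a zero-mean GGD, p(x) proportional to exp(-(|x|/alpha)^beta), from a list of coefficients. It is an illustration under stated assumptions, not the authors' procedure: the moment-matching estimator, the function names `ggd_moment_ratio` and `fit_ggd`, the bisection bounds, and the synthetic samples standing in for a wavelet subband are all hypothetical choices made here.

```python
import math
import random


def ggd_moment_ratio(beta):
    """Return (E|x|)^2 / E[x^2] for a zero-mean GGD with shape beta."""
    # Closed form: Gamma(2/b)^2 / (Gamma(1/b) * Gamma(3/b)).
    # lgamma keeps the evaluation stable for small beta.
    return math.exp(2.0 * math.lgamma(2.0 / beta)
                    - math.lgamma(1.0 / beta)
                    - math.lgamma(3.0 / beta))


def fit_ggd(coeffs, lo=0.1, hi=10.0, iters=60):
    """Estimate GGD (alpha, beta) by moment matching (assumed estimator)."""
    n = len(coeffs)
    m1 = sum(abs(c) for c in coeffs) / n   # sample mean of |x|
    m2 = sum(c * c for c in coeffs) / n    # sample second moment
    target = m1 * m1 / m2
    # The moment ratio increases with beta, so bisection finds the match.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ggd_moment_ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    # E[x^2] = alpha^2 * Gamma(3/beta) / Gamma(1/beta), solved for alpha.
    alpha = math.sqrt(m2 * math.exp(math.lgamma(1.0 / beta)
                                    - math.lgamma(3.0 / beta)))
    return alpha, beta


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins for subband coefficients (not real video data).
    gaussian = [rng.gauss(0.0, 1.0) for _ in range(50000)]
    laplacian = [rng.expovariate(1.0) * rng.choice((-1.0, 1.0))
                 for _ in range(50000)]
    print("Gaussian sample  (alpha, beta):", fit_ggd(gaussian))
    print("Laplacian sample (alpha, beta):", fit_ggd(laplacian))
```

A shape estimate near 2 indicates a Gaussian-like histogram and one near 1 a Laplacian-like histogram. To apply the sketch to real data, the detail coefficients of one subband of a 3D wavelet decomposition would be flattened into `coeffs`.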

Author information

Corresponding author

Correspondence to M. Omidyeganeh.

About this article

Cite this article

Omidyeganeh, M., Ghaemmaghami, S. & Shirmohammadi, S. Application of 3D-wavelet statistics to video analysis. Multimed Tools Appl 65, 441–465 (2013). https://doi.org/10.1007/s11042-012-1012-5

  • DOI: https://doi.org/10.1007/s11042-012-1012-5
