Perceptual video hashing based on temporal wavelet transform and random projections with application to indexing and retrieval of near-identical videos

  • Sandeep Rameshnath
  • P. K. Bora


A perceptual video hash function extracts a short, fixed-length bit string, called a perceptual hash, from the visual content of a video. Such a function should be robust to content-preserving operations and, at the same time, sensitive to content differences. In this work, the discrete wavelet transform (DWT) applied along the temporal direction, referred to as the temporal wavelet transform (TWT), is used to generate temporally informative representative images (TIRIs). The resulting low-pass data are projected onto Achlioptas's random basis to generate the hash. The TWT and the random projection technique not only reduce the dimensionality but also retain the important features. Simulation results show that the proposed algorithm performs better than existing video hashing algorithms under both content-preserving and content-changing attacks, with the added advantage of computational efficiency. The proposed algorithm is applied to the indexing and retrieval of near-identical videos, and its performance is evaluated using average precision-recall curves.
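The pipeline described above (temporal low-pass filtering, then a sparse random projection, then binarization) can be illustrated with a minimal sketch. It is not the authors' implementation: the Haar filter, the number of decomposition levels, the median threshold used to obtain bits, and all function names (`temporal_haar_lowpass`, `achlioptas_matrix`, `video_hash`, `hamming`) are assumptions made for illustration. Only the sparse matrix distribution, with entries √3·{+1, 0, −1} drawn with probabilities 1/6, 2/3, 1/6, follows Achlioptas's published construction.

```python
import numpy as np


def temporal_haar_lowpass(video, levels):
    """Keep only the low-pass band of a Haar DWT taken along time (axis 0).

    Assumption: Haar filters; the abstract does not name the wavelet.
    """
    out = np.asarray(video, dtype=np.float64)
    for _ in range(levels):
        if out.shape[0] % 2:           # drop a trailing frame if the count is odd
            out = out[:-1]
        out = (out[0::2] + out[1::2]) / np.sqrt(2.0)
    return out


def achlioptas_matrix(k, d, seed):
    """Sparse k x d projection matrix: sqrt(3/k) * {+1, 0, -1} w.p. 1/6, 2/3, 1/6."""
    rng = np.random.default_rng(seed)  # the seed plays the role of a secret key
    vals = rng.choice([-1.0, 0.0, 1.0], size=(k, d), p=[1 / 6, 2 / 3, 1 / 6])
    return np.sqrt(3.0 / k) * vals


def video_hash(video, levels=3, n_bits=64, seed=0):
    """Hash a (frames, height, width) array into n_bits bits."""
    low = temporal_haar_lowpass(video, levels)   # TIRI-like low-pass frames
    x = low.reshape(-1)
    x = x - x.mean()                             # remove the DC offset
    proj = achlioptas_matrix(n_bits, x.size, seed) @ x
    # Assumption: bits come from thresholding at the median of the projections.
    return (proj > np.median(proj)).astype(np.uint8)


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(a != b))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    original = rng.uniform(0, 255, size=(16, 32, 32))
    noisy = original + rng.normal(0.0, 1.0, size=original.shape)
    unrelated = rng.uniform(0, 255, size=(16, 32, 32))
    h = video_hash(original)
    print("distance to noisy copy:", hamming(h, video_hash(noisy)))
    print("distance to unrelated video:", hamming(h, video_hash(unrelated)))
```

In this sketch a lightly distorted copy should land at a small Hamming distance from the original and an unrelated video at roughly half the hash length, which is the behaviour a retrieval system would threshold on. The same seed must be used for every video being compared.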


Perceptual video hashing; Temporal wavelet transform; Achlioptas's random matrix; Random projections; Near-identical video indexing and retrieval




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Electronics and Communication Engineering, Vidyavardhaka College of Engineering, Mysuru, India
  2. Department of Electronics and Electrical Engineering, Indian Institute of Technology, Guwahati, India
