Multimedia Tools and Applications, Volume 77, Issue 3, pp 3677–3697

Hidden semantic hashing for fast retrieval over large scale document collection

  • Fuhao Zou
  • Xiaoman Tang
  • Kai Li
  • Yunfei Wang
  • Jingkuan Song
  • Shuangyuan Yang
  • Hefei Ling
Article

Abstract

It is well known that the semantics of documents are expressed in a latent way. However, most existing hashing methods ignore this fact and thus fail to discover the hidden semantic structure. To overcome this issue, this paper focuses on discovering the latent semantic structure when hashing a document corpus. We adopt two measures to discover the hidden structure. First, the Laplacian graph constructed in the semantic space, rather than in the term-document space, is used to capture the semantic structure of the document corpus during hashing. Second, motivated by the fact that non-negative matrix factorization (NMF) is an effective algorithm for discovering the latent semantic structure of documents, we employ NMF to extract a parts-based representation of each document. In addition, to reduce semantic loss when mapping the parts-based representation into Hamming space, we impose sparsity constraints that push the elements of the parts-based representation closer to binary values. Experimental results demonstrate that the proposed hashing method is competitive with state-of-the-art methods in document hashing.
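The pipeline outlined in the abstract can be illustrated with a small sketch. It is not the authors' algorithm, whose objective and update rules are not given here. It assumes graph-regularized NMF multiplicative updates in the style of Cai et al., an L1 penalty realized as a constant `beta` in the denominator of the update for `V`, a cosine k-nearest-neighbour graph built on an initial plain-NMF representation (one possible reading of "semantic space"), and binarization by thresholding each dimension at its median. All function and parameter names (`knn_affinity`, `regularized_nmf`, `hash_documents`, `lam`, `beta`) are hypothetical.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero


def knn_affinity(F, k):
    """Symmetric 0/1 k-nearest-neighbour graph over the rows of F (cosine similarity)."""
    Fn = F / (np.linalg.norm(F, axis=1, keepdims=True) + EPS)
    S = Fn @ Fn.T
    np.fill_diagonal(S, -np.inf)  # exclude self-matches
    nbrs = np.argsort(-S, axis=1)[:, :k]
    W = np.zeros_like(S)
    W[np.repeat(np.arange(len(F)), k), nbrs.ravel()] = 1.0
    return np.maximum(W, W.T)


def regularized_nmf(X, r, W=None, lam=0.0, beta=0.0, n_iter=200, seed=0):
    """Multiplicative updates for X ~ U @ V.T with U, V >= 0.

    X is a non-negative (terms x documents) matrix. lam weights the graph
    term Tr(V.T L V) with L = D - W; beta weights an L1 penalty on V.
    With W=None, lam=0 and beta=0 this reduces to plain NMF.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, r)) + EPS
    V = rng.random((n, r)) + EPS
    if W is None:
        W = np.zeros((n, n))
    deg = W.sum(axis=1, keepdims=True)  # diagonal of the degree matrix D
    for _ in range(n_iter):
        U *= (X @ V) / (U @ (V.T @ V) + EPS)
        # Fix the scale ambiguity: unit-norm basis columns, V rescaled so
        # that U @ V.T is unchanged (otherwise the L1 term could be gamed).
        scale = np.linalg.norm(U, axis=0) + EPS
        U /= scale
        V *= scale
        V *= (X.T @ U + lam * (W @ V)) / (
            V @ (U.T @ U) + lam * deg * V + beta + EPS
        )
    return U, V


def hash_documents(X, n_bits=8, k=3, lam=1.0, beta=0.1):
    """Return an (n_docs x n_bits) array of 0/1 hash codes."""
    # Assumption: a first plain-NMF pass supplies the semantic space.
    _, V0 = regularized_nmf(X, n_bits)
    # Neighbourhood graph built in that space, not on raw term vectors.
    W = knn_affinity(V0, k)
    # Refit with the graph regularizer and the sparsity penalty.
    _, V = regularized_nmf(X, n_bits, W=W, lam=lam, beta=beta)
    # Assumption: binarize each dimension at its median.
    return (V > np.median(V, axis=0)).astype(np.uint8)


def hamming(a, b):
    """Hamming distance between two binary code vectors."""
    return int(np.count_nonzero(a != b))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.random((50, 20))  # toy data: 50 terms, 20 documents
    codes = hash_documents(X, n_bits=8)
    print(codes.shape, hamming(codes[0], codes[1]))
```

Retrieval then amounts to ranking stored codes by `hamming` distance to the query code. The median threshold is chosen here only because it yields roughly balanced bits; other quantizers would fit the same sketch.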

Keywords

Semantic hashing; Non-negative matrix factorization; Laplacian graph; Multiplicative update rules

Notes

Acknowledgment

This work is supported in part by the National Natural Science Foundation of China under Grant Nos. 61672254 and 61300222, the Key Project of the National Natural Science Foundation of China under Grant No. U1536203, the Natural Science Foundation of Hubei Province under Grant No. 2015CFB687, the Natural Science Foundation of Fujian Province under Grant No. 2015J01288, and the Fundamental Research Funds for the Central Universities, HUST: 2016YXMS088. The authors appreciate the valuable suggestions from the anonymous reviewers and the Editors.

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  • Fuhao Zou (1)
  • Xiaoman Tang (1)
  • Kai Li (1)
  • Yunfei Wang (1)
  • Jingkuan Song (2)
  • Shuangyuan Yang (3)
  • Hefei Ling (1)

  1. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
  2. School of Engineering and Applied Science, Columbia University, New York, USA
  3. School of Software Engineering, Xiamen University, Xiamen, China
