Hidden semantic hashing for fast retrieval over large scale document collection

Zou, Fuhao; Tang, Xiaoman; Li, Kai; Wang, Yunfei; Song, Jingkuan; Yang, Shuangyuan; Ling, Hefei

doi:10.1007/s11042-017-5219-3

Published: 13 December 2017

Volume 77, pages 3677–3697 (2018)

Multimedia Tools and Applications

Fuhao Zou¹,
Xiaoman Tang¹,
Kai Li¹,
Yunfei Wang¹,
Jingkuan Song²,
Shuangyuan Yang³ &
…
Hefei Ling¹

654 Accesses
6 Citations

Abstract

The semantics of a document are latent rather than directly observable in its terms. Most existing hashing methods ignore this fact and therefore fail to discover the hidden semantic structure. To address this issue, we focus on discovering the latent semantic structure when hashing a document corpus. We adopt two measures to discover the hidden structure. First, the graph Laplacian is constructed in semantic space rather than in term-document space, so that it captures the semantic structure of the corpus during hashing. Second, motivated by the fact that non-negative matrix factorization (NMF) is an effective algorithm for discovering the latent semantic structure of documents, we employ NMF to extract a parts-based representation of each document. In addition, to reduce semantic loss when mapping the parts-based representation into Hamming space, we impose sparsity constraints that push the elements of the representation closer to binary values. Experimental results demonstrate that the proposed hashing method is competitive with state-of-the-art document hashing methods.
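
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract gives no objective function or update rules, so the code below assumes a generic graph-regularized NMF with multiplicative updates, an added L1 penalty on the document factors as the sparsity constraint, and a per-dimension median threshold for binarization. The function names (`sparse_graph_nmf`, `binarize`, `hamming_rank`), the parameters `lam` and `beta`, and the toy affinity matrix `W` are all hypothetical. In particular, how the affinity graph is built "in semantic space" is not specified here, so `W` is simply passed in by the caller.

```python
import random
from statistics import median


def matmul(A, B):
    """Plain-list matrix product A @ B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def sparse_graph_nmf(X, W, k, lam=0.1, beta=0.1, iters=200, seed=0, eps=1e-9):
    """Factor X (terms x docs, nonnegative) as U @ V^T with multiplicative updates.

    W is a symmetric nonnegative document affinity matrix (docs x docs).
    lam weights the graph term; beta is an assumed L1 penalty on V.
    """
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    U = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    V = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    deg = [sum(row) for row in W]  # diagonal of the degree matrix D
    Xt = transpose(X)
    for _ in range(iters):
        # U <- U * (X V) / (U V^T V)
        num = matmul(X, V)
        den = matmul(U, matmul(transpose(V), V))
        U = [[U[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
        # V <- V * (X^T U + lam W V) / (V U^T U + lam D V + beta)
        XtU, WV = matmul(Xt, U), matmul(W, V)
        VUtU = matmul(V, matmul(transpose(U), U))
        V = [[V[i][j] * (XtU[i][j] + lam * WV[i][j])
              / (VUtU[i][j] + lam * deg[i] * V[i][j] + beta + eps)
              for j in range(k)]
             for i in range(n)]
    return U, V


def binarize(V):
    """Threshold each latent dimension at its median to get 0/1 codes."""
    thresholds = [median(col) for col in zip(*V)]
    return [[1 if v > t else 0 for v, t in zip(row, thresholds)] for row in V]


def hamming_rank(codes, query_idx):
    """Return document indices sorted by Hamming distance to the query's code."""
    q = codes[query_idx]
    dist = [sum(a != b for a, b in zip(q, c)) for c in codes]
    return sorted(range(len(codes)), key=lambda i: dist[i])


# Toy term-document matrix: docs 0-1 share one vocabulary, docs 2-3 another.
X = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 3, 2], [0, 0, 2, 3]]
# Hypothetical affinity graph linking the documents assumed to be related.
W = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
U, V = sparse_graph_nmf(X, W, k=2)
codes = binarize(V)
print(codes, hamming_rank(codes, 0))
```

Each row of `V` is a document's parts-based representation, and the L1 term is one simple way to drive its entries toward zero before thresholding. Pure Python lists keep the sketch dependency-free; a corpus of realistic size would need sparse matrix routines instead.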

Acknowledgment

This work is supported in part by the National Natural Science Foundation of China under Grant Nos. 61672254 and 61300222, the Key Project of the National Natural Science Foundation of China under Grant No. U1536203, the Natural Science Foundation of Hubei Province under Grant No. 2015CFB687, the Natural Science Foundation of Fujian Province under Grant No. 2015J01288, and the Fundamental Research Funds for the Central Universities, HUST: 2016YXMS088. The authors appreciate the valuable suggestions from the anonymous reviewers and the editors.

Author information

Authors and Affiliations

School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
Fuhao Zou, Xiaoman Tang, Kai Li, Yunfei Wang & Hefei Ling
School of Engineering and Applied Science, Columbia University, New York, NY, 10027, USA
Jingkuan Song
School of Software Engineering, Xiamen University, Xiamen, 361005, China
Shuangyuan Yang

Corresponding author

Correspondence to Fuhao Zou.

About this article

Cite this article

Zou, F., Tang, X., Li, K. et al. Hidden semantic hashing for fast retrieval over large scale document collection. Multimed Tools Appl 77, 3677–3697 (2018). https://doi.org/10.1007/s11042-017-5219-3

Received: 15 March 2017
Revised: 02 September 2017
Accepted: 08 September 2017
Published: 13 December 2017
Issue Date: February 2018
DOI: https://doi.org/10.1007/s11042-017-5219-3
