Semi-supervised discrete hashing for efficient cross-modal retrieval

Abstract

Cross-modal hashing has recently gained significant popularity to facilitate multimedia retrieval across different modalities. Since the acquisition of large-scale labeled training data are very labor intensive, most supervised cross-modal hashing methods are uncompetitive for real applications. With limited label available, this paper presents a novel S emi-S upervised D iscrete H ashing (SSDH) for efficient cross-modal retrieval. In contrast to most semi-supervised cross-modal hashing works that need to predict the label of unlabeled data, our proposed approach groups the labeled and unlabeled data together, and exploits the informative unlabeled data to promote hashing code learning directly. Specifically, the proposed SSDH approach utilizes the relaxed hash representations to characterize each modality, and learns the semi-supervised semantic-preserving regularization to correlate the semantic consistency between the heterogeneous modalities. Accordingly, an efficient objective function is proposed to learn the hash representation, while designing an efficient optimization algorithm to optimize the hash codes for both labeled and unlabeled data. Without sacrificing the retrieval performance, the proposed SSDH method is adaptive to benefit various kinds of retrieval tasks, i.e., unsupervised, semi-supervised and supervised. Experimental results compared with several competitive algorithms show the effectiveness of the proposed method and its superiority over state-of-the-arts.

This is a preview of subscription content, log in to check access.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Notes

  1. 1.

    http://www.svcl.ucsd.edu/projects/crossmodal/

  2. 2.

    http://press.liacs.nl/mirflickr/

  3. 3.

    https://lms.comp.nus.edu.sg/wp-content/uploads/2019/research/nuswide/NUS-WIDE.html

References

  1. 1.

    Bronstein MM, Bronstein AM, Michel F, Paragios N (2010) Data fusion through cross-modality metric learning using similarity-sensitive hashing. In: Proc. IEEE conference on computer vision and pattern recognition, pp 3594–3601

  2. 2.

    Cao Y, Long M, Wang J, Yang Q, Philip SY (2016) Deep visual-semantic hashing for cross-modal retrieval. In: Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 1445–1454

  3. 3.

    Chen ZD, Li CX, Luo X, Nie L, Xu XS (2019) Scratch: a scalable discrete matrix factorization hashing framework for cross-modal retrieval. IEEE Trans Circuits Sys Video Technol 9(4):1–13

    Google Scholar 

  4. 4.

    Chua TS, Tang J, Hong R, Li H, Luo Z, Zheng Y (2009) Nus-wide:a real-world web image database from national university of singapore. In: Proc. ACM international conference on image and video retrieval, pp 1–9

  5. 5.

    Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (2010) Indexing by latent semantic analysis. J Am Soc Inf Sci Technol 41(6):391–407

    Article  Google Scholar 

  6. 6.

    Ding G, Guo Y, Zhou J (2014) Collective matrix factorization hashing for multimodal data. In: Proc. IEEE conference on computer vision and pattern recognition, pp 2083–2090

  7. 7.

    Dong X, Sun J, Duan P, Meng L, Tan Y, Wan W, Wu H, Zhang B, Zhang H (2018) Semi-supervised modality-dependent cross-media retrieval. Multimedia Tools & Applications 77(3):3579–3595

    Article  Google Scholar 

  8. 8.

    Huiskes MJ, Lew MS (2008) The mir flickr retrieval evaluation. In: Proc. ACM international conference on multimedia information retrieval, pp 39–43

  9. 9.

    Jiang Q, Li W (2019) Discrete latent factor model for cross-modal hashing. IEEE Trans Image Process 28(7):3490–3501

    MathSciNet  Article  Google Scholar 

  10. 10.

    Jiang QY, Li WJ (2017) Deep cross-modal hashing. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp 3132–3240

  11. 11.

    Li A, Shan S, Chen X, Gao W (2011) Cross-pose face recognition based on partial least squares. Pattern Recogn Lett 32(15):1948–1955

    Article  Google Scholar 

  12. 12.

    Li C, Deng C, Li N, Liu W, Gao X, Tao D (2018) Self-supervised adversarial hashing networks for cross-modal retrieval

  13. 13.

    Lin Z, Ding G, Hu M, Wang J (2015) Semantics-preserving hashing for cross-view retrieval. In: Proc. IEEE conference on computer vision and pattern recognition, pp 3864–3872

  14. 14.

    Liu H, Ji R, Wu Y, Huang F, Zhang B (2017) Cross-modality binary code learning via fusion similarity hashing. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp 6345–6353

  15. 15.

    Liu X, Hu Z, Ling H, Cheung Y (2019) Mtfh: a matrix tri-factorization hashing framework for efficient cross-modal retrieval. IEEE Trans Pattern Anal Mach Intell 9(4):1–16

    Google Scholar 

  16. 16.

    Long M, Cao Y, Wang J, Yu PS (2016) Composite correlation quantization for efficient multimodal retrieval. In: Proc. International ACM SIGIR conference on research and development in information retrieval, pp 579–588

  17. 17.

    Mandal D, Chaudhury KN, Biswas S (2017) Generalized semantic preserving hashing for n-label cross-modal retrieval. In: Proc. IEEE conference on computer vision and pattern recognition, pp 2633–2641

  18. 18.

    Pereira JC, Coviello E, Doyle G, Rasiwasia N, Lanckriet GR, Levy R, Vasconcelos N (2014) On the role of correlation and abstraction in cross-modal multimedia retrieval. IEEE Trans Pattern Anal Mach Intell 36(3):521–535

    Article  Google Scholar 

  19. 19.

    Rasiwasia N, Pereira JC, Coviello E, Doyle G, Lanckriet GRG, Levy R, Vasconcelos N (2010) A new approach to cross-modal multimedia retrieval. In: Proc. IEEE international conference on multimedia, pp 251–260

  20. 20.

    Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252

    MathSciNet  Article  Google Scholar 

  21. 21.

    Sharma A, Kumar A, Daume H, Jacobs DW (2012) Generalized multiview analysis: a discriminative latent space. In: Proc. IEEE conference on computer vision and pattern recognition, pp 2160–2167

  22. 22.

    Shen F, Shen C, Liu W, Shen TH (2015) Supervised discrete hashing. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp 37–45

  23. 23.

    Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. Computer Science

  24. 24.

    Song J, Yang Y, Yang Y, Huang Z, Shen HT (2013) Inter-media hashing for large-scale retrieval from heterogeneous data sources. In: Proceedings of ACM SIGMOD International Conference on Management of Data, pp 785–796

  25. 25.

    Tang J, Wang K, Shao L (2016) Supervised matrix factorization hashing for cross-modal retrieval. IEEE Trans Image Process 25(7):3157–3166

    MathSciNet  Article  Google Scholar 

  26. 26.

    Toh KC, Yun S (2010) An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems. Pacific J Optim 6(615-640):15

    MathSciNet  MATH  Google Scholar 

  27. 27.

    Wang D, Shang B, Wang Q, Wan B (2019) Semi-paired and semi-supervised multimodal hashing via cross-modality label propagation. Multimed Tools Appl 78(17):24167–24185

    Article  Google Scholar 

  28. 28.

    Wang J, Li G, Pan P, Zhao X (2017) Semi-supervised semantic factorization hashing for fast cross-modal retrieval. Multimed Tools Appl 76(19):20197–20215

    Article  Google Scholar 

  29. 29.

    Xu X, Shen F, Shen HT, Li X (2017) Learning discriminative binary codes for large-scale cross-modal retrieval. IEEE Trans Image Process 26 (5):2494–2507

    MathSciNet  Article  Google Scholar 

  30. 30.

    Zhang D, Li WJ (2014) Large-scale supervised multimodal hashing with semantic correlation maximization. In: Proc. Twenty-eighth AAAI conference on artificial intelligence, pp 2177–2183

  31. 31.

    Zhang J, Peng Y, Yuan M (2018) Sch-gan: Semi-supervised cross-modal hashing by generative adversarial network. IEEE Transactions on Cybernetics

  32. 32.

    Zhang L, Ma B, Li G, Huang Q, Tian Q (2017) Generalized semi-supervised and structured subspace learning for cross-modal retrieval. IEEE Trans Multimed 20(1):128–141

    Article  Google Scholar 

  33. 33.

    Zhou J, Ding G, Guo Y (2014) Latent semantic sparse hashing for cross-modal similarity search. In: Proceedings of ACM SIGIR Conference on Research & Development in Information Retrieval, pp 415–424

  34. 34.

    Zou F, Bai X, Luan C, Li K, Wang Y, Ling H (2019) Semi-supervised cross-modal learning for cross modal retrieval and image annotation. World Wide Web 22(2):825–841

    Article  Google Scholar 

Download references

Acknowledgements

The work was supported by National Science Foundation of China (Nos. 61673185, 61672444 and 61972167), Quanzhou City Science&Technology Program of China (No. 2018C107R), State Key Laboratory of Integrated Services Networks of Xidian University (No. ISN20-11), Promotion Program for graduate student in Scientific research and innovation ability of Huaqiao University (No. 17013083010).

Author information

Affiliations

Authors

Corresponding author

Correspondence to Xin Liu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Wang, X., Liu, X., Peng, S. et al. Semi-supervised discrete hashing for efficient cross-modal retrieval. Multimed Tools Appl (2020). https://doi.org/10.1007/s11042-020-09195-9

Download citation

Keywords

  • Semi-supervised discrete hashing
  • Cross-modal retrieval
  • Relaxed hash representation
  • Semantic consistency