Index and Retrieve Multimedia Data: Cross-Modal Hashing by Learning Subspace Relation

  • Luchen Liu
  • Yang Yang (Email author)
  • Mengqiu Hu
  • Xing Xu
  • Fumin Shen
  • Ning Xie
  • Zi Huang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10828)


Hashing methods have been widely applied to efficient multimedia indexing and retrieval in response to the explosive growth of multimedia data. Cross-modal hashing usually learns binary codes by mapping multi-modal data into a common Hamming space. Most supervised methods use relation information, such as class labels, as pairwise similarities between cross-modal data pairs to narrow the intra-modal and inter-modal gaps. In this paper, we propose a novel supervised cross-modal hashing method, dubbed Subspace Relation Learning for Cross-modal Hashing (SRLCH), which exploits the relation information in semantic labels to bring similar data from different modalities closer together in the low-dimensional Hamming subspace. SRLCH preserves the discrete constraints and nonlinear structures while admitting a closed-form solution for the binary codes, which improves training efficiency. An iterative alternating optimization algorithm is developed to learn the hash functions and the unified binary codes simultaneously, so that multimedia data can be indexed efficiently. Evaluations on two cross-modal retrieval tasks over three widely used datasets show that the proposed SRLCH outperforms most cross-modal hashing methods.
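
As a rough illustration of what retrieval in a common Hamming space looks like, the following minimal sketch hashes two modalities with sign-thresholded linear projections and ranks database items by Hamming distance to a query. It is not the SRLCH algorithm: the projection matrices `W_img` and `W_txt` are random placeholders standing in for learned hash functions, and the feature dimensions, code length, and function names are all assumptions made for the example.

```python
import numpy as np


def hash_codes(features, projection):
    """Map real-valued features to {-1, +1} codes via the sign of a linear projection."""
    return np.where(features @ projection >= 0, 1, -1).astype(np.int8)


def hamming_distances(query_code, database_codes):
    """Hamming distance between one code and every row of database_codes."""
    n_bits = database_codes.shape[1]
    # For codes in {-1, +1}: distance = (n_bits - inner product) / 2.
    inner = database_codes.astype(np.int32) @ query_code.astype(np.int32)
    return (n_bits - inner) // 2


def retrieve(query_code, database_codes, top_k=3):
    """Indices of the top_k database items closest to the query in Hamming space."""
    dists = hamming_distances(query_code, database_codes)
    return np.argsort(dists, kind="stable")[:top_k]


rng = np.random.default_rng(0)
n_bits = 16

# Hypothetical stand-ins for learned, modality-specific hash functions.
W_img = rng.standard_normal((8, n_bits))
W_txt = rng.standard_normal((5, n_bits))

# Toy features: 10 images (8-dim) and 10 texts (5-dim).
images = rng.standard_normal((10, 8))
texts = rng.standard_normal((10, 5))

img_codes = hash_codes(images, W_img)
txt_codes = hash_codes(texts, W_txt)

# Text-to-image retrieval: rank image codes by distance to a text query code.
ranked = retrieve(txt_codes[0], img_codes, top_k=3)
```

Because both modalities end up as codes of the same length, a text query can be compared against image codes directly. With random projections the ranking carries no semantic meaning; a supervised method would learn the projections so that items sharing labels receive nearby codes.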


Multimedia index · Retrieval · Cross-modal hashing · Discrete optimization



This work was supported in part by the National Natural Science Foundation of China under Project 61572108, Project 61632007 and Project 61502081.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Luchen Liu (1)
  • Yang Yang (1), Email author
  • Mengqiu Hu (1)
  • Xing Xu (1)
  • Fumin Shen (1)
  • Ning Xie (1)
  • Zi Huang (2)
  1. Center for Future Media and School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
  2. School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, Australia
