Discrete Hashing Based Supervised Matrix Factorization for Cross-Modal Retrieval

  • Baodong Tang
  • Xiaozhao Fang (corresponding author)
  • Shaohua Teng
  • Wei Zhang
  • Peipei Kang
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 849)


Cross-modal hashing projects heterogeneous multimedia data into a common low-dimensional latent space. Many hashing methods try to preserve the relationship between a text and its corresponding image, and relax the original discrete learning problem into a continuous one. However, these methods may produce ineffective hash codes, both because they do not make full use of the relationship between modalities and because they simply relax the discrete binary constraint. Collective matrix factorization (CMF) has achieved impressive results in mining semantic concepts or latent topics from images and text. In this paper, we propose a new supervised learning framework that unifies CMF, which maximizes the correlation between two modalities, with discrete cyclic coordinate descent (DCC), which tackles the NP-hard discrete optimization problem, so that the generated cross-modal hash codes are more accurate and efficient. Experiments on three benchmark data sets show the effectiveness of the proposed method.
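
As a rough illustration of how CMF and DCC can fit together, the following sketch factorizes two modalities with one shared binary code matrix and updates that matrix one bit at a time. It is a minimal, unsupervised sketch under stated assumptions, not the framework proposed in the paper: the objective (a weighted reconstruction error plus a ridge penalty), the function names `dcc_sweep` and `fit_codes`, and the parameters `lam` and `reg` are all hypothetical, and the label supervision used by the paper is omitted.

```python
import numpy as np


def objective(X, B, U, reg):
    """Assumed objective: ||X - B U||_F^2 + reg * ||U||_F^2."""
    return np.linalg.norm(X - B @ U) ** 2 + reg * np.linalg.norm(U) ** 2


def dcc_sweep(X, B, U):
    """One cyclic pass over the bits (columns) of B with U held fixed.

    With every other bit fixed, the bit column z that minimizes
    ||X - B U||_F^2 over {-1, +1}^n is z = sign(q - B' U' v), where
    q is column l of X U^T, v is row l of U, and B', U' are B and U
    without column/row l.
    """
    Q = X @ U.T                              # shape (n, n_bits)
    n_bits = B.shape[1]
    for l in range(n_bits):
        rest = np.arange(n_bits) != l        # mask selecting the other bits
        v = U[l]                             # shape (d,)
        score = Q[:, l] - B[:, rest] @ (U[rest] @ v)
        new_bit = np.sign(score)
        tie = score == 0                     # keep the old bit on exact ties
        new_bit[tie] = B[tie, l]
        B[:, l] = new_bit
    return B


def fit_codes(X1, X2, n_bits=8, lam=0.5, reg=1e-3, n_iter=10, seed=0):
    """Alternate a ridge least-squares update of U with a DCC sweep over B."""
    rng = np.random.default_rng(seed)
    # Stacking the weighted modalities makes both share the same codes B.
    X = np.hstack([np.sqrt(lam) * X1, np.sqrt(1.0 - lam) * X2])
    B = np.sign(rng.standard_normal((X.shape[0], n_bits)))
    B[B == 0] = 1.0
    history = []
    for _ in range(n_iter):
        # Closed-form minimizer of the assumed objective with B fixed.
        U = np.linalg.solve(B.T @ B + reg * np.eye(n_bits), B.T @ X)
        B = dcc_sweep(X, B, U)
        history.append(objective(X, B, U, reg))
    return B, U, history
```

Because each step minimizes the assumed objective exactly over its own block of variables, the recorded objective values should not increase from one iteration to the next, and the codes stay in {-1, +1} throughout, with no continuous relaxation followed by rounding.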


Collective matrix factorization · Discrete cyclic coordinate descent · Cross-modal retrieval



This work was supported in part by the Natural Science Foundation of China under Grant 61772141, in part by the Guangdong Provincial Natural Science Foundation under Grant 17ZK0422, and in part by the Guangzhou Science and Technology Planning Project under Grant 201804010347.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Baodong Tang (1)
  • Xiaozhao Fang (1, corresponding author)
  • Shaohua Teng (1)
  • Wei Zhang (1)
  • Peipei Kang (1)

  1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China
