Discrete Hashing Based Supervised Matrix Factorization for Cross-Modal Retrieval
Cross-modal hashing projects heterogeneous multimedia data into a common low-dimensional latent space. Many hashing-based methods try to preserve the relationship between a text and its corresponding image, and relax the original discrete learning problem into a continuous one. However, these methods may produce ineffective hash codes, because they do not make full use of the relationship between modalities and because they simply relax the discrete binary constraint. Collective matrix factorization (CMF) has achieved impressive results in mining semantic concepts or latent topics from images and text. In this paper, we propose a new supervised learning framework that unifies CMF, which maximizes the correlation between the two modalities, with discrete cyclic coordinate descent (DCC), which tackles the NP-hard discrete optimization problem directly. This combination makes the hash codes generated for cross-modal retrieval more accurate and efficient. Experiments on three benchmark data sets show the effectiveness of the proposed method.
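The two ingredients named above can be illustrated with a minimal NumPy sketch. It is not the paper's algorithm: the objectives, the function names (`cmf_shared_latent`, `ridge_classifier`, `dcc_update`, `objective`), and the hyperparameters `lam`, `gamma`, `nu` and `reg` are all assumptions chosen for illustration. The first function factorizes two feature matrices with one shared latent matrix `V`; the last one updates binary codes one bit (row) at a time under a label-regression loss, in the general style of supervised discrete hashing, so the codes never leave {-1, +1}.

```python
import numpy as np

def cmf_shared_latent(X1, X2, r, lam=0.5, gamma=1e-2, n_iter=30, seed=0):
    """Alternating least squares for an assumed CMF objective:
    lam*||X1 - U1 V||^2 + (1-lam)*||X2 - U2 V||^2
    + gamma*(||U1||^2 + ||U2||^2 + ||V||^2),
    where X1 is d1 x n, X2 is d2 x n and V (r x n) is shared."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((r, X1.shape[1]))
    I = np.eye(r)
    for _ in range(n_iter):
        G = V @ V.T
        # Closed-form ridge solutions for each modality's basis.
        U1 = np.linalg.solve(G + (gamma / lam) * I, V @ X1.T).T
        U2 = np.linalg.solve(G + (gamma / (1 - lam)) * I, V @ X2.T).T
        # Closed-form solution for the shared latent representation.
        A = lam * U1.T @ U1 + (1 - lam) * U2.T @ U2 + gamma * I
        V = np.linalg.solve(A, lam * U1.T @ X1 + (1 - lam) * U2.T @ X2)
    return U1, U2, V

def ridge_classifier(B, Y, reg=1.0):
    """W (L x c) minimising ||Y - W.T B||^2 + reg*||W||^2 for fixed codes B."""
    L = B.shape[0]
    return np.linalg.solve(B @ B.T + reg * np.eye(L), B @ Y.T)

def objective(B, W, Y, F, nu=1.0):
    """Assumed discrete objective: label fit plus closeness to real-valued F."""
    return np.linalg.norm(Y - W.T @ B) ** 2 + nu * np.linalg.norm(B - F) ** 2

def dcc_update(B, W, Y, F, nu=1.0, n_sweeps=5):
    """Discrete cyclic coordinate descent over the rows (bits) of B.
    With all other bits fixed, the optimal row has the closed form
    sign(q - B_rest.T @ W_rest @ w), so no relaxation is needed."""
    B = B.copy()
    Q = W @ Y + nu * F                      # L x n
    L = B.shape[0]
    for _ in range(n_sweeps):
        B_prev = B.copy()
        for l in range(L):
            rest = np.arange(L) != l
            z = Q[l] - (W[rest] @ W[l]) @ B[rest]
            B[l] = np.where(z >= 0, 1.0, -1.0)
        if np.array_equal(B, B_prev):       # no bit flipped: converged
            break
    return B
```

In this sketch one would initialize the codes as the sign of `V`, fit `W` with `ridge_classifier`, and then call `dcc_update`. Because every row update is an exact minimizer of the assumed objective with the other rows held fixed, the objective cannot increase from one update to the next.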
Keywords: Collective matrix factorization, Discrete cyclic coordinate descent, Cross-modal retrieval
This work was supported in part by the Natural Science Foundation of China under Grant 61772141, in part by the Guangdong Provincial Natural Science Foundation under Grant 17ZK0422, and in part by the Guangzhou Science and Technology Planning Project under Grant 201804010347.