
Domain Invariant Subspace Learning for Cross-Modal Retrieval

  • Conference paper

MultiMedia Modeling (MMM 2018)

Part of the book series: Lecture Notes in Computer Science (volume 10705)

Abstract

Due to the rapid growth of multimodal data, cross-modal retrieval, which takes one type of data as the query to retrieve relevant data of another type, has drawn growing attention in recent years. To enable direct matching between different modalities, the key issue in cross-modal retrieval is to eliminate the heterogeneity between modalities. Many existing approaches directly project the samples of multimodal data into a common latent subspace under the supervision of class label information, where different samples within the same class contribute uniformly to the subspace construction. However, the subspace constructed by these methods may reveal neither the true importance of each sample nor the discrimination between different class labels. To tackle this problem, in this paper we regard different modalities as different domains and propose a Domain Invariant Subspace Learning (DISL) method to associate multimodal data. Specifically, DISL simultaneously minimizes the classification error with sample-wise weighting coefficients and preserves the structural similarity within and across modalities via graph regularization. Therefore, the subspace learned by DISL can well reflect the sample-wise importance and capture the discrimination between different class labels in multimodal data. Extensive experiments on three public datasets demonstrate the superiority of the proposed method over several state-of-the-art algorithms on cross-modal retrieval tasks such as image-to-text and text-to-image retrieval.



Acknowledgments

This work was partially supported by NSFC grants No. 61602089, 61572108, 61632007 and 61673088; the Fundamental Research Funds for the Central Universities under grants ZYGX2014Z007 and ZYGX2016KYQD114; LEADER of MEXT-Japan (16809746); The Telecommunications Foundation; REDAS; and SCAT.

Author information


Corresponding author

Correspondence to Xing Xu.



Copyright information

© 2018 Springer International Publishing AG

About this paper


Cite this paper

Liu, C., Xu, X., Yang, Y., Lu, H., Shen, F., Ji, Y. (2018). Domain Invariant Subspace Learning for Cross-Modal Retrieval. In: Schoeffmann, K., et al. (eds.) MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol. 10705. Springer, Cham. https://doi.org/10.1007/978-3-319-73600-6_9


  • DOI: https://doi.org/10.1007/978-3-319-73600-6_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73599-3

  • Online ISBN: 978-3-319-73600-6

  • eBook Packages: Computer Science, Computer Science (R0)
