
Domain Invariant Subspace Learning for Cross-Modal Retrieval

  • Conference paper

MultiMedia Modeling (MMM 2018)

Part of the book series: Lecture Notes in Computer Science (volume 10705)

Abstract

Due to the rapid growth of multimodal data, cross-modal retrieval, which takes one type of data as the query to retrieve relevant data of another type, has drawn growing attention in recent years. To enable direct matching between different modalities, the key issue in cross-modal retrieval is to eliminate the heterogeneity between modalities. Many existing approaches directly project the samples of multimodal data into a common latent subspace under the supervision of class label information, where different samples within the same class contribute uniformly to the subspace construction. However, the subspace constructed by these methods may reveal neither the true importance of each sample nor the discrimination between different class labels. To tackle this problem, in this paper we regard different modalities as different domains and propose a Domain Invariant Subspace Learning (DISL) method to associate multimodal data. Specifically, DISL simultaneously minimizes the classification error with sample-wise weighting coefficients and preserves the structural similarity within and across modalities via graph regularization. Therefore, the subspace learned by DISL can well reflect the sample-wise importance and capture the discrimination between different class labels in multimodal data. Extensive experiments on three public datasets demonstrate the superiority of the proposed method over several state-of-the-art algorithms on cross-modal retrieval tasks such as image-to-text and text-to-image retrieval.



Acknowledgments

This work was partially supported by NSFC grants No. 61602089, 61572108, 61632007 and 61673088; the Fundamental Research Funds for the Central Universities under grants ZYGX2014Z007 and ZYGX2016KYQD114; LEADER of MEXT-Japan (16809746); The Telecommunications Foundation; REDAS; and SCAT.

Author information


Corresponding author

Correspondence to Xing Xu.



Copyright information

© 2018 Springer International Publishing AG

About this paper


Cite this paper

Liu, C., Xu, X., Yang, Y., Lu, H., Shen, F., Ji, Y. (2018). Domain Invariant Subspace Learning for Cross-Modal Retrieval. In: Schoeffmann, K., et al. (eds.) MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol. 10705. Springer, Cham. https://doi.org/10.1007/978-3-319-73600-6_9


  • DOI: https://doi.org/10.1007/978-3-319-73600-6_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73599-3

  • Online ISBN: 978-3-319-73600-6

  • eBook Packages: Computer Science, Computer Science (R0)
