Deep Learning Generic Features for Cross-Media Retrieval

  • Conference paper
MultiMedia Modeling (MMM 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9516)

Abstract

Cross-media retrieval is an essential approach to handling the explosive growth of multimodal data on the web. However, effectively uncovering the correlations between multimodal data remains a barrier to successful cross-media retrieval. Traditional approaches learn the connection between modalities by directly using hand-crafted, low-level heterogeneous features, and the learned correlations are merely constructed in terms of high-level feature representations. To fully exploit the intrinsic structures of multimodal data, it is essential to build an interpretable correlation between the modalities. In this paper, we propose a deep model that learns a high-level feature representation shared by multiple modalities for cross-media retrieval. We learn discriminative high-level feature representations in a data-driven manner before faithfully encoding the multimodal correlations. We train our deep model on large-scale multimodal data crawled from the Internet and evaluate its effectiveness for cross-media retrieval on the NUS-WIDE dataset. The experimental results show that the proposed model outperforms other state-of-the-art approaches.
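The retrieval step described above (ranking items of one modality against a query from another once both are mapped into a shared representation) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' model: the linear projections `W_img` and `W_txt` are hypothetical stand-ins for the learned deep encoders, cosine similarity is an assumed scoring function, and average precision is shown only as one common retrieval metric. All feature values are toy numbers, not data from the paper.

```python
import math
from typing import List, Sequence, Set

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def project(x: Sequence[float], W: Matrix) -> Vector:
    """Map a modality-specific feature vector into the shared space (computes W @ x).

    Hypothetical: a fixed linear map stands in for a learned deep encoder.
    """
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def rank(query: Sequence[float], gallery: Sequence[Sequence[float]]) -> List[int]:
    """Return gallery indices sorted by decreasing similarity to the query."""
    scores = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


def average_precision(ranking: Sequence[int], relevant: Set[int]) -> float:
    """Average precision of a ranked list, given the set of relevant indices."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, idx in enumerate(ranking, start=1):
        if idx in relevant:
            hits += 1
            total += hits / k  # precision at each relevant position
    return total / len(relevant)


# Toy example: 3-d image features and 4-d text features (heterogeneous),
# both mapped into a 2-d shared space by hypothetical projections.
W_img = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
W_txt = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

image_query = [1.0, 0.2, 5.0]
texts = [
    [0.0, 3.0, 1.0, 0.0],
    [2.0, 0.0, 0.5, 9.0],
    [1.0, 1.0, 1.0, 1.0],
]

q = project(image_query, W_img)
gallery = [project(t, W_txt) for t in texts]
ranking = rank(q, gallery)
ap = average_precision(ranking, relevant={0, 1})
print(ranking, round(ap, 4))  # [1, 2, 0] 0.8333
```

The point of the shared space is that the image and text features have different dimensionalities and cannot be compared directly; after projection, a single similarity function ranks texts for an image query (or vice versa).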

Acknowledgments

This research is supported by the Singapore National Research Foundation under its IRC@Singapore Funding Initiative and administered by IDMPO.

Author information

Corresponding author

Correspondence to Xindi Shang.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Shang, X., Zhang, H., Chua, TS. (2016). Deep Learning Generic Features for Cross-Media Retrieval. In: Tian, Q., Sebe, N., Qi, GJ., Huet, B., Hong, R., Liu, X. (eds) MultiMedia Modeling. MMM 2016. Lecture Notes in Computer Science, vol 9516. Springer, Cham. https://doi.org/10.1007/978-3-319-27671-7_22

  • DOI: https://doi.org/10.1007/978-3-319-27671-7_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27670-0

  • Online ISBN: 978-3-319-27671-7

  • eBook Packages: Computer Science, Computer Science (R0)
