Cross-Media Information Retrieval with Deep Convolutional Neural Network

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 681)

Abstract

With the explosive growth of multimedia data, different types of media often coexist in web repositories. It is therefore increasingly important to explore the intricate underlying cross-media correlations in order to improve retrieval over cross-media data. However, effectively discovering the correlations between multi-modal data remains a barrier to successful cross-media information retrieval. To address this problem, we propose a novel model that projects both the text modality and the visual modality into a common semantic feature space using convolutional neural network features. Unlike existing approaches, the proposed model learns a high-level feature representation shared by multiple modalities for cross-media information retrieval. Experiments are conducted on a public benchmark dataset, and the results show the effectiveness of our approach.
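
The idea of a common semantic feature space can be illustrated with a minimal sketch. The abstract does not specify the architecture, so everything below is an assumption made for illustration only: the feature dimensions (4096 for image CNN features, 300 for text features, 128 for the shared space), the untrained random linear projections `W_img` and `W_txt`, and the helper names `project` and `retrieve` are hypothetical and are not the authors' model. The sketch only shows how, once both modalities are mapped into one space, text-to-image retrieval reduces to ranking by cosine similarity.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def project(features, weights):
    """Linearly map modality-specific features into the shared space.

    A plain linear map is an assumption standing in for whatever learned
    projection the paper actually uses.
    """
    return l2_normalize(features @ weights)

def retrieve(query_emb, gallery_emb, top_k=3):
    """Return, for each query, the indices of the top_k most similar gallery items."""
    scores = query_emb @ gallery_emb.T          # cosine similarities
    return np.argsort(-scores, axis=1)[:, :top_k]

rng = np.random.default_rng(0)

# Hypothetical inputs: 5 images with 4096-d CNN features, 5 texts with 300-d features.
image_feats = rng.standard_normal((5, 4096))
text_feats = rng.standard_normal((5, 300))

# Untrained placeholder projections into a hypothetical 128-d shared space.
W_img = rng.standard_normal((4096, 128)) * 0.01
W_txt = rng.standard_normal((300, 128)) * 0.01

img_emb = project(image_feats, W_img)
txt_emb = project(text_feats, W_txt)

# Text-to-image retrieval: each text query ranks all images.
ranking = retrieve(txt_emb, img_emb, top_k=3)
print(ranking.shape)
```

Because the projections here are random, the rankings carry no semantic meaning; in a trained model the two maps would be learned so that matching text and image pairs land close together in the shared space.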

Corresponding author

Correspondence to Tianyuan Yu.

Copyright information

© 2016 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Bai, L., Yu, T., Guo, J., Yang, Z., Xie, Y. (2016). Cross-Media Information Retrieval with Deep Convolutional Neural Network. In: Gong, M., Pan, L., Song, T., Zhang, G. (eds) Bio-inspired Computing – Theories and Applications. BIC-TA 2016. Communications in Computer and Information Science, vol 681. Springer, Singapore. https://doi.org/10.1007/978-981-10-3611-8_34

  • DOI: https://doi.org/10.1007/978-981-10-3611-8_34

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-3610-1

  • Online ISBN: 978-981-10-3611-8

  • eBook Packages: Computer Science, Computer Science (R0)
