Abstract
With the explosive growth of multimedia data, different types of media often coexist in web repositories, making it increasingly important to explore the intricate correlations across media in order to improve retrieval over cross-media data. However, effectively discovering the correlations between multi-modal data remains a barrier to successful cross-media information retrieval. To address this problem, we propose a novel model that projects both the text modality and the visual modality into a common semantic feature space built on convolutional neural network features. Unlike existing approaches, the proposed model learns a high-level feature representation shared by multiple modalities for cross-media information retrieval. Experiments on a public benchmark dataset demonstrate the effectiveness of our approach.
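The core idea in the abstract, mapping text features and CNN image features into one shared semantic space and ranking by similarity, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the feature dimensions (4096-d CNN features, 1000-d text features, a 128-d shared space), the linear projections `W_img` and `W_txt` (shown as random matrices; the paper learns them), and the cosine-similarity ranking are not taken from the paper itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 4096-d image features (e.g. a CNN fc layer),
# 1000-d text features, projected into a shared 128-d semantic space.
# W_img and W_txt stand in for learned projections; random here.
D_IMG, D_TXT, D_SHARED = 4096, 1000, 128
W_img = rng.standard_normal((D_IMG, D_SHARED)) / np.sqrt(D_IMG)
W_txt = rng.standard_normal((D_TXT, D_SHARED)) / np.sqrt(D_TXT)

def embed(x, W):
    """Project raw modality features into the shared space, L2-normalized."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def rank_images(text_feat, image_feats):
    """Text-to-image retrieval: rank gallery images by cosine similarity."""
    q = embed(text_feat, W_txt)           # query in shared space
    gallery = embed(image_feats, W_img)   # (N, D_SHARED)
    scores = gallery @ q                  # cosine similarity (unit vectors)
    return np.argsort(-scores), scores

# Toy query: one text vector ranked against five random "images".
ranking, scores = rank_images(rng.standard_normal(D_TXT),
                              rng.standard_normal((5, D_IMG)))
```

In the actual model, the projections would be trained so that matching image-text pairs land close together in the shared space; this sketch only shows the retrieval-time geometry.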
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
Cite this paper
Bai, L., Yu, T., Guo, J., Yang, Z., Xie, Y. (2016). Cross-Media Information Retrieval with Deep Convolutional Neural Network. In: Gong, M., Pan, L., Song, T., Zhang, G. (eds) Bio-inspired Computing – Theories and Applications. BIC-TA 2016. Communications in Computer and Information Science, vol 681. Springer, Singapore. https://doi.org/10.1007/978-981-10-3611-8_34
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3610-1
Online ISBN: 978-981-10-3611-8