Semantic Modeling of Textual Relationships in Cross-modal Retrieval

Yu, Jing; Yang, Chenghao; Qin, Zengchang; Yang, Zhuoqian; Hu, Yue; Shi, Zhiguo

doi:10.1007/978-3-030-29551-6_3

Conference paper
First Online: 21 August 2019

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11775)

Abstract

Feature modeling of different modalities is a basic problem in current cross-modal information retrieval research. Existing models typically project texts and images into a common embedding space in which semantically similar items lie closer together. Semantic modeling of textual relationships is notoriously difficult. In this paper, we propose an approach that models texts as a featured graph by integrating multi-view textual relationships, including semantic relationships, statistical co-occurrence, and prior relationships from a knowledge base. A dual-path neural network is adopted to jointly learn multi-modal representations and a cross-modal similarity measure. We use a Graph Convolutional Network (GCN) to generate relation-aware text representations and a Convolutional Neural Network (CNN) with non-linearities to generate image representations. The cross-modal similarity measure is learned by distance metric learning. Experimental results show that, by leveraging the rich relational semantics in texts, our model outperforms state-of-the-art models by 3.4% and 6.3% in accuracy on two benchmark datasets.
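The text path and similarity learning described in the abstract can be illustrated with a small plain-Python sketch. This is not the authors' implementation; it rests on several assumptions. It builds only the statistical co-occurrence view of the featured graph from a sliding window. Semantic and knowledge-base edges would presumably be added to the same adjacency matrix. It then applies one first-order graph convolution with symmetric normalization, which may differ from the paper's exact filter, and mean-pools the node states into a text vector. Finally, it scores that vector against an image vector with cosine similarity and a simple margin-based pairwise loss. That loss has the same form as PyTorch's `CosineEmbeddingLoss` and is swapped in here as a stand-in for the learned metric. All function names, the window size, the weights, the margin, and the image vector are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def cooccurrence_graph(tokens: List[str], window: int = 2) -> Tuple[List[str], Matrix]:
    """Word graph weighted by how often two words occur fewer than `window` positions apart.

    Only the statistical co-occurrence view is modeled here; semantic and
    knowledge-base edges (assumed, not shown) would be added to the same matrix.
    """
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    adj = [[0.0] * n for _ in range(n)]
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            a, b = index[word], index[tokens[j]]
            if a != b:  # skip a word co-occurring with itself; self-loops come later
                adj[a][b] += 1.0
                adj[b][a] += 1.0
    return vocab, adj


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive dense matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One first-order graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # always >= 1 because of the self-loop
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, x) for x in row] for row in out]


def mean_pool(rows: Matrix) -> List[float]:
    """Average node states column-wise into a single text vector."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pairwise_margin_loss(sim: float, matched: bool, margin: float = 0.5) -> float:
    """Hypothetical metric-learning loss: matched pairs are pulled toward similarity 1,
    mismatched pairs are penalized only when their similarity exceeds `margin`."""
    return (1.0 - sim) if matched else max(0.0, sim - margin)


if __name__ == "__main__":
    tokens = "graph convolution models text graph structure".split()
    vocab, adj = cooccurrence_graph(tokens, window=2)
    n = len(vocab)
    feats = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # one-hot node features
    weight = [[((i + 2 * j) % 5) / 5.0 for j in range(3)] for i in range(n)]  # hypothetical weights
    text_vec = mean_pool(gcn_layer(adj, feats, weight))
    image_vec = [0.2, 0.7, 0.1]  # stand-in for a projected CNN image feature
    sim = cosine(text_vec, image_vec)
    print(f"similarity={sim:.3f}  loss(matched)={pairwise_margin_loss(sim, True):.3f}")
```

One-hot node features, a single layer, and mean pooling keep the sketch small. A trained model would more plausibly use word embeddings as node features and stack several graph convolutions. It would also learn the GCN weights and the image-side projection jointly through the similarity loss.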

Acknowledgement

This work is supported by the National Key Research and Development Program (Grant No. 2017YFB0803301).

Author information

Authors and Affiliations

Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Jing Yu & Yue Hu
Intelligent Computing and Machine Learning Lab, Beihang University, Beijing, China
Chenghao Yang, Zengchang Qin & Zhuoqian Yang
School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
Zhiguo Shi

Corresponding author

Correspondence to Zengchang Qin.

Editor information

Editors and Affiliations

University of Piraeus, Piraeus, Greece
Christos Douligeris
University of Vienna, Vienna, Austria
Dimitris Karagiannis
University of Piraeus, Piraeus, Greece
Dimitris Apostolou

About this paper

Cite this paper

Yu, J., Yang, C., Qin, Z., Yang, Z., Hu, Y., Shi, Z. (2019). Semantic Modeling of Textual Relationships in Cross-modal Retrieval. In: Douligeris, C., Karagiannis, D., Apostolou, D. (eds) Knowledge Science, Engineering and Management. KSEM 2019. Lecture Notes in Computer Science, vol 11775. Springer, Cham. https://doi.org/10.1007/978-3-030-29551-6_3

DOI: https://doi.org/10.1007/978-3-030-29551-6_3
Published: 21 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29550-9
Online ISBN: 978-3-030-29551-6
eBook Packages: Computer Science; Computer Science (R0)