Multimodal Neural Machine Translation of Fashion E-Commerce Descriptions

Laenen, Katrien; Moens, Marie-Francine

doi:10.1007/978-3-030-15436-3_4

Included in the following conference series:

International Conference on Fashion communication: between tradition and future digital developments

Abstract

Neural networks have become extremely popular in artificial intelligence. In this paper we show how they can help automatically translate fashion item descriptions and how they can use fashion images when generating the translations. More specifically, we propose a multimodal neural machine translation model in which the decoder that generates the translation attends to visually grounded representations that capture both the semantics of the fashion words in the source language and regions in the fashion image. We introduce this novel neural architecture in the context of fashion e-commerce, where product descriptions need to be available in multiple languages. We report state-of-the-art multimodal translation results on a real-world fashion e-commerce dataset.
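The abstract describes a decoder that attends to representations combining source-word semantics with image regions. The following is a minimal sketch of that idea under simplifying assumptions, not the authors' implementation. It uses plain dot-product attention (swapped in for simplicity; the paper's actual scoring function, encoder and dimensions are not stated on this page), grounds each source word by concatenating its vector with an attention-weighted summary of image-region features, and lets one decoder state attend over the grounded words. All function names and toy vectors are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max score before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Tuple[List[float], Vector]:
    """Dot-product attention: returns the weights and the weighted average of the keys."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(keys[0])
    context = [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]
    return weights, context


def ground_words(word_vecs: List[Vector], region_vecs: List[Vector]) -> List[Vector]:
    """Hypothetical grounding step: each word vector is concatenated with
    an attention-weighted summary of the image-region vectors."""
    grounded = []
    for w in word_vecs:
        _, visual = attend(w, region_vecs)
        grounded.append(w + visual)  # list concatenation: [text features | visual features]
    return grounded


def decoder_context(state: Vector, grounded: List[Vector]) -> Tuple[List[float], Vector]:
    """One decoding step: the decoder state attends over the grounded source words."""
    return attend(state, grounded)


# Toy example (made-up 2-d features): two source words and three image regions.
words = [[1.0, 0.0], [0.0, 1.0]]
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
grounded = ground_words(words, regions)
state = [1.0, 0.0, 1.0, 0.0]  # must match the grounded dimension (2 + 2)
weights, ctx = decoder_context(state, grounded)
```

Concatenation keeps the textual and visual parts of each grounded vector separable; a learned projection or gating mechanism would be a common alternative, but which one the paper uses is not stated here.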

Author information

Authors and Affiliations

KU Leuven, Louvain, Belgium
Katrien Laenen & Marie-Francine Moens

Corresponding author

Correspondence to Katrien Laenen.

Editor information

Editors and Affiliations

Institute of Digital Technologies for Communication, USI – Università della Svizzera italiana, Lugano, Switzerland
Nadzeya Kalbaska
ISEM Fashion Business School, University of Navarra, Madrid, Spain
Teresa Sádaba
IREST/EIREST, Université Paris 1 Panthéon Sorbonne, Paris, France
Francesca Cominelli
Institute of Digital Technologies for Communication, USI – Università della Svizzera italiana, Lugano, Switzerland
Lorenzo Cantoni

About this paper

Cite this paper

Laenen, K., Moens, MF. (2019). Multimodal Neural Machine Translation of Fashion E-Commerce Descriptions. In: Kalbaska, N., Sádaba, T., Cominelli, F., Cantoni, L. (eds) Fashion Communication in the Digital Age. FACTUM 2019. Springer, Cham. https://doi.org/10.1007/978-3-030-15436-3_4

DOI: https://doi.org/10.1007/978-3-030-15436-3_4
Published: 04 June 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15435-6
Online ISBN: 978-3-030-15436-3
eBook Packages: Business and Management, Business and Management (R0)