Abstract
Neural networks have become extremely popular in artificial intelligence. In this paper we show how they can aid in automatically translating fashion item descriptions and how they can use fashion images when generating these translations. More specifically, we propose a multimodal neural machine translation model in which the decoder that generates the translation attends to visually grounded representations, which capture both the semantics of the fashion words in the source language and regions in the fashion image. We introduce this novel neural architecture in the context of fashion e-commerce, where product descriptions need to be available in multiple languages. We report state-of-the-art multimodal translation results on a real-world fashion e-commerce dataset.
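The abstract describes the architecture only at a high level. The sketch below is a rough illustration, not the authors' implementation. It rests on several assumptions. A source word is "visually grounded" by letting its embedding attend over image region features and adding the resulting visual context to it. The decoder then applies dot-product attention over these grounded vectors, which simplifies the additive attention of Bahdanau et al. The helper names `ground_words` and `decoder_attend`, the summation used to combine text and image, and the toy two-dimensional vectors are all hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights: Vector, vectors: List[Vector]) -> Vector:
    """Return sum_i weights[i] * vectors[i]."""
    dim = len(vectors[0])
    out = [0.0] * dim
    for w, v in zip(weights, vectors):
        for k in range(dim):
            out[k] += w * v[k]
    return out


def ground_words(words: List[Vector], regions: List[Vector]) -> List[Vector]:
    """Hypothetical grounding step (an assumption, not the paper's method):
    each source word attends over the image regions, and its embedding is
    summed element-wise with the resulting visual context vector."""
    grounded = []
    for w in words:
        alpha = softmax([dot(w, r) for r in regions])  # word-to-region weights
        visual = weighted_sum(alpha, regions)          # visual context for this word
        grounded.append([x + y for x, y in zip(w, visual)])
    return grounded


def decoder_attend(state: Vector, grounded: List[Vector]) -> Tuple[Vector, Vector]:
    """One decoder step: dot-product attention over the grounded source
    representations, returning (context vector, attention weights)."""
    beta = softmax([dot(state, g) for g in grounded])
    return weighted_sum(beta, grounded), beta


if __name__ == "__main__":
    words = [[1.0, 0.0], [0.0, 1.0]]                # toy embeddings for two source words
    regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy features for three image regions
    grounded = ground_words(words, regions)
    context, weights = decoder_attend([1.0, 0.0], grounded)
    print(context, weights)
```

In a real model the embeddings, region features (for example from a pretrained CNN) and decoder state would be learned, higher-dimensional tensors. The plain-Python version only makes the two attention steps explicit.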
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Laenen, K., Moens, MF. (2019). Multimodal Neural Machine Translation of Fashion E-Commerce Descriptions. In: Kalbaska, N., Sádaba, T., Cominelli, F., Cantoni, L. (eds) Fashion Communication in the Digital Age. FACTUM 2019. Springer, Cham. https://doi.org/10.1007/978-3-030-15436-3_4
DOI: https://doi.org/10.1007/978-3-030-15436-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15435-6
Online ISBN: 978-3-030-15436-3
eBook Packages: Business and Management (R0)