Abstract
The attention model has become a standard component in neural machine translation (NMT); it guides the translation process by selectively focusing on parts of the source sentence when predicting each target word. However, we find that the generation of a target word depends not only on the source sentence but also relies heavily on the previously generated target words, especially distant words, which are difficult to model with recurrent neural networks. To solve this problem, we propose in this paper a novel look-ahead attention mechanism for generation in NMT, which aims at directly capturing the dependency relationships between target words. We further design three patterns to integrate our look-ahead attention into the conventional attention model. Experiments on NIST Chinese-to-English and WMT English-to-German translation tasks show that our proposed look-ahead attention mechanism achieves substantial improvements over state-of-the-art baselines.
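The idea can be illustrated with a minimal sketch: alongside conventional attention over the encoder states, the current decoder state also attends over the previously generated target states, and the two context vectors are then combined. This is a hedged, simplified illustration, not the paper's implementation; the additive scoring function, the random toy inputs, and the concatenation used to combine the two contexts are assumptions (the paper proposes three integration patterns, of which concatenation-style combination is only one illustrative choice).

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, keys, W_q, W_k, v):
    # Additive (Bahdanau-style) attention: score each key against the query,
    # normalize the scores, and return the weighted sum of the keys.
    scores = np.array([v @ np.tanh(W_q @ query + W_k @ k) for k in keys])
    weights = softmax(scores)
    context = weights @ np.stack(keys)
    return context, weights

rng = np.random.default_rng(0)
d = 4  # toy hidden size
W_q, W_k = rng.normal(size=(d, d)), rng.normal(size=(d, d))
v = rng.normal(size=d)

src_states = [rng.normal(size=d) for _ in range(5)]  # encoder hidden states
tgt_states = [rng.normal(size=d) for _ in range(3)]  # previously generated target states
s_t = rng.normal(size=d)                             # current decoder state

# Conventional attention over the source sentence.
src_ctx, src_w = attend(s_t, src_states, W_q, W_k, v)
# Look-ahead attention over the target-side history.
tgt_ctx, tgt_w = attend(s_t, tgt_states, W_q, W_k, v)
# One simple way to integrate the two contexts before predicting the next word.
combined = np.concatenate([src_ctx, tgt_ctx])
```

In practice the two attention modules would use separate parameters and feed a learned output layer; the sketch only shows the data flow of attending over target history in addition to the source.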
Notes
1. The corpora include LDC2000T50, LDC2002T01, LDC2002E18, LDC2003E07, LDC2003E14, LDC2003T17 and LDC2004T07.
Acknowledgments
This research was funded by the Natural Science Foundation of China under Grants No. 61673380, No. 61402478 and No. 61403379.
Copyright information
© 2018 Springer International Publishing AG
Cite this paper
Zhou, L., Zhang, J., Zong, C. (2018). Look-Ahead Attention for Generation in Neural Machine Translation. In: Huang, X., Jiang, J., Zhao, D., Feng, Y., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2017. Lecture Notes in Computer Science, vol 10619. Springer, Cham. https://doi.org/10.1007/978-3-319-73618-1_18
DOI: https://doi.org/10.1007/978-3-319-73618-1_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73617-4
Online ISBN: 978-3-319-73618-1