Abstract
Existing work on generative models for multi-turn dialogue systems is mostly based on RNNs (recurrent neural networks), even though the Transformer architecture has achieved great success in other fields of NLP. In the multi-turn conversation task, a response is generated from both the source utterance and the utterances of previous turns, which serve as context utterances. However, the vanilla Transformer processes utterances in isolation and therefore cannot explicitly model the differences between context utterances and the source utterance. Moreover, because context and source utterances carry rich information, the same word can take on different meanings in different contexts. We propose an end-to-end model for response generation that extends the vanilla Transformer with a context mechanism and a multi-dimensional attention mechanism. The context mechanism lets information from the context utterances flow into the source utterance, so that both jointly control response generation. The multi-dimensional attention mechanism captures more information from the context and source utterances by extending scalar attention weights to 2D (feature-wise) vectors. Experiments show that the proposed model outperforms state-of-the-art baselines (+35.8% over the best baseline).
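To make the two mechanisms concrete, the sketch below illustrates feature-wise ("multi-dimensional") attention between a source utterance and a context utterance, followed by a simple gate that lets context information flow into the source representation. This is a minimal sketch under our own assumptions: the projection names (W_q, W_k), the additive scoring function, and the sigmoid gate are hypothetical stand-ins, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
L_src, L_ctx, d = 5, 7, 16        # hypothetical lengths and hidden size

# Hypothetical learnable projections (the paper's exact
# parameterisation is not given in the abstract).
W_q = torch.randn(d, d) * 0.1
W_k = torch.randn(d, d) * 0.1

src = torch.randn(L_src, d)       # source-utterance hidden states
ctx = torch.randn(L_ctx, d)       # context-utterance hidden states

# Multi-dimensional (feature-wise) attention: instead of one scalar
# score per (source, context) pair, compute a d-dimensional score
# vector, so every feature dimension gets its own alignment
# distribution -- the "2D" attention weights of the abstract.
scores = torch.tanh(src @ W_q).unsqueeze(1) + torch.tanh(ctx @ W_k).unsqueeze(0)
weights = F.softmax(scores, dim=1)                  # (L_src, L_ctx, d): per-dim softmax over context
attended = (weights * ctx.unsqueeze(0)).sum(dim=1)  # (L_src, d)

# A simple gate letting context information flow into the source
# representation (a sketch of the context mechanism, not the paper's
# exact formulation).
gate = torch.sigmoid(src + attended)
fused = gate * src + (1 - gate) * attended
print(fused.shape)                                  # torch.Size([5, 16])
```

With scalar attention, `weights` would have shape (L_src, L_ctx) and every feature dimension would share one alignment; the feature-wise variant lets different dimensions attend to different context positions, which is how the same word can be weighted differently depending on which aspects of the context matter.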
Acknowledgement
This research work was funded by the National Natural Science Foundation of China (Grant Nos. 61772337 and U1736207) and the National Key Research and Development Program of China (No. 2016QY03D0604).
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Tan, R., Sun, J., Su, B., Liu, G. (2019). Extending the Transformer with Context and Multi-dimensional Mechanism for Dialogue Response Generation. In: Tang, J., Kan, M.-Y., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol 11839. Springer, Cham. https://doi.org/10.1007/978-3-030-32236-6_16
Print ISBN: 978-3-030-32235-9
Online ISBN: 978-3-030-32236-6