Abstract
Existing work on generative models for multi-turn dialogue systems is mostly based on RNNs (recurrent neural networks), even though the Transformer architecture has achieved great success in other fields of NLP. In the multi-turn conversation task, a response is generated from both the source utterance and the utterances of previous turns, which serve as context utterances. However, the vanilla Transformer processes utterances in isolation and therefore cannot explicitly model the differences between context utterances and the source utterance. Moreover, because context and source utterances carry rich information, the same word can take on different meanings in different contexts. We propose an end-to-end model for response generation that extends the vanilla Transformer with a context mechanism and a multi-dimensional attention mechanism. The context mechanism lets information from the context utterances flow into the source utterance, so that both jointly control response generation. The multi-dimensional attention mechanism captures more information from the context and source utterances by extending scalar attention weights to 2D (feature-wise) vectors. Experiments show that the proposed model outperforms state-of-the-art baselines (+35.8% over the best baseline).
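To make the two mechanisms concrete, the sketch below illustrates feature-wise ("multi-dimensional") attention between a source utterance and a context utterance, followed by a simple gate that lets context information flow into the source representation. This is a minimal sketch under our own assumptions: the projection names (W_q, W_k), the additive scoring function, and the sigmoid gate are hypothetical stand-ins, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
L_src, L_ctx, d = 5, 7, 16        # hypothetical lengths and hidden size

# Hypothetical learnable projections (the paper's exact
# parameterisation is not given in the abstract).
W_q = torch.randn(d, d) * 0.1
W_k = torch.randn(d, d) * 0.1

src = torch.randn(L_src, d)       # source-utterance hidden states
ctx = torch.randn(L_ctx, d)       # context-utterance hidden states

# Multi-dimensional (feature-wise) attention: instead of one scalar
# score per (source, context) pair, compute a d-dimensional score
# vector, so every feature dimension gets its own alignment
# distribution -- the "2D" attention weights of the abstract.
scores = torch.tanh(src @ W_q).unsqueeze(1) + torch.tanh(ctx @ W_k).unsqueeze(0)
weights = F.softmax(scores, dim=1)                  # (L_src, L_ctx, d): per-dim softmax over context
attended = (weights * ctx.unsqueeze(0)).sum(dim=1)  # (L_src, d)

# A simple gate letting context information flow into the source
# representation (a sketch of the context mechanism, not the paper's
# exact formulation).
gate = torch.sigmoid(src + attended)
fused = gate * src + (1 - gate) * attended
print(fused.shape)                                  # torch.Size([5, 16])
```

With scalar attention, `weights` would have shape (L_src, L_ctx) and every feature dimension would share one alignment; the feature-wise variant lets different dimensions attend to different context positions, which is how the same word can be weighted differently depending on which aspects of the context matter.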
Acknowledgement
This research work was funded by the National Natural Science Foundation of China (Grant Nos. 61772337 and U1736207) and the National Key Research and Development Program of China (No. 2016QY03D0604).
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Tan, R., Sun, J., Su, B., Liu, G. (2019). Extending the Transformer with Context and Multi-dimensional Mechanism for Dialogue Response Generation. In: Tang, J., Kan, M.-Y., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol 11839. Springer, Cham. https://doi.org/10.1007/978-3-030-32236-6_16
Print ISBN: 978-3-030-32235-9
Online ISBN: 978-3-030-32236-6