Extending the Transformer with Context and Multi-dimensional Mechanism for Dialogue Response Generation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11839)

Abstract

Existing work on generative models for multi-turn dialogue systems is typically based on RNNs (recurrent neural networks), even though the Transformer architecture has achieved great success in other fields of NLP. In the multi-turn conversation task, a response is produced according to both the source utterance and the utterances from previous turns, which are regarded as context utterances. However, the vanilla Transformer processes utterances in isolation and hence cannot explicitly model the differences between the context utterances and the source utterance. In addition, the same word can carry different meanings in different contexts, since the context and source utterances in a multi-turn conversation contain rich information. We propose an end-to-end model for response generation that extends the vanilla Transformer with a context mechanism and a multi-dimensional attention mechanism. With the context mechanism, information from the context utterances can flow to the source utterance, so that both jointly control response generation. The multi-dimensional attention mechanism enables the model to capture more information from the context and source utterances by expanding the attention weights from scalars into vectors ("2D" attention weights). Experiments show that the proposed model outperforms state-of-the-art baselines, beating the best of them by 35.8%.
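
The "2D" attention weights mentioned above can be made concrete with a small sketch. Below is a minimal, illustrative PyTorch implementation of multi-dimensional (feature-wise) attention in the spirit of DiSAN (Shen et al., AAAI 2018), which this style of attention follows: each (query, key) pair receives a score vector over the feature dimensions rather than a single scalar, and the softmax over keys is taken independently per dimension. All names and shapes here are assumptions for illustration, not the authors' released code.

    # Minimal sketch of multi-dimensional (feature-wise) attention.
    # ASSUMPTION: module and parameter names are illustrative; this is not
    # the paper's implementation, only the general "2D weights" idea.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiDimensionalAttention(nn.Module):
        def __init__(self, d_model: int):
            super().__init__()
            self.w_q = nn.Linear(d_model, d_model)
            self.w_k = nn.Linear(d_model, d_model)
            # Maps each (query, key) interaction to a score *vector*,
            # one score per feature dimension.
            self.w_score = nn.Linear(d_model, d_model)

        def forward(self, query, keys):
            # query: (batch, len_q, d), keys: (batch, len_k, d)
            q = self.w_q(query).unsqueeze(2)          # (batch, len_q, 1, d)
            k = self.w_k(keys).unsqueeze(1)           # (batch, 1, len_k, d)
            scores = self.w_score(torch.tanh(q + k))  # (batch, len_q, len_k, d)
            # Softmax over the keys, computed separately for every feature
            # dimension: these are the "2D" attention weights.
            weights = F.softmax(scores, dim=2)
            # Element-wise weighted sum of the values (here, the keys).
            return (weights * keys.unsqueeze(1)).sum(dim=2)  # (batch, len_q, d)

    # Toy usage: a 5-token source attends over an 8-token context.
    attn = MultiDimensionalAttention(d_model=16)
    out = attn(torch.randn(2, 5, 16), torch.randn(2, 8, 16))
    print(out.shape)  # torch.Size([2, 5, 16])

Compared with standard scaled dot-product attention, the weight tensor grows from (batch, len_q, len_k) to (batch, len_q, len_k, d_model), which is what allows different feature dimensions of the same token, and hence different senses of the same word, to be weighted differently.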

Notes

  1. The subscript i is omitted for clarity.

  2. They are all native speakers and have graduate degrees or above.

Acknowledgement

This research work was funded by the National Natural Science Foundation of China (Grant Nos. 61772337 and U1736207) and by the National Key Research and Development Program of China (No. 2016QY03D0604).

Author information

Correspondence to Bo Su or Gongshen Liu.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Tan, R., Sun, J., Su, B., Liu, G. (2019). Extending the Transformer with Context and Multi-dimensional Mechanism for Dialogue Response Generation. In: Tang, J., Kan, M.Y., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science (LNAI), vol. 11839. Springer, Cham. https://doi.org/10.1007/978-3-030-32236-6_16

  • DOI: https://doi.org/10.1007/978-3-030-32236-6_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32235-9

  • Online ISBN: 978-3-030-32236-6

  • eBook Packages: Computer Science, Computer Science (R0)
