
International Journal of Automation and Computing, Volume 16, Issue 6, pp 720–736

Transfer Hierarchical Attention Network for Generative Dialog System

  • Xiang Zhang
  • Qiang Yang
Research Article

Abstract

In generative dialog systems, learning representations for the dialog context is a crucial step in generating high-quality responses. Dialog systems must capture useful and compact information from mutually dependent sentences so that the generation process can effectively attend to the central semantics. Unfortunately, existing methods may not effectively identify the importance distribution over lower-level positions when computing an upper-level feature, which can lose information critical to the final context representations. To address this issue, we propose a transfer-learning-based method named the transfer hierarchical attention network (THAN). The THAN model leverages prior knowledge from two related auxiliary tasks, keyword extraction and sentence entailment, to facilitate dialog representation learning for the main dialog generation task. During the transfer process, syntactic structure and semantic relationships from the auxiliary tasks are distilled to enhance both the word-level and sentence-level attention mechanisms of the dialog system. Extensive experiments on the Twitter Dialog Corpus and the PERSONA-CHAT dataset demonstrate the effectiveness of the proposed THAN model compared with state-of-the-art methods.
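To make the hierarchical context encoding described above concrete, the sketch below shows a generic word-level plus sentence-level attention encoder in PyTorch. It illustrates only the general hierarchical attention idea the abstract builds on; the class name, GRU encoders, additive attention scorers, and hyper-parameters are illustrative assumptions, not the authors' THAN implementation or its transferred attention components.

# Minimal sketch of a hierarchical attention encoder for a dialog context,
# assuming PyTorch. Names and sizes are illustrative, not the THAN model.
import torch
import torch.nn as nn


class HierarchicalAttentionEncoder(nn.Module):
    def __init__(self, vocab_size: int, embed_size: int = 128, hidden_size: int = 256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        # Word-level and sentence-level bidirectional GRU encoders.
        self.word_rnn = nn.GRU(embed_size, hidden_size, bidirectional=True, batch_first=True)
        self.sent_rnn = nn.GRU(2 * hidden_size, hidden_size, bidirectional=True, batch_first=True)
        # Additive attention scorers for the two levels.
        self.word_attn = nn.Linear(2 * hidden_size, 1)
        self.sent_attn = nn.Linear(2 * hidden_size, 1)

    @staticmethod
    def _attend(states, scorer):
        # states: (batch, length, dim); attention weights sum to 1 over length.
        weights = torch.softmax(scorer(states).squeeze(-1), dim=-1)
        return torch.bmm(weights.unsqueeze(1), states).squeeze(1)

    def forward(self, dialog):
        # dialog: (batch, num_utterances, num_tokens) of token ids.
        batch, num_utts, num_toks = dialog.shape
        tokens = self.embedding(dialog.view(batch * num_utts, num_toks))
        word_states, _ = self.word_rnn(tokens)
        # Word-level attention -> one vector per utterance.
        utt_vectors = self._attend(word_states, self.word_attn).view(batch, num_utts, -1)
        sent_states, _ = self.sent_rnn(utt_vectors)
        # Sentence-level attention -> one context vector per dialog.
        return self._attend(sent_states, self.sent_attn)


if __name__ == "__main__":
    encoder = HierarchicalAttentionEncoder(vocab_size=1000)
    fake_dialog = torch.randint(0, 1000, (2, 3, 10))  # 2 dialogs, 3 utterances, 10 tokens each
    context = encoder(fake_dialog)
    print(context.shape)  # torch.Size([2, 512])

In the paper's setting, the word-level and sentence-level attention modules in such an encoder are the components enhanced by knowledge transferred from the keyword extraction and sentence entailment tasks, respectively; the context vector then conditions the response decoder.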

Keywords

Dialog system, transfer learning, deep learning, natural language processing (NLP), artificial intelligence



Copyright information

© Institute of Automation, Chinese Academy of Sciences and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Computer Science and Engineering Department, Hong Kong University of Science and Technology, Hong Kong, China
