Using Bidirectional Transformer-CRF for Spoken Language Understanding

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11838)

Abstract

Spoken Language Understanding (SLU) is a critical component of spoken dialogue systems. It typically comprises two tasks: intent detection (ID) and slot filling (SF). Currently, the most effective models carry out these two tasks jointly and often perform better than separate models. However, these models usually fail to model the interaction between intent and slots, tying the two tasks together only through a joint loss function. In this paper, we propose a new model based on a bidirectional Transformer and introduce a padding method that enables intent and slots to interact with each other effectively. A CRF layer is further added to achieve global optimization. We conduct experiments on the benchmark ATIS and Snips datasets, and the results show that our model achieves state-of-the-art performance on both tasks.
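
To illustrate what "global optimization" by a CRF layer can mean for slot filling, the sketch below runs Viterbi decoding over per-token tag scores and a tag-transition matrix. It is a generic linear-chain CRF decoder, not the authors' implementation. The function name `viterbi_decode`, the three-tag set (O, B, I), and all score values are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its total score.

    emissions[t][j]   : score of tag j at position t (e.g. from an encoder)
    transitions[i][j] : score of moving from tag i to tag j
    """
    num_tags = len(transitions)
    # best[j] = best score of any path that ends in tag j at the current position
    best = list(emissions[0])
    backpointers: List[List[int]] = []

    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(num_tags):
            # Pick the previous tag that maximizes the path score into tag j.
            prev = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)

    # Trace the best path backwards from the best final tag.
    last = max(range(num_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical tag set: 0 = O, 1 = B, 2 = I. The transition O -> I is penalized
# because an "inside" tag should not directly follow an "outside" tag.
transitions = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, 0.0, 0.0],    # from B
    [0.0, 0.0, 0.0],    # from I
]
emissions = [
    [2.0, 1.5, 0.0],  # token 1: O scores slightly higher than B
    [0.0, 0.0, 3.0],  # token 2: I scores highest
    [2.0, 0.0, 0.0],  # token 3: O scores highest
]

greedy = [max(range(3), key=lambda j: row[j]) for row in emissions]
path, score = viterbi_decode(emissions, transitions)
print(greedy)       # [0, 2, 0]  -> O I O, an invalid sequence
print(path, score)  # [1, 2, 0] 6.5  -> B I O
```

Choosing each tag independently yields O I O, whereas decoding over the whole sequence trades a slightly lower score on the first token for a consistent B I O labeling.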

Acknowledgments

Our work is supported by the National Key Research and Development Program of China under Grant No. 2017YFB1002101 and the National Natural Science Foundation of China under Grant No. 61433015.

Author information

Corresponding author

Correspondence to Houfeng Wang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, L., Wang, H. (2019). Using Bidirectional Transformer-CRF for Spoken Language Understanding. In: Tang, J., Kan, MY., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol 11838. Springer, Cham. https://doi.org/10.1007/978-3-030-32233-5_11

  • DOI: https://doi.org/10.1007/978-3-030-32233-5_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32232-8

  • Online ISBN: 978-3-030-32233-5

  • eBook Packages: Computer Science, Computer Science (R0)