Label-Dependencies Aware Recurrent Neural Networks

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10761)

Abstract

In the last few years, Recurrent Neural Networks (RNNs) have proved effective on several NLP tasks. Despite this success, their ability to model sequence labeling is still limited. This has led research toward solutions where RNNs are combined with models that have already proved effective in this domain, such as CRFs. In this work we propose a solution that is far simpler yet very effective: an evolution of the simple Jordan RNN, in which labels are re-injected as input into the network and converted into embeddings, in the same way as words. We compare this RNN variant to the other main RNN models, Elman and Jordan RNNs, LSTM and GRU, on two well-known Spoken Language Understanding (SLU) tasks. Thanks to label embeddings and their combination at the hidden layer, the proposed variant, which uses more parameters than Elman and Jordan RNNs but far fewer than LSTM and GRU, is not only more effective than the other RNNs, but also outperforms sophisticated CRF models.
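
To make the mechanism concrete, below is a minimal NumPy sketch of one time step of such a label-embedding, Jordan-style recurrence. It is not the authors' Octave implementation; the function name, the tanh and softmax choices, and all dimensions are illustrative assumptions.

    import numpy as np

    def rnn_step(word_id, prev_label_id, E_w, E_l, W_h, b_h, W_o, b_o):
        # Labels are embedded exactly like words: look up both embeddings.
        x = np.concatenate([E_w[word_id], E_l[prev_label_id]])
        # Word and previous-label information are combined at the hidden layer.
        h = np.tanh(W_h @ x + b_h)
        # Output layer: a softmax distribution over the label set.
        scores = W_o @ h + b_o
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        # The predicted label is re-injected as input at the next time step.
        return int(np.argmax(probs)), probs

In the paper a window of words is used rather than a single word (see note 5 below), but the label re-injection works the same way.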

Notes

  1. \(h_*\) means the hidden layer of any model, as the output layer is computed in the same way for all networks described in this paper.

  2. In the literature \(\varPhi\) and \(\varGamma\) are the sigmoid and tanh, respectively.
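     For reference, the standard definitions of these functions are \(\varPhi(x) = \frac{1}{1 + e^{-x}}\) and \(\varGamma(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}\).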

  3. The one-hot representation of a token with index i in a dictionary is a vector v of the same size as the dictionary, with zeros everywhere except at position i, where it is 1.

  4. In our case, \(y_i\) is explicitly converted from a probability distribution into a one-hot representation.
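     A minimal Python sketch of the two representations described in notes 3 and 4 (the function names and the use of NumPy are illustrative assumptions, not the authors' code):

         import numpy as np

         def one_hot(i, size):
             # Note 3: zeros everywhere, 1 at position i.
             v = np.zeros(size)
             v[i] = 1.0
             return v

         def to_one_hot(probs):
             # Note 4: explicit conversion of a probability distribution
             # into the one-hot vector of its argmax.
             return one_hot(int(np.argmax(probs)), len(probs))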

  5. Indeed, we observed better performance when using a word window than when using a single word.

  6. Available at http://deeplearning.net/tutorial/rnnslu.html.

  7. For example, the component localization can be combined with other components such as city, relative-distance, generic-relative-location, street, etc.

  8. https://www.gnu.org/software/octave/; our code is described at http://www.marcodinarelli.it/software.php and available upon request.

  9. http://www.openblas.net; this library allows a speed-up of roughly \(330\times\) on a single matrix-matrix multiplication using 16 cores. This is very attractive compared to the speed-up of \(380\times\) that can be reached with a GPU, considering that both Octave and OpenBLAS are available for free.

  10. This is a publication in French, but the results in its tables are easy to understand and directly comparable to ours.

  11. We did not run further experiments because, without a GPU, experiments on the Penn Treebank are still quite expensive.

  12. The errors made by the system are classified as Insertions (I), Deletions (D) and Substitutions (S). The sum of these errors is divided by the number of concepts in the reference annotation (R): \(CER = \frac{I + D + S}{R}\).
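     As a purely illustrative example of this formula: with \(I = 2\), \(D = 3\), \(S = 5\) and \(R = 100\) reference concepts, \(CER = \frac{2 + 3 + 5}{100} = 10\%\).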

Acknowledgements

This work has been partially funded by the French ANR project Democrat ANR-15-CE38-0008.

Author information

Corresponding author

Correspondence to Yoann Dupont.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Dupont, Y., Dinarelli, M., Tellier, I. (2018). Label-Dependencies Aware Recurrent Neural Networks. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol. 10761. Springer, Cham. https://doi.org/10.1007/978-3-319-77113-7_4

  • DOI: https://doi.org/10.1007/978-3-319-77113-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77112-0

  • Online ISBN: 978-3-319-77113-7

  • eBook Packages: Computer Science (R0)
