Label-Dependencies Aware Recurrent Neural Networks

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10761)

Abstract

In the last few years, Recurrent Neural Networks (RNNs) have proved effective on several NLP tasks. Despite this success, their ability to model sequence labeling is still limited. This has led research toward solutions where RNNs are combined with models that have already proved effective in this domain, such as CRFs. In this work we propose a solution that is far simpler yet very effective: an evolution of the simple Jordan RNN, in which labels are re-injected as input into the network and converted into embeddings, in the same way as words. We compare this RNN variant to the other main RNN models, Elman and Jordan RNNs, LSTM and GRU, on two well-known Spoken Language Understanding (SLU) tasks. Thanks to label embeddings and their combination at the hidden layer, the proposed variant, which uses more parameters than Elman and Jordan RNNs but far fewer than LSTM and GRU, is not only more effective than the other RNNs, but also outperforms sophisticated CRF models.
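
To make the mechanism concrete, below is a minimal NumPy sketch of one time step of such a label-embedding, Jordan-style recurrence. It is not the authors' Octave implementation; the function name, the tanh and softmax choices, and all dimensions are illustrative assumptions.

    import numpy as np

    def rnn_step(word_id, prev_label_id, E_w, E_l, W_h, b_h, W_o, b_o):
        # Labels are embedded exactly like words: look up both embeddings.
        x = np.concatenate([E_w[word_id], E_l[prev_label_id]])
        # Word and previous-label information are combined at the hidden layer.
        h = np.tanh(W_h @ x + b_h)
        # Output layer: a softmax distribution over the label set.
        scores = W_o @ h + b_o
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        # The predicted label is re-injected as input at the next time step.
        return int(np.argmax(probs)), probs

In the paper a window of words is used rather than a single word (see note 5 below), but the label re-injection works the same way.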

Notes

  1. \(h_*\) means the hidden layer of any model, as the output layer is computed in the same way for all networks described in this paper.

  2. In the literature \(\varPhi\) and \(\varGamma\) are the sigmoid and tanh, respectively.
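     For reference, the standard definitions of these functions are \(\varPhi(x) = \frac{1}{1 + e^{-x}}\) and \(\varGamma(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}\).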

  3. The one-hot representation of a token with index i in a dictionary is a vector v of the same size as the dictionary, with zeros everywhere except at position i, where it is 1.

  4. In our case, \(y_i\) is explicitly converted from a probability distribution into a one-hot representation.
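     A minimal Python sketch of the two representations described in notes 3 and 4 (the function names and the use of NumPy are illustrative assumptions, not the authors' code):

         import numpy as np

         def one_hot(i, size):
             # Note 3: zeros everywhere, 1 at position i.
             v = np.zeros(size)
             v[i] = 1.0
             return v

         def to_one_hot(probs):
             # Note 4: explicit conversion of a probability distribution
             # into the one-hot vector of its argmax.
             return one_hot(int(np.argmax(probs)), len(probs))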

  5. Indeed, we observed better performance when using a word window than when using a single word.

  6. Available at http://deeplearning.net/tutorial/rnnslu.html.

  7. For example, the component localization can be combined with other components such as city, relative-distance, generic-relative-location, street, etc.

  8. https://www.gnu.org/software/octave/; our code is described at http://www.marcodinarelli.it/software.php and available upon request.

  9. http://www.openblas.net; this library allows a speed-up of roughly \(330\times\) on a single matrix-matrix multiplication using 16 cores. This is very attractive compared to the speed-up of \(380\times\) that can be reached with a GPU, considering that both Octave and OpenBLAS are available for free.

  10. This is a publication in French, but the results in its tables are easy to understand and directly comparable to ours.

  11. We did not run further experiments because, without a GPU, experiments on the Penn Treebank are still quite expensive.

  12. The errors made by the system are classified as Insertions (I), Deletions (D) and Substitutions (S). The sum of these errors is divided by the number of concepts in the reference annotation (R): \(CER = \frac{I + D + S}{R}\).
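     As a purely illustrative example of this formula: with \(I = 2\), \(D = 3\), \(S = 5\) and \(R = 100\) reference concepts, \(CER = \frac{2 + 3 + 5}{100} = 10\%\).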

Acknowledgements

This work has been partially funded by the French ANR project Democrat ANR-15-CE38-0008.

Author information

Corresponding author

Correspondence to Yoann Dupont.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Dupont, Y., Dinarelli, M., Tellier, I. (2018). Label-Dependencies Aware Recurrent Neural Networks. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol. 10761. Springer, Cham. https://doi.org/10.1007/978-3-319-77113-7_4

  • DOI: https://doi.org/10.1007/978-3-319-77113-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77112-0

  • Online ISBN: 978-3-319-77113-7

  • eBook Packages: Computer Science (R0)
