Abstract
In this paper we study different architectures of Recurrent Neural Networks (RNNs) for sequence labeling tasks. We propose two new RNN variants and compare them to the more traditional architectures of Elman and Jordan. We explain in detail the advantages of these new variants over the Elman and Jordan RNNs. We evaluate all models, both new and traditional, on three different tasks: POS-tagging of the French Treebank, and two Spoken Language Understanding (SLU) tasks, namely ATIS and MEDIA. The results clearly show that the new RNN variants are more effective than the traditional ones.
Notes
1. The “one-hot” representation of an element at position i in a dictionary V is a vector of size |V| where the i-th component has the value 1 while all the others are 0 (see the first sketch after these notes).
2. Given input x, using a sigmoid the output value is computed as \(f(x) = \frac{1}{1 + \mathrm{e}^{-x}}\) (sketched after these notes).
3. Given a set of values expressing a numerical preference \(l \in L\) (the higher the better), the softmax is computed as \(g(l) = \frac{\mathrm{e}^{l}}{\sum_{l' \in L} \mathrm{e}^{l'}}\) (sketched after these notes).
4. In POS-tagging, models sometimes confuse verbs and nouns, which may seem surprising since verbs and nouns have different embeddings. In these cases, however, the models make such errors because the particular verbs involved occur in the same contexts as nouns (e.g. “the sleep is important”), and so they acquire representations similar to those of nouns.
5. The input word context is thus made of the w words on the left and the w words on the right of the word at a given position t, plus the word at t itself, for a total of \(2w + 1\) input words (see the context-window sketch after these notes).
6. We use embeddings of the same size D for words and labels.
7. For example, the label localisation can be combined with city, relative-distance, general-relative-place, street, etc.
8. For example, the cities Boston and Philadelphia in the example above are mapped to the class CITY-NAME. If a model has never seen Boston during the training phase, but it has seen at least one city name, it can still annotate Boston as a departure city thanks to some discriminative context, such as the preposition from (a minimal mapping sketch follows these notes).
9. \(f(x) = \max(0, x)\), i.e. the rectifier (ReLU) activation (sketched after these notes).
10. Our implementations are written in Octave (https://www.gnu.org/software/octave/).
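A minimal Octave sketch of the one-hot representation of note 1; the dictionary size and the position used here are illustrative assumptions, not values from the paper.

    % One-hot vector for the element at position i in a dictionary of size |V|.
    V = 5;                  % illustrative dictionary size (assumption)
    i = 3;                  % illustrative position in the dictionary (assumption)
    one_hot = zeros(V, 1);  % all components start at 0
    one_hot(i) = 1;         % the i-th component is set to 1
    disp(one_hot')          % prints: 0 0 1 0 0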
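The sigmoid of note 2, written as an Octave anonymous function; the element-wise operations make it work on vectors as well as scalars.

    % Sigmoid activation: f(x) = 1 / (1 + e^(-x)), applied element-wise.
    f = @(x) 1 ./ (1 + exp(-x));
    f(0)           % ans = 0.5000
    f([-2 0 2])    % ans = 0.1192  0.5000  0.8808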
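The softmax of note 3 in the same style; subtracting the maximum score before exponentiating is a standard numerical-stability trick that leaves the result unchanged, not something stated in the paper.

    % Softmax over a vector of preference scores:
    % g(l) = e^l / (sum over l' in L of e^(l')), so the outputs sum to 1.
    g = @(l) exp(l - max(l)) ./ sum(exp(l - max(l)));
    g([1 2 3])     % ans = 0.0900  0.2447  0.6652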
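A sketch of the \(2w + 1\)-word input context of note 5; the sentence, the padding token and the value of w are illustrative assumptions (padding at sentence boundaries is one common way to keep the window well defined).

    % 2w+1 context window around position t, with boundary padding.
    words = {'<pad>', '<pad>', 'the', 'sleep', 'is', 'important', '<pad>', '<pad>'};
    w = 2;                        % illustrative window half-size (assumption)
    t = 4;                        % position of 'sleep' in the padded sequence
    context = words(t-w : t+w);   % w words left, the word at t, w words right
    disp(strjoin(context, ' '))   % prints: <pad> the sleep is important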
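A minimal, hypothetical sketch of the kind of preprocessing note 8 describes: surface city names found in a lookup table (e.g. a gazetteer) are replaced by the class CITY-NAME. Only the class name and the Boston/Philadelphia example come from the note; the table and the token loop are illustrative assumptions, not the paper's actual pipeline.

    % Replace known city names with the class CITY-NAME before training/labeling.
    city_map = containers.Map({'boston', 'philadelphia'}, {'CITY-NAME', 'CITY-NAME'});
    tokens = {'from', 'boston', 'to', 'philadelphia'};
    for k = 1:numel(tokens)
      if isKey(city_map, tokens{k})
        tokens{k} = city_map(tokens{k});  % a city unseen in training can still
      end                                 % be mapped if the table lists it
    end
    disp(strjoin(tokens, ' '))            % prints: from CITY-NAME to CITY-NAME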
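The rectifier of note 9 in the same Octave style.

    % ReLU activation: f(x) = max(0, x), applied element-wise.
    relu = @(x) max(0, x);
    relu([-1.5 0 2.3])   % ans = 0  0  2.3000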
References
Jordan, M.I.: Serial order: a parallel, distributed processing approach. In: Elman, J.L., Rumelhart, D.E. (eds.) Advances in Connectionist Theory: Speech. Erlbaum, Hillsdale (1989)
Elman, J.L.: Finding structure in time. Cogn. Sci. 14, 179–211 (1990)
Collobert, R., Weston, J.: A unified architecture for natural language processing: deep neural networks with multitask learning. In: Proceedings of the 25th International Conference on Machine Learning, ICML 2008, pp. 160–167. ACM, New York (2008)
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011)
Yao, K., Zweig, G., Hwang, M.Y., Shi, Y., Yu, D.: Recurrent neural networks for language understanding. In: Interspeech (2013)
Mesnil, G., He, X., Deng, L., Bengio, Y.: Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding. In: Interspeech (2013)
Vukotic, V., Raymond, C., Gravier, G.: Is it time to switch to word embedding and recurrent neural networks for spoken language understanding? In: Interspeech, Dresden, Germany (2015)
Xu, W., Auli, M., Clark, S.: CCG supertagging with a recurrent neural network. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Short Papers, ACL 2015, Beijing, China, 26–31 July 2015, vol. 2, pp. 250–255 (2015)
Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the Eighteenth International Conference on Machine Learning (ICML), Williamstown, MA, USA, pp. 282–289 (2001)
Mikolov, T., Karafiát, M., Burget, L., Cernocký, J., Khudanpur, S.: Recurrent neural network based language model. In: 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010, Makuhari, Chiba, Japan, 26–30 September 2010, pp. 1045–1048 (2010)
Mikolov, T., Kombrink, S., Burget, L., Cernocký, J., Khudanpur, S.: Extensions of recurrent neural network language model. In: ICASSP, pp. 5528–5531. IEEE (2011)
Zennaki, O., Semmar, N., Besacier, L.: Unsupervised and lightly supervised part-of-speech tagging using recurrent neural networks. In: Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation, PACLIC 29, Shanghai, China, 30 October–1 November 2015
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. CoRR abs/1301.3781 (2013)
Mikolov, T., Yih, W., Zweig, G.: Linguistic regularities in continuous space word representations. In: Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, pp. 746–751 (2013)
Abeillé, A., Clément, L., Toussenel, F.: Building a Treebank for French. In: Abeillé, A. (ed.) Treebanks: Building and Using Parsed Corpora, pp. 165–188. Springer, Dordrecht (2003). https://doi.org/10.1007/978-94-010-0201-1_10
Denis, P., Sagot, B.: Coupling an annotated corpus and a lexicon for state-of-the-art POS tagging. Lang. Resour. Eval. 46, 721–736 (2012)
De Mori, R., Bechet, F., Hakkani-Tur, D., McTear, M., Riccardi, G., Tur, G.: Spoken language understanding: a survey. IEEE Sig. Process. Mag. 25, 50–58 (2008)
Dahl, D.A., Bates, M., Brown, M., Fisher, W., Hunicke-Smith, K., Pallett, D., Pao, C., Rudnicky, A., Shriberg, E.: Expanding the scope of the ATIS task: the ATIS-3 corpus. In: Proceedings of the Workshop on Human Language Technology, HLT 1994, Stroudsburg, PA, USA, pp. 43–48. Association for Computational Linguistics (1994)
Bonneau-Maynard, H., Ayache, C., Bechet, F., Denis, A., Kuhn, A., Lefèvre, F., Mostefa, D., Quignard, M., Rosset, S., Servan, S., Villaneau, J.: Results of the French Evalda-Media evaluation campaign for literal understanding. In: LREC, Genoa, Italy, pp. 2054–2059 (2006)
Bengio, Y., Ducharme, R., Vincent, P., Jauvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003)
Schuster, M., Paliwal, K.: Bidirectional recurrent neural networks. Trans. Sig. Proc. 45, 2673–2681 (1997)
Bengio, Y.: Practical recommendations for gradient-based training of deep architectures. CoRR abs/1206.5533 (2012)
Werbos, P.: Backpropagation through time: what it does and how to do it. Proc. IEEE 78, 1550–1560 (1990)
Chen, D., Manning, C.: A fast and accurate dependency parser using neural networks. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 740–750. Association for Computational Linguistics (2014)
Ramshaw, L., Marcus, M.: Text chunking using transformation-based learning. In: Proceedings of the 3rd Workshop on Very Large Corpora, Cambridge, MA, USA, pp. 84–94 (1995)
Mesnil, G., Dauphin, Y., Yao, K., Bengio, Y., Deng, L., Hakkani-Tur, D., He, X., Heck, L., Tur, G., Yu, D., Zweig, G.: Using recurrent neural networks for slot filling in spoken language understanding. IEEE/ACM Trans. Audio Speech Lang. Process. 23(3), 530–539 (2015)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)