Abstract
In this paper we study different architectures of Recurrent Neural Networks (RNNs) for sequence labeling tasks. We propose two new RNN variants and compare them to the more traditional architectures of Elman and Jordan. We explain in detail the advantages of these new variants over the Elman and Jordan RNNs. We evaluate all models, both new and traditional, on three different tasks: POS-tagging of the French Treebank, and two Spoken Language Understanding (SLU) tasks, namely ATIS and MEDIA. The results clearly show that the new RNN variants are more effective than the traditional ones.
Notes
1. The “one-hot” representation of an element at position i in a dictionary V is a vector of size |V| where the i-th component has the value 1 while all the others are 0 (see the first sketch after these notes).
2. Given input x, using a sigmoid the output value is computed as \(f(x) = \frac{1}{1 + \mathrm{e}^{-x}}\) (sketched after these notes).
3. Given a set of values expressing a numerical preference \(l \in L\) (the higher the better), the softmax is computed as \(g(l) = \frac{\mathrm{e}^{l}}{\sum_{l' \in L} \mathrm{e}^{l'}}\) (sketched after these notes).
4. In POS-tagging, models sometimes confuse verbs and nouns, which may seem surprising since verbs and nouns have different embeddings. In these cases, however, the models make such errors because the particular verbs involved occur in the same contexts as nouns (e.g. “the sleep is important”), and so they acquire representations similar to those of nouns.
5. The input word context is thus made of the w words on the left and the w words on the right of the word at a given position t, plus the word at t itself, for a total of \(2w + 1\) input words (see the context-window sketch after these notes).
6. We use embeddings of the same size D for words and labels.
7. For example, the label localisation can be combined with city, relative-distance, general-relative-place, street, etc.
8. For example, the cities Boston and Philadelphia in the example above are mapped to the class CITY-NAME. If a model has never seen Boston during the training phase, but it has seen at least one city name, it can still annotate Boston as a departure city thanks to some discriminative context, such as the preposition from (a minimal mapping sketch follows these notes).
9. \(f(x) = \max(0, x)\), i.e. the rectifier (ReLU) activation (sketched after these notes).
10. Our implementations are written in Octave (https://www.gnu.org/software/octave/).
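A minimal Octave sketch of the one-hot representation of note 1; the dictionary size and the position used here are illustrative assumptions, not values from the paper.

    % One-hot vector for the element at position i in a dictionary of size |V|.
    V = 5;                  % illustrative dictionary size (assumption)
    i = 3;                  % illustrative position in the dictionary (assumption)
    one_hot = zeros(V, 1);  % all components start at 0
    one_hot(i) = 1;         % the i-th component is set to 1
    disp(one_hot')          % prints: 0 0 1 0 0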
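The sigmoid of note 2, written as an Octave anonymous function; the element-wise operations make it work on vectors as well as scalars.

    % Sigmoid activation: f(x) = 1 / (1 + e^(-x)), applied element-wise.
    f = @(x) 1 ./ (1 + exp(-x));
    f(0)           % ans = 0.5000
    f([-2 0 2])    % ans = 0.1192  0.5000  0.8808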
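The softmax of note 3 in the same style; subtracting the maximum score before exponentiating is a standard numerical-stability trick that leaves the result unchanged, not something stated in the paper.

    % Softmax over a vector of preference scores:
    % g(l) = e^l / (sum over l' in L of e^(l')), so the outputs sum to 1.
    g = @(l) exp(l - max(l)) ./ sum(exp(l - max(l)));
    g([1 2 3])     % ans = 0.0900  0.2447  0.6652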
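A sketch of the \(2w + 1\)-word input context of note 5; the sentence, the padding token and the value of w are illustrative assumptions (padding at sentence boundaries is one common way to keep the window well defined).

    % 2w+1 context window around position t, with boundary padding.
    words = {'<pad>', '<pad>', 'the', 'sleep', 'is', 'important', '<pad>', '<pad>'};
    w = 2;                        % illustrative window half-size (assumption)
    t = 4;                        % position of 'sleep' in the padded sequence
    context = words(t-w : t+w);   % w words left, the word at t, w words right
    disp(strjoin(context, ' '))   % prints: <pad> the sleep is important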
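A minimal, hypothetical sketch of the kind of preprocessing note 8 describes: surface city names found in a lookup table (e.g. a gazetteer) are replaced by the class CITY-NAME. Only the class name and the Boston/Philadelphia example come from the note; the table and the token loop are illustrative assumptions, not the paper's actual pipeline.

    % Replace known city names with the class CITY-NAME before training/labeling.
    city_map = containers.Map({'boston', 'philadelphia'}, {'CITY-NAME', 'CITY-NAME'});
    tokens = {'from', 'boston', 'to', 'philadelphia'};
    for k = 1:numel(tokens)
      if isKey(city_map, tokens{k})
        tokens{k} = city_map(tokens{k});  % a city unseen in training can still
      end                                 % be mapped if the table lists it
    end
    disp(strjoin(tokens, ' '))            % prints: from CITY-NAME to CITY-NAME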
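The rectifier of note 9 in the same Octave style.

    % ReLU activation: f(x) = max(0, x), applied element-wise.
    relu = @(x) max(0, x);
    relu([-1.5 0 2.3])   % ans = 0  0  2.3000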
References
Jordan, M.I.: Serial order: a parallel, distributed processing approach. In: Elman, J.L., Rumelhart, D.E. (eds.) Advances in Connectionist Theory: Speech. Erlbaum, Hillsdale (1989)
Elman, J.L.: Finding structure in time. Cogn. Sci. 14, 179–211 (1990)
Collobert, R., Weston, J.: A unified architecture for natural language processing: deep neural networks with multitask learning. In: Proceedings of the 25th International Conference on Machine Learning, ICML 2008, pp. 160–167. ACM, New York (2008)
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011)
Yao, K., Zweig, G., Hwang, M.Y., Shi, Y., Yu, D.: Recurrent neural networks for language understanding. In: Interspeech (2013)
Mesnil, G., He, X., Deng, L., Bengio, Y.: Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding. In: Interspeech (2013)
Vukotic, V., Raymond, C., Gravier, G.: Is it time to switch to word embedding and recurrent neural networks for spoken language understanding? In: Interspeech, Dresden, Germany (2015)
Xu, W., Auli, M., Clark, S.: CCG supertagging with a recurrent neural network. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Short Papers, ACL 2015, Beijing, China, 26–31 July 2015, vol. 2, pp. 250–255 (2015)
Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the Eighteenth International Conference on Machine Learning (ICML), Williamstown, MA, USA, pp. 282–289 (2001)
Mikolov, T., Karafiát, M., Burget, L., Cernocký, J., Khudanpur, S.: Recurrent neural network based language model. In: 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010, Makuhari, Chiba, Japan, 26–30 September 2010, pp. 1045–1048 (2010)
Mikolov, T., Kombrink, S., Burget, L., Cernocký, J., Khudanpur, S.: Extensions of recurrent neural network language model. In: ICASSP, pp. 5528–5531. IEEE (2011)
Zennaki, O., Semmar, N., Besacier, L.: Unsupervised and lightly supervised part-of-speech tagging using recurrent neural networks. In: Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation, PACLIC 29, Shanghai, China, 30 October–1 November 2015
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. CoRR abs/1301.3781 (2013)
Mikolov, T., Yih, W., Zweig, G.: Linguistic regularities in continuous space word representations. In: Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, pp. 746–751 (2013)
Abeillé, A., Clément, L., Toussenel, F.: Building a Treebank for French. In: Abeillé, A. (ed.) Treebanks: Building and Using Parsed Corpora, pp. 165–188. Springer, Dordrecht (2003). https://doi.org/10.1007/978-94-010-0201-1_10
Denis, P., Sagot, B.: Coupling an annotated corpus and a lexicon for state-of-the-art POS tagging. Lang. Resour. Eval. 46, 721–736 (2012)
De Mori, R., Bechet, F., Hakkani-Tur, D., McTear, M., Riccardi, G., Tur, G.: Spoken language understanding: a survey. IEEE Sig. Process. Mag. 25, 50–58 (2008)
Dahl, D.A., Bates, M., Brown, M., Fisher, W., Hunicke-Smith, K., Pallett, D., Pao, C., Rudnicky, A., Shriberg, E.: Expanding the scope of the ATIS task: the ATIS-3 corpus. In: Proceedings of the Workshop on Human Language Technology, HLT 1994, Stroudsburg, PA, USA, pp. 43–48. Association for Computational Linguistics (1994)
Bonneau-Maynard, H., Ayache, C., Bechet, F., Denis, A., Kuhn, A., Lefèvre, F., Mostefa, D., Quignard, M., Rosset, S., Servan, S., Villaneau, J.: Results of the French Evalda-Media evaluation campaign for literal understanding. In: LREC, Genoa, Italy, pp. 2054–2059 (2006)
Bengio, Y., Ducharme, R., Vincent, P., Jauvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003)
Schuster, M., Paliwal, K.: Bidirectional recurrent neural networks. Trans. Sig. Proc. 45, 2673–2681 (1997)
Bengio, Y.: Practical recommendations for gradient-based training of deep architectures. CoRR abs/1206.5533 (2012)
Werbos, P.: Backpropagation through time: what it does and how to do it. Proc. IEEE 78, 1550–1560 (1990)
Chen, D., Manning, C.: A fast and accurate dependency parser using neural networks. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 740–750. Association for Computational Linguistics (2014)
Ramshaw, L., Marcus, M.: Text chunking using transformation-based learning. In: Proceedings of the 3rd Workshop on Very Large Corpora, Cambridge, MA, USA, pp. 84–94 (1995)
Mesnil, G., Dauphin, Y., Yao, K., Bengio, Y., Deng, L., Hakkani-Tur, D., He, X., Heck, L., Tur, G., Yu, D., Zweig, G.: Using recurrent neural networks for slot filling in spoken language understanding. IEEE/ACM Trans. Audio Speech Lang. Process. 23(3), 530–539 (2015)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)