
New Recurrent Neural Network Variants for Sequence Labeling

  • Conference paper
  • In: Computational Linguistics and Intelligent Text Processing (CICLing 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9623)

Abstract

In this paper we study different architectures of Recurrent Neural Networks (RNN) for sequence labeling tasks. We propose two new variants of RNN and compare them to the more traditional RNN architectures of Elman and Jordan. We explain in detail the advantages of these new variants over Elman’s and Jordan’s RNNs. We evaluate all models, both new and traditional, on three different tasks: POS-tagging of the French Treebank, and two Spoken Language Understanding (SLU) tasks, namely ATIS and MEDIA. The results we obtain clearly show that the new variants of RNN are more effective than the traditional ones.


Notes

  1. The “one-hot” representation of an element at position i in a dictionary V is a vector of size |V| where the i-th component has the value 1 while all the others are 0.
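     As a minimal Octave sketch (the dictionary size and the position are illustrative, not taken from the paper), such a vector can be built as:

        % one-hot vector for the element at position i in a dictionary of size |V|
        V_size = 10000;              % assumed dictionary size
        i = 42;                      % assumed position of the element in the dictionary
        one_hot = zeros(V_size, 1);
        one_hot(i) = 1;              % the i-th component is 1, all others are 0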

  2. Given input x, the output of the sigmoid is computed as \(f(x) = \frac{1}{1 + \mathrm{e}^{-x}}\).
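     In Octave, a minimal element-wise version of this function can be sketched as follows (illustrative, not the paper's code):

        % element-wise sigmoid
        sigmoid = @(x) 1 ./ (1 + exp(-x));
        sigmoid(0)                   % ans = 0.5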

  3. Given a set of values \(l \in L\) expressing a numerical preference (the higher the better), the softmax is computed as \(g(l) = \frac{\mathrm{e}^{l}}{\sum_{l' \in L} \mathrm{e}^{l'}}\).
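     A minimal Octave sketch of this computation (the subtraction of the maximum is an added numerical-stability convention, not something stated in the paper; it does not change the result):

        % softmax over a vector of scores l
        softmax = @(l) exp(l - max(l)) ./ sum(exp(l - max(l)));
        softmax([1 2 3])             % components are positive and sum to 1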

  4. In POS-tagging, models sometimes confuse verbs with nouns, even though such words might be expected to have quite different embeddings. In these cases, however, the errors occur because the particular verbs appear in typical noun contexts (e.g. “the sleep is important”), and so their learned representations are similar to those of nouns.

  5. The word input context is thus made of the w words on the left and the w words on the right of the word at a given position t, plus the word at t itself, which gives a total of \(2 w + 1\) input words.
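     A hypothetical Octave sketch of how such a window can be assembled (the padding index and variable names are assumptions, not the paper's code):

        % 2w+1 word-index window centred on position t; positions falling outside
        % the sentence are filled with an assumed padding index
        w = 2; pad_idx = 1;
        sentence = [5 8 3 9 7];          % assumed word indices of a sentence
        t = 2;                           % current position
        window = pad_idx * ones(1, 2*w + 1);
        for k = -w:w
          if t + k >= 1 && t + k <= numel(sentence)
            window(k + w + 1) = sentence(t + k);
          end
        end
        % window = [1 5 8 3 9] for this example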

  6. We use embeddings of the same size D for words and labels.

  7. For example, the label localisation can be combined with city, relative-distance, general-relative-place, street, etc.

  8. For example, the cities Boston and Philadelphia in the example above are mapped to the class CITY-NAME. If a model has never seen Boston during the training phase, but has seen at least one city name, it can still annotate Boston as a departure city thanks to some discriminative context, such as the preposition from.
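     A hypothetical Octave sketch of this kind of surface-form-to-class mapping (the dictionary content and variable names are illustrative):

        % replace known city names by the class CITY-NAME before labeling
        city_map = containers.Map({'boston', 'philadelphia'}, {'CITY-NAME', 'CITY-NAME'});
        words = {'from', 'boston', 'to', 'philadelphia'};
        for k = 1:numel(words)
          if isKey(city_map, words{k})
            words{k} = city_map(words{k});
          end
        end
        % words is now {'from', 'CITY-NAME', 'to', 'CITY-NAME'}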

  9. \(f(x) = \max(0, x)\).
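     In Octave, this rectifier can be sketched as (illustrative, element-wise):

        relu = @(x) max(0, x);
        relu([-1 2])                 % ans = [0 2]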

  10. Our implementations are written mainly in Octave: https://www.gnu.org/software/octave/.


Author information

Correspondence to Marco Dinarelli or Isabelle Tellier.



Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Dinarelli, M., Tellier, I. (2018). New Recurrent Neural Network Variants for Sequence Labeling. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol. 9623. Springer, Cham. https://doi.org/10.1007/978-3-319-75477-2_10


  • DOI: https://doi.org/10.1007/978-3-319-75477-2_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75476-5

  • Online ISBN: 978-3-319-75477-2

  • eBook Packages: Computer Science, Computer Science (R0)
