Abstract
The continuous skip-gram model is an efficient algorithm for learning high-quality distributed vector representations of words that capture a large number of syntactic and semantic relationships. At the same time, artificial neural networks have become the state of the art in language modelling, and Long Short-Term Memory (LSTM) networks appear to be a particularly effective architecture for this task.
In this paper, we carry out experiments with a combination of these powerful models: continuous distributed representations of words are trained with the skip-gram method on a large corpus and are then used as the input to an LSTM language model in place of the traditional 1-of-N coding. The potential of this approach is demonstrated in perplexity experiments on the Wikipedia and Penn Treebank corpora.
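A minimal sketch of this setup, assuming PyTorch and a pretrained skip-gram embedding matrix, might look as follows. The class name, dimensions, and training details are illustrative assumptions, not the authors' exact configuration; the point is that a frozen lookup of pretrained vectors replaces the 1-of-N input, and perplexity is obtained as the exponential of the cross-entropy loss.

```python
import torch
import torch.nn as nn

class EmbeddingInputLSTMLM(nn.Module):
    """LSTM language model fed with pretrained skip-gram vectors
    instead of 1-of-N (one-hot) input coding."""

    def __init__(self, pretrained_vectors, hidden_size=512, num_layers=1):
        super().__init__()
        vocab_size, embed_dim = pretrained_vectors.shape
        # Frozen lookup table holding the skip-gram vectors;
        # only the LSTM and the output layer are trained.
        self.embed = nn.Embedding.from_pretrained(pretrained_vectors, freeze=True)
        self.lstm = nn.LSTM(embed_dim, hidden_size, num_layers, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, word_ids, state=None):
        x = self.embed(word_ids)          # (batch, time, embed_dim)
        h, state = self.lstm(x, state)    # (batch, time, hidden_size)
        return self.out(h), state         # logits over the vocabulary

# Toy usage: random vectors stand in for real skip-gram output.
vectors = torch.randn(10_000, 300)            # vocab_size x embedding dim
model = EmbeddingInputLSTMLM(vectors)
inputs = torch.randint(0, 10_000, (8, 35))    # batch of word-id sequences
targets = torch.randint(0, 10_000, (8, 35))   # next-word targets
logits, _ = model(inputs)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 10_000), targets.reshape(-1))
print("perplexity =", torch.exp(loss).item())  # perplexity = exp(cross-entropy)
```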