
The Custom Decay Language Model for Long Range Dependencies

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9924)

Abstract

Significant correlations between words can be observed over long distances, but contemporary language models such as N-grams, Skip grams, and recurrent neural network language models (RNNLMs) require a large number of parameters to capture these dependencies, if they can capture them at all. In this paper, we propose the Custom Decay Language Model (CDLM), which captures long-range correlations while keeping the growth in the number of parameters sub-linear in vocabulary size. The model has a robust and stable training procedure (unlike RNNLMs), a more powerful modeling scheme than the Skip models, and a customizable representation. In perplexity experiments, CDLMs outperform the Skip models while using fewer parameters. A CDLM also nominally outperformed a similar-sized RNNLM, indicating that it learned as much as the RNNLM but without recurrence.
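
The model itself is defined in the paper; purely as an illustration of the idea (a sketch under assumptions, not the authors' implementation), the snippet below combines distance-d skip components log-linearly and weights each component by a decay factor lambda(d) = a * d^(-b). The class name DecayedSkipLM, the power-law decay form, and the add-alpha smoothing are all assumptions introduced here for illustration; a and b stand in for the customizable decay.

    # Illustrative sketch only: a distance-decayed, log-linearly combined
    # skip model in the spirit of the CDLM. The decay form
    # lambda(d) = a * d**(-b), the smoothing, and all names are assumptions.
    import math
    from collections import Counter, defaultdict

    class DecayedSkipLM:
        def __init__(self, max_dist=5, a=1.0, b=0.8, alpha=0.1):
            self.max_dist = max_dist                  # furthest skip distance modelled
            self.a, self.b = a, b                     # decay weight lambda(d) = a * d**(-b)
            self.alpha = alpha                        # add-alpha smoothing constant
            self.pair_counts = defaultdict(Counter)   # pair_counts[d][(ctx, w)]
            self.ctx_counts = defaultdict(Counter)    # ctx_counts[d][ctx]
            self.vocab = set()

        def train(self, tokens):
            # Count each word against every context word up to max_dist back.
            self.vocab.update(tokens)
            for i, w in enumerate(tokens):
                for d in range(1, self.max_dist + 1):
                    if i - d >= 0:
                        self.pair_counts[d][(tokens[i - d], w)] += 1
                        self.ctx_counts[d][tokens[i - d]] += 1

        def _p_skip(self, w, ctx, d):
            # Add-alpha smoothed estimate of P_d(w | ctx): probability of w
            # occurring exactly d positions after ctx.
            num = self.pair_counts[d][(ctx, w)] + self.alpha
            den = self.ctx_counts[d][ctx] + self.alpha * len(self.vocab)
            return num / den

        def log_score(self, w, history):
            # Log-linear combination of the distance-d components, each
            # weighted by a decaying exponent lambda(d): nearby context
            # dominates, but distant words still contribute.
            score = 0.0
            for d in range(1, min(self.max_dist, len(history)) + 1):
                lam = self.a * d ** (-self.b)
                score += lam * math.log(self._p_skip(w, history[-d], d))
            return score  # unnormalised; renormalise over the vocab for P(w | history)

    lm = DecayedSkipLM()
    lm.train("the cat sat on the mat because the cat was tired".split())
    print(lm.log_score("cat", "mat because the".split()))

Note that in this sketch the decay is governed by a fixed, shared pair of parameters (a, b) rather than weights stored per word pair, so the number of free decay parameters does not grow with the vocabulary, which is one way to realize the sub-linear scaling the abstract refers to.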

D. Klakow: The work was supported by the Cluster of Excellence for Multimodal Computing and Interaction, the German Research Foundation (DFG) as part of SFB 1102, and the EU FP7 Metalogue project (grant agreement number 611073).



Author information


Corresponding author

Correspondence to Mittul Singh.



Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Singh, M., Greenberg, C., Klakow, D. (2016). The Custom Decay Language Model for Long Range Dependencies. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2016. Lecture Notes in Computer Science (LNAI), vol 9924. Springer, Cham. https://doi.org/10.1007/978-3-319-45510-5_39

  • DOI: https://doi.org/10.1007/978-3-319-45510-5_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45509-9

  • Online ISBN: 978-3-319-45510-5

  • eBook Packages: Computer Science, Computer Science (R0)
