Abstract
In statistical language modelling, the classic model is the n-gram. This model cannot, however, capture long-term dependencies, i.e. dependencies spanning more than n words. An alternative is the probabilistic automaton. Unfortunately, preliminary experiments show that this model is not yet competitive for language modelling, partly because it tries to model dependencies that are too long. We propose to improve the use of this model by restricting the dependency span to a more reasonable value. Experiments show a 45% reduction in perplexity on the Wall Street Journal language modeling task.
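To make the evaluation metric concrete, the sketch below trains a minimal bigram (n = 2) model with add-one smoothing and computes its perplexity, the quantity the abstract reports a 45% reduction in. This is an illustrative toy, not the authors' model: the corpus, the smoothing choice, and the function names are assumptions for the example.

```python
import math
from collections import Counter

def train_bigram(tokens, vocab):
    """Return an add-one (Laplace) smoothed bigram probability function."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    V = len(vocab)
    def prob(w_prev, w):
        # Laplace smoothing gives unseen bigrams non-zero probability
        return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + V)
    return prob

def perplexity(prob, tokens):
    """Perplexity = exp of the average negative log-probability per token."""
    log_sum = sum(math.log(prob(p, w)) for p, w in zip(tokens, tokens[1:]))
    return math.exp(-log_sum / (len(tokens) - 1))

# Hypothetical toy corpus; real evaluations use held-out WSJ text.
corpus = "the cat sat on the mat the cat ate".split()
prob = train_bigram(corpus, set(corpus))
pp = perplexity(prob, corpus)
print(pp)  # lower perplexity means the model predicts the text better
```

Note that the bigram model conditions only on the immediately preceding word; the paper's point is that probabilistic automata lift this fixed-n restriction, but benefit from bounding the dependency span rather than leaving it unlimited.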
This work was supported by the BINGO2 project (ANR-07-MDCO 014-02).
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Zdziobeck, A., Thollard, F. (2008). Position Models and Language Modeling. In: da Vitoria Lobo, N., et al. Structural, Syntactic, and Statistical Pattern Recognition. SSPR /SPR 2008. Lecture Notes in Computer Science, vol 5342. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89689-0_12
Print ISBN: 978-3-540-89688-3
Online ISBN: 978-3-540-89689-0