On the Use of the Leaving-One-Out Method in Statistical Language Modelling

Kneser, Reinhard; Essen, Ute; Ney, Hermann

doi:10.1007/978-3-642-57745-1_35

Conference paper

Part of the book series: NATO ASI Series (NATO ASI F, volume 147)

Abstract

The probability estimates in stochastic language modelling often depend on additional parameters that cannot be read directly off the training data. These parameters are typically related to the probabilities of events not seen in the training data, and conventional maximum-likelihood methods therefore fail to determine them. We present a special form of cross-validation, the leaving-one-out concept, to solve this problem. Applying this technique to several different modelling approaches demonstrates its flexibility and, in some cases, its computational simplicity. Experiments performed on an English corpus of 1.1 million words show good generalization capability.
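The idea in the abstract can be illustrated with a minimal sketch. It assumes a unigram model with linear discounting, where a fraction `lam` of the probability mass is reserved for unseen words. The model choice, the function names, the toy corpus and the closed vocabulary size are all assumptions made for illustration and are not taken from the paper. The sketch compares ordinary maximum likelihood on the training data, which pushes `lam` towards zero, with the leaving-one-out criterion, in which each token is scored by a model estimated from the remaining tokens.

```python
import math
from collections import Counter


def loo_log_likelihood(tokens, lam, vocab_size):
    """Leaving-one-out log-likelihood of a linearly discounted unigram model."""
    counts = Counter(tokens)
    n = len(tokens)
    total = 0.0
    for r in counts.values():
        # Remove one occurrence of this word; n - 1 tokens remain for estimation.
        if r > 1:
            # The word is still seen: discounted relative frequency.
            p = (1.0 - lam) * (r - 1) / (n - 1)
        else:
            # A singleton becomes unseen: it shares the mass `lam`
            # uniformly with all other unseen vocabulary entries.
            unseen = vocab_size - (len(counts) - 1)
            p = lam / unseen
        total += r * math.log(p)  # each of the r tokens contributes the same term
    return total


def train_log_likelihood(tokens, lam):
    """Ordinary training-data log-likelihood of the same model (nothing left out)."""
    counts = Counter(tokens)
    n = len(tokens)
    return sum(r * math.log((1.0 - lam) * r / n) for r in counts.values())


def grid_argmax(score, steps=1000):
    """Return the value on a uniform grid in (0, 1) that maximises `score`."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=score)


tokens = "a a a b b c d e".split()  # toy corpus: 8 tokens, 3 singletons
vocab_size = 10                     # hypothetical closed vocabulary size

lam_ml = grid_argmax(lambda lam: train_log_likelihood(tokens, lam))
lam_loo = grid_argmax(lambda lam: loo_log_likelihood(tokens, lam, vocab_size))
print("maximum likelihood:", lam_ml)
print("leaving-one-out:   ", lam_loo)
```

In this sketch the training likelihood decreases monotonically in `lam`, so maximum likelihood always picks the smallest grid value and leaves no mass for unseen words. The leaving-one-out objective reduces to (N - n1) log(1 - lam) + n1 log(lam) plus a constant, where N is the number of tokens and n1 the number of singletons, so its maximum lies at lam = n1 / N.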


Author information

Authors and Affiliations

Philips GmbH Forschungslaboratorien, Aachen, Germany
Reinhard Kneser & Ute Essen
Lehrstuhl für Informatik, RWTH Aachen, Germany
Hermann Ney

Editor information

Editors and Affiliations

Department of Electronics and Technology of Computers Faculty of Sciences, University of Granada, E-18071, Granada, Spain
Antonio J. Rubio Ayuso & Juan M. López Soler

About this paper

Cite this paper

Kneser, R., Essen, U., Ney, H. (1995). On the Use of the Leaving-One-Out Method in Statistical Language Modelling. In: Ayuso, A.J.R., Soler, J.M.L. (eds) Speech Recognition and Coding. NATO ASI Series, vol 147. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-57745-1_35

DOI: https://doi.org/10.1007/978-3-642-57745-1_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-63344-7
Online ISBN: 978-3-642-57745-1
eBook Packages: Springer Book Archive
