Abstract
The maximum entropy method has recently been applied successfully to a variety of natural language applications. In each of these applications, however, the power of the method comes at the cost of a considerable increase in computational requirements. In this paper we present a technique, closely related to the classical cluster expansion of statistical mechanics, for reducing the computation required to calculate conditional maximum entropy language models.
Research supported in part by NSF and ARPA under grant IRI-9314969 and by the ATR Interpreting Telecommunications Research Laboratories.
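The abstract does not spell out the model form. For reference, a conditional maximum entropy language model is usually written in log-linear form, p(w | h) = exp(Σ_i λ_i f_i(h, w)) / Z(h), where the normalizer Z(h) sums over every word w in the vocabulary for the given history h. The sketch below is a minimal illustration under assumed names (`conditional_maxent` and the toy features are hypothetical). It computes this distribution directly so that the per-history normalization cost is visible. It is not the cluster expansion technique of this paper. It is the naive computation that such techniques aim to speed up.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical feature type: maps a (history, word) pair to a value, usually 0 or 1.
Feature = Callable[[Sequence[str], str], float]


def conditional_maxent(
    history: Sequence[str],
    vocab: List[str],
    features: List[Feature],
    lambdas: List[float],
) -> Dict[str, float]:
    """Return p(w | history) for every w in vocab under a log-linear model."""
    # Unnormalized log-score for each candidate word: sum_i lambda_i * f_i(h, w).
    scores = {
        w: sum(lam * f(history, w) for lam, f in zip(lambdas, features))
        for w in vocab
    }
    # The normalizer Z(h) requires a pass over the entire vocabulary for this
    # history, costing O(|V| * number_of_features) per history.
    m = max(scores.values())  # subtract the max score for numerical stability
    z = sum(math.exp(s - m) for s in scores.values())
    return {w: math.exp(s - m) / z for w, s in scores.items()}


if __name__ == "__main__":
    vocab = ["the", "cat", "sat"]
    feats = [
        # Toy bigram feature: fires when "the" is followed by "cat".
        lambda h, w: 1.0 if h and h[-1] == "the" and w == "cat" else 0.0,
        # Toy unigram feature: fires on the word "the".
        lambda h, w: 1.0 if w == "the" else 0.0,
    ]
    print(conditional_maxent(["the"], vocab, feats, [1.0, 0.5]))
```

With a realistic vocabulary and many distinct histories in a training corpus, evaluating Z(h) for every history is what makes parameter estimation (for example, by generalized iterative scaling, which needs model feature expectations over the training histories) expensive.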
Copyright information
© 1996 Springer Science+Business Media Dordrecht
About this paper
Cite this paper
Lafferty, J.D., Suhm, B. (1996). Cluster Expansions and Iterative Scaling for Maximum Entropy Language Models. In: Hanson, K.M., Silver, R.N. (eds) Maximum Entropy and Bayesian Methods. Fundamental Theories of Physics, vol 79. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-5430-7_23
DOI: https://doi.org/10.1007/978-94-011-5430-7_23
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-6284-8
Online ISBN: 978-94-011-5430-7
eBook Packages: Springer Book Archive