Abstract
Stochastic language models incorporating both n-grams and context-free grammars are proposed. A constrained context-free model, specified by a stochastic context-free prior distribution with superimposed n-gram frequency constraints, is derived, and the resulting maximum-entropy distribution is shown to induce a Markov random field whose neighborhood structure at the leaves is determined by the relative n-gram frequencies. A computationally efficient version, the mixed tree/chain graph model, is derived with identical neighborhood structure. In this model, a word-tree derivation is generated by a stochastic context-free prior on trees down to the preterminal (part-of-speech) level, and words are attached by a nonstationary Markov chain. Using the Penn TreeBank, the mixed tree/chain graph model is compared to both the n-gram and context-free models via entropy measures. The mixed tree/chain graph model is shown to achieve lower model entropy than both the bigram and context-free models.
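To illustrate the two-stage generative process described above, the following sketch samples a part-of-speech sequence from a toy stochastic context-free prior and then attaches words with a tag-conditioned (nonstationary) Markov chain. All rules, probabilities, and vocabularies here are invented for illustration; they are not taken from the paper or the Penn TreeBank.

```python
import random

# Toy SCFG prior down to the preterminal (part-of-speech) level.
# Each nonterminal maps to a list of (right-hand side, probability) pairs;
# the terminals of this grammar are POS tags, not words.
SCFG = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("DT", "NN"), 0.7), (("NN",), 0.3)],
    "VP": [(("VB", "NP"), 0.6), (("VB",), 0.4)],
}
POS_TAGS = {"DT", "NN", "VB"}

def sample_preterminals(symbol="S"):
    """Sample a left-to-right POS-tag sequence from the SCFG prior."""
    if symbol in POS_TAGS:
        return [symbol]
    rhs_list, weights = zip(*SCFG[symbol])
    rhs = random.choices(rhs_list, weights=weights)[0]
    return [tag for child in rhs for tag in sample_preterminals(child)]

# Word attachment: a nonstationary Markov chain, i.e. the transition
# distribution P(word_i | word_{i-1}) changes with position because it is
# conditioned on the POS tag emitted at that position.  "<s>" marks the start.
VOCAB = {"DT": ["the", "a"], "NN": ["dog", "cat"], "VB": ["sees", "barks"]}
CHAIN = {
    ("<s>", "DT"): {"the": 0.8, "a": 0.2},
    ("the", "NN"): {"dog": 0.6, "cat": 0.4},
    ("a", "NN"):   {"dog": 0.5, "cat": 0.5},
    ("dog", "VB"): {"sees": 0.7, "barks": 0.3},
    ("cat", "VB"): {"sees": 0.5, "barks": 0.5},
}

def attach_words(tags):
    """Attach a word at each leaf, conditioning on the previous word and tag."""
    words, prev = [], "<s>"
    for tag in tags:
        dist = CHAIN.get((prev, tag))
        if dist is None:
            # Unseen (word, tag) context: back off to uniform over the tag's vocabulary.
            word = random.choice(VOCAB[tag])
        else:
            candidates, probs = zip(*dist.items())
            word = random.choices(candidates, weights=probs)[0]
        words.append(word)
        prev = word
    return words

if __name__ == "__main__":
    tags = sample_preterminals()
    print(tags, attach_words(tags))
```

The split mirrors the model's factorization: the SCFG prior carries the long-range tree structure, while the chain captures local word-to-word dependencies of the kind an n-gram model would encode.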
References
L. R. Bahl, F. Jelinek, and R. L. Mercer. A maximum likelihood approach to continuous speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-5(2):179–190, 1983.
T. Cover and J. Thomas. Elements of Information Theory. John Wiley and Sons, New York, 1991.
U. Grenander. Probability measures for context-free languages. Res. rep. in pattern theory, Division of Applied Mathematics, Brown University, Providence, RI, 1967.
T. E. Harris. The Theory of Branching Processes. Springer-Verlag, Berlin-Göttingen-Heidelberg, 1963.
F. Jelinek and R. L. Mercer. Interpolated estimation of Markov source parameters from sparse data. In Proceedings, Workshop on Pattern Recognition in Practice, North-Holland Pub. Co., pages 381–397, Amsterdam, The Netherlands, 1980.
J. Kupiec. A trellis-based algorithm for estimating the parameters of a hidden stochastic context-free grammar. In DARPA Speech and Natural Language Workshop, Asilomar, CA, February 1991.
M. Marcus, B. Santorini, and M. Marcinkiewicz. Building a large annotated corpus of English: The Penn TreeBank. Computational Linguistics, 19(2):313–330, June 1993.
K. E. Mark, M. I. Miller, U. Grenander, and S. Abney. Parameter estimation for constrained context-free language models. In DARPA Speech and Natural Language Workshop, Harriman, NY, February 1992.
M. I. Miller and J. A. O’Sullivan. Entropies and combinatorics of random branching processes and context-free languages. IEEE Transactions on Information Theory, 38(4):1292–1310, July 1992.
C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423, 623–656, 1948.
Copyright information
© 1996 Springer-Verlag New York, Inc.
Cite this paper
Mark, K.E., Miller, M.I., Grenander, U. (1996). Constrained Stochastic Language Models. In: Levinson, S.E., Shepp, L. (eds) Image Models (and their Speech Model Cousins). The IMA Volumes in Mathematics and its Applications, vol 80. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-4056-3_7
DOI: https://doi.org/10.1007/978-1-4612-4056-3_7
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-8482-6
Online ISBN: 978-1-4612-4056-3
eBook Packages: Springer Book Archive