Abstract
We study the problem of computing the probability that a given stochastic context-free grammar (SCFG), G, generates a string in a given regular language L(D) (given by a DFA, D). This basic problem has a number of applications in statistical natural language processing, and it is also a key necessary step towards quantitative ω-regular model checking of stochastic context-free processes (equivalently, 1-exit recursive Markov chains, or stateless probabilistic pushdown processes).
We show that the probability that G generates a string in L(D) can be computed to within arbitrary desired precision in polynomial time (in the standard Turing model of computation), under a rather mild assumption about the SCFG, G, and with no extra assumption about D. We show that this assumption is satisfied for SCFGs whose rule probabilities are learned via the well-known inside-outside (EM) algorithm for maximum-likelihood estimation (a standard method for constructing SCFGs in statistical NLP and biological sequence analysis). Thus, for these SCFGs the algorithm always runs in P-time.
The full version of this paper is available at arxiv.org/abs/1302.6411. Research partially supported by the Royal Society and by NSF Grant CCF-1017955.
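To make the problem statement in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It assumes the SCFG is given in Chomsky normal form without ε-rules, and it uses plain fixed-point (Kleene) iteration from zero instead of Newton's method, so it carries no polynomial-time guarantee. The unknown x[(A, s, t)] is the probability that nonterminal A generates a string that moves the DFA from state s to state t. The function name, argument names, and the toy grammar are all hypothetical, chosen for illustration only.

```python
from itertools import product


def scfg_dfa_probability(rules, start, delta, q0, accepting, states,
                         iterations=200):
    """Approximate P(G generates a string in L(D)) by Kleene iteration.

    Assumptions (illustrative only, not taken from the paper):
      rules     : dict nonterminal -> list of (prob, rhs); rhs is either
                  a 1-tuple (terminal,) or a 2-tuple (B, C) of nonterminals
      delta     : dict (state, terminal) -> state, a total DFA transition map
      accepting : set of accepting DFA states
    """
    nonterminals = list(rules)
    # x[(A, s, t)]: probability that A generates a string w with delta*(s, w) = t.
    x = {(A, s, t): 0.0
         for A in nonterminals for s in states for t in states}

    for _ in range(iterations):
        new_x = {}
        for A in nonterminals:
            for s, t in product(states, repeat=2):
                total = 0.0
                for p, rhs in rules[A]:
                    if len(rhs) == 1:
                        # Rule A -> a: contributes p if the DFA steps s -> t on a.
                        if delta[(s, rhs[0])] == t:
                            total += p
                    else:
                        # Rule A -> B C: split the DFA path at an intermediate state u.
                        B, C = rhs
                        total += p * sum(x[(B, s, u)] * x[(C, u, t)]
                                         for u in states)
                new_x[(A, s, t)] = total
        x = new_x

    # Sum over accepting end states, starting from the DFA's initial state.
    return sum(x[(start, q0, t)] for t in accepting)


# Toy instance: S -> S S (prob 0.25) | a (prob 0.75); DFA accepts even-length strings.
toy_rules = {"S": [(0.25, ("S", "S")), (0.75, ("a",))]}
toy_delta = {(0, "a"): 1, (1, "a"): 0}
toy_result = scfg_dfa_probability(toy_rules, "S", toy_delta, 0, {0}, [0, 1])
print(toy_result)
```

For this toy grammar the length generating function f satisfies f(z) = 0.25 f(z)^2 + 0.75 z, so the probability of an even-length string is (1 + f(-1))/2 = (3 - √7)/2 ≈ 0.1771, which the iteration approaches. Plain iteration of this kind can converge very slowly on grammars that are close to critical, which is the situation Newton-type methods are meant to handle.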
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Etessami, K., Stewart, A., Yannakakis, M. (2013). Stochastic Context-Free Grammars, Regular Languages, and Newton’s Method. In: Fomin, F.V., Freivalds, R., Kwiatkowska, M., Peleg, D. (eds) Automata, Languages, and Programming. ICALP 2013. Lecture Notes in Computer Science, vol 7966. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39212-2_20
DOI: https://doi.org/10.1007/978-3-642-39212-2_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39211-5
Online ISBN: 978-3-642-39212-2
eBook Packages: Computer Science, Computer Science (R0)