Abstract
Some of the most widely known methods for obtaining Stochastic Context-Free Grammars (SCFGs) are based on estimation algorithms. All of these algorithms maximize a criterion function over a training sample by using gradient descent techniques. In this optimization process, the choice of the initial SCFG is an important factor, since it affects both the convergence process and the maximum that can be reached. Here, we show experimentally that the results improve when structural information about the task is inductively incorporated into the initial SCFGs. To learn these initial SCFGs, we present a stochastic version of the well-known Sakakibara algorithm. Finally, we report an experimental study carried out on part of the Wall Street Journal corpus.
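As an illustration of the kind of criterion function such estimation algorithms work with, the following minimal sketch computes the log-likelihood of a sample under an SCFG in Chomsky Normal Form by means of the inside algorithm. It is not the authors' implementation: the function names, the dictionary-based grammar representation, and the toy grammar (S -> S S with probability 0.4, S -> a with probability 0.6) are assumptions made only for this example.

```python
from collections import defaultdict
from math import log

# Hypothetical toy SCFG in Chomsky Normal Form (not from the paper):
#   S -> S S  (0.4)      S -> a  (0.6)
BINARY = {("S", "S", "S"): 0.4}   # (A, B, C) -> Pr(A -> B C)
LEXICAL = {("S", "a"): 0.6}       # (A, w)    -> Pr(A -> w)


def inside_probability(words, lexical, binary, start="S"):
    """Probability that `start` derives `words`, summed over all derivations."""
    n = len(words)
    # e[(i, j)][A] = probability that nonterminal A derives words[i:j]
    e = defaultdict(lambda: defaultdict(float))
    # Spans of length 1 are covered by lexical rules A -> w.
    for i, w in enumerate(words):
        for (a, term), p in lexical.items():
            if term == w:
                e[(i, i + 1)][a] += p
    # Longer spans combine two adjacent sub-spans through a rule A -> B C.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    left = e[(i, k)].get(b, 0.0)
                    right = e[(k, j)].get(c, 0.0)
                    if left and right:
                        e[(i, j)][a] += p * left * right
    return e[(0, n)].get(start, 0.0)


def log_likelihood(sample, lexical, binary, start="S"):
    """Sum of log string probabilities over the sample."""
    total = 0.0
    for words in sample:
        p = inside_probability(words, lexical, binary, start)
        if p == 0.0:
            return float("-inf")  # some string is not generated by the grammar
        total += log(p)
    return total
```

Calling `log_likelihood([["a"], ["a", "a"]], LEXICAL, BINARY)` evaluates the criterion on a two-string sample. An estimation algorithm would repeatedly adjust the rule probabilities to increase this value, which is why the starting grammar influences the maximum that is reached.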
This work has been partially supported by the European Union under contract EUTRANS (ESPRIT LTR-30268) and by the Spanish CICYT under contract (TIC98/0423-C06).
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nevado, F., Sánchez, J.A., Benedí, J.M. (2000). Combination of Estimation Algorithms and Grammatical Inference Techniques to Learn Stochastic Context-Free Grammars. In: Oliveira, A.L. (ed.) Grammatical Inference: Algorithms and Applications. ICGI 2000. Lecture Notes in Computer Science, vol 1891. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45257-7_16
DOI: https://doi.org/10.1007/978-3-540-45257-7_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41011-9
Online ISBN: 978-3-540-45257-7
eBook Packages: Springer Book Archive