Abstract
In this paper, we compare three different approaches to building a probabilistic context-free grammar for natural language parsing from a treebank corpus: (1) a model that simply extracts the rules contained in the corpus and counts the number of occurrences of each rule; (2) a model that also stores information about the parent node’s category; and (3) a model that estimates the probabilities according to a generalized k-gram scheme for trees with k = 3. The last model allows faster parsing and considerably decreases the perplexity of test samples.
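The first two models can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation. The `Tree` class, the `rules`, `estimate` and `log2_prob` helpers, the `label^parent` naming of annotated categories and the toy treebank are all hypothetical. Model (1) estimates each rule probability by relative frequency, P(A → β) = count(A → β) / count(A). Model (2) renames every nonterminal below the root with its parent's category before counting. No smoothing is applied, so scoring a tree that uses a rule unseen in training raises `KeyError`. The k = 3 tree k-gram estimator of model (3) is not sketched, because the abstract does not define it.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from math import log2


@dataclass
class Tree:
    """Hypothetical treebank node: a label plus ordered children.
    Leaves are words; every node with children is a nonterminal."""
    label: str
    children: list[Tree] = field(default_factory=list)


def rules(tree: Tree, parent: str | None = None, annotate: bool = False):
    """Yield one (lhs, rhs) rule per internal node, top-down.

    With annotate=True (model 2), every nonterminal below the root is
    renamed label^parent_label (an assumed naming scheme); words are unchanged."""
    if not tree.children:
        return  # a word: no rule
    lhs = f"{tree.label}^{parent}" if annotate and parent is not None else tree.label
    rhs = tuple(
        f"{c.label}^{tree.label}" if annotate and c.children else c.label
        for c in tree.children
    )
    yield lhs, rhs
    for child in tree.children:
        yield from rules(child, tree.label, annotate)


def estimate(treebank: list[Tree], annotate: bool = False) -> dict:
    """Relative-frequency estimate P(A -> beta) = count(A -> beta) / count(A).
    annotate=False gives model 1; annotate=True gives model 2."""
    counts = Counter(r for t in treebank for r in rules(t, annotate=annotate))
    lhs_totals = Counter()
    for (lhs, _rhs), c in counts.items():
        lhs_totals[lhs] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}


def log2_prob(tree: Tree, grammar: dict, annotate: bool = False) -> float:
    """log2 P(tree) is the sum of the log2 probabilities of its rules.
    No smoothing: a rule unseen in training raises KeyError."""
    return sum(log2(grammar[r]) for r in rules(tree, annotate=annotate))


# Usage on a hypothetical two-sentence treebank.
def leaf(tag: str, word: str) -> Tree:
    return Tree(tag, [Tree(word)])


t1 = Tree("S", [Tree("NP", [leaf("DT", "the"), leaf("NN", "dog")]),
                Tree("VP", [leaf("VBZ", "barks")])])
t2 = Tree("S", [Tree("NP", [leaf("PRP", "it")]),
                Tree("VP", [leaf("VBZ", "sees"),
                            Tree("NP", [leaf("DT", "a"), leaf("NN", "cat")])])])

m1 = estimate([t1, t2])                    # model 1
m2 = estimate([t1, t2], annotate=True)     # model 2
print(m1[("NP", ("DT", "NN"))])            # 2 of the 3 NPs expand as DT NN
print(m2[("NP^S", ("PRP^NP",))])           # 0.5
print(m2[("NP^VP", ("DT^NP", "NN^NP"))])   # 1.0
```

With parent annotation, subject NPs (`NP^S`) and object NPs (`NP^VP`) get separate expansion distributions. The cost is a larger set of nonterminals and rules for the parser to handle.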
The authors wish to thank the Spanish CICyT for supporting this work through project TIC2000-1599.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Verdú-Mas, J.L., Forcada, M.L., Carrasco, R.C., Calera-Rubio, J. (2002). Tree k-Grammar Models for Natural Language Modelling and Parsing. In: Caelli, T., Amin, A., Duin, R.P.W., de Ridder, D., Kamel, M. (eds) Structural, Syntactic, and Statistical Pattern Recognition. SSPR/SPR 2002. Lecture Notes in Computer Science, vol 2396. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-70659-3_5
DOI: https://doi.org/10.1007/3-540-70659-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44011-6
Online ISBN: 978-3-540-70659-5
eBook Packages: Springer Book Archive