Abstract
PCFG Learning by Partition Search is a general grammatical inference method for constructing, adapting and optimising PCFGs. Given a training corpus of examples from a language, a canonical grammar for the training corpus, and a parsing task, Partition Search PCFG Learning constructs a grammar that maximises performance on the parsing task and minimises grammar size. This paper describes Partition Search in detail, also providing theoretical background and a characterisation of the family of inference methods it belongs to. The paper also reports an example application to the task of building grammars for noun phrase extraction, a task that is crucial in many applications involving natural language processing. In the experiments, Partition Search improves parsing performance by up to 21.45% compared to a general baseline and by up to 3.48% compared to a task-specific baseline, while reducing grammar size by up to 17.25%.
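As a rough illustration of what one candidate in such a search might look like, the following Python sketch applies a single partition of the nonterminal set to a small grammar. It rests on assumptions not stated in the abstract: that a partition is applied by merging every nonterminal in a block into one symbol, and that rule probabilities are re-estimated by relative frequency from summed rule counts. The function name `apply_partition`, the toy rule counts and the category labels are all hypothetical.

```python
from collections import defaultdict


def apply_partition(rule_counts, partition):
    """Merge nonterminals block by block and re-estimate rule probabilities.

    rule_counts: dict mapping (lhs, rhs_tuple) -> count from the training corpus
    partition:   dict mapping a new block name -> set of old nonterminals
                 (symbols not listed in any block keep their name)
    Assumption for this sketch: probabilities are relative frequencies
    computed over the merged counts.
    """
    # Map every old nonterminal to the name of the block that contains it.
    rename = {old: new for new, block in partition.items() for old in block}

    # Rewrite each rule with the merged names; identical rules pool their counts.
    merged_counts = defaultdict(float)
    for (lhs, rhs), count in rule_counts.items():
        new_lhs = rename.get(lhs, lhs)
        new_rhs = tuple(rename.get(symbol, symbol) for symbol in rhs)
        merged_counts[(new_lhs, new_rhs)] += count

    # Normalise so that the rules for each left-hand side sum to 1.
    lhs_totals = defaultdict(float)
    for (lhs, _), count in merged_counts.items():
        lhs_totals[lhs] += count
    return {rule: count / lhs_totals[rule[0]]
            for rule, count in merged_counts.items()}


# Toy grammar with two function-tagged noun phrase categories (hypothetical data).
rule_counts = {
    ("NP-SBJ", ("DT", "NN")): 3,
    ("NP-OBJ", ("DT", "NN")): 1,
    ("NP-OBJ", ("NN",)): 2,
    ("S", ("NP-SBJ", "VP")): 4,
}
partition = {"NP": {"NP-SBJ", "NP-OBJ"}}

merged_grammar = apply_partition(rule_counts, partition)
for (lhs, rhs), prob in sorted(merged_grammar.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{prob:.3f}]")
```

In this toy case the merge reduces four rules to three. A search over partitions would score each resulting grammar on the parsing task and on its size; that scoring step is not shown here.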
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Belz, A. (2002). PCFG Learning by Nonterminal Partition Search. In: Adriaans, P., Fernau, H., van Zaanen, M. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2002. Lecture Notes in Computer Science, vol 2484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45790-9_2
DOI: https://doi.org/10.1007/3-540-45790-9_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44239-4
Online ISBN: 978-3-540-45790-9
eBook Packages: Springer Book Archive