Abstract
This paper studies the properties and performance of models for estimating local probability distributions, which are used as components of larger probabilistic systems, in particular history-based generative parsing models. We report experimental results showing that memory-based learning outperforms many commonly used methods for this task (Witten-Bell, Jelinek-Mercer with fixed weights, decision trees, and log-linear models). However, we connect these results with the widely used general class of deleted interpolation models by showing that certain types of memory-based learning, including the kind that performed so well in our experiments, are instances of this class. In addition, we illustrate the divergence between the joint and conditional data likelihood achieved by such models and their accuracy, suggesting that smoothing methods that optimize accuracy directly might greatly improve performance.
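As a rough illustration of two of the baseline smoothing methods the abstract names, the sketch below (not from the paper; the toy events, history/outcome pairs, and function names are illustrative assumptions) estimates a conditional distribution P(outcome | history) by linearly interpolating a maximum-likelihood estimate with a lower-order backoff distribution, once with a fixed Jelinek-Mercer weight and once with count-dependent Witten-Bell weights.

```python
from collections import Counter

# Toy (history, outcome) events; in a history-based parser these would be
# parser decisions conditioned on a truncated derivation history.
events = [("NP", "DT"), ("NP", "DT"), ("NP", "NN"), ("VP", "VB")]

pair_counts = Counter(events)                  # c(history, outcome)
hist_counts = Counter(h for h, _ in events)    # c(history)
out_counts = Counter(o for _, o in events)     # c(outcome)
vocab = sorted(out_counts)
N = len(events)

def p_unigram(y):
    # Lowest-order distribution over outcomes, used as the backoff model.
    return out_counts[y] / N

def p_jm(y, h, lam=0.8):
    # Jelinek-Mercer with a fixed weight lam: linear interpolation of the
    # maximum-likelihood estimate with the backoff distribution.
    ml = pair_counts[(h, y)] / hist_counts[h] if hist_counts[h] else 0.0
    return lam * ml + (1 - lam) * p_unigram(y)

def p_wb(y, h):
    # Witten-Bell: the interpolation weight n / (n + T) depends on the
    # number T of distinct outcomes observed after history h.
    n = hist_counts[h]
    if n == 0:
        return p_unigram(y)
    T = len({o for (hh, o) in pair_counts if hh == h})
    lam = n / (n + T)
    return lam * (pair_counts[(h, y)] / n) + (1 - lam) * p_unigram(y)
```

Both estimators return proper distributions (they sum to one over the outcome vocabulary), including for unseen histories, where Witten-Bell falls back entirely on the lower-order model.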
References
Bikel, D.M., Miller, S., Schwartz, R., Weischedel, R.: Nymble: a high-performance learning name-finder. In: Proceedings of the Fifth Conference on Applied Natural Language Processing, pp. 194–201 (1997)
Black, E., Jelinek, F., Lafferty, J., Magerman, D.M., Mercer, R., Roukos, S.: Towards history-based grammars: Using richer models for probabilistic parsing. In: Proceedings of the 31st Meeting of the Association for Computational Linguistics, pp. 31–37 (1993)
Charniak, E.: A maximum-entropy-inspired parser. In: Proceedings of NAACL (2000)
Chen, S.F., Goodman, J.: An empirical study of smoothing techniques for language modeling. In: Proceedings of the Thirty-Fourth Annual Meeting of the Association for Computational Linguistics, pp. 310–318 (1996)
Collins, M.: Three generative, lexicalised models for statistical parsing. In: Proceedings of the 35th Meeting of the Association for Computational Linguistics and the 7th Conference of the European Chapter of the ACL, pp. 16–23 (1997)
Daelemans, W.: Introduction to the special issue on memory-based language processing. Journal of Experimental and Theoretical Artificial Intelligence 11(3), 287–292 (1999)
Daelemans, W., van den Bosch, A., Zavrel, J.: Forgetting exceptions is harmful in language learning. Machine Learning 34(1/3), 11–43 (1999)
Dagan, I., Lee, L., Pereira, F.: Similarity-based models of cooccurrence probabilities. Machine Learning 34(1-3), 43–69 (1999)
Friedman, J.H.: On bias, variance, 0/1-loss, and the curse-of-dimensionality. Data Mining and Knowledge Discovery 1(1) (1997)
Goodman, J.T.: A bit of progress in language modeling: Extended version. Technical Report MSR-TR-2001-72, Microsoft Research (2001)
Klein, D., Manning, C.D.: Accurate unlexicalized parsing. In: Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (2003)
Lee, L.: Measures of distributional similarity. In: 37th Annual Meeting of the Association for Computational Linguistics, pp. 25–32 (1999)
Magerman, D.M.: Statistical decision-tree models for parsing. In: Proceedings of the 33rd Meeting of the Association for Computational Linguistics (1995)
Oepen, S., Toutanova, K., Shieber, S., Manning, C., Flickinger, D., Brants, T.: The LinGO Redwoods treebank: Motivation and preliminary applications. In: Proceedings of the 19th International Conference on Computational Linguistics (COLING) (2002)
Pollard, C., Sag, I.A.: Head-Driven Phrase Structure Grammar. University of Chicago Press, Chicago (1994)
Ratnaparkhi, A.: A linear observed time statistical parser based on maximum entropy models. In: EMNLP, pp. 1–10 (1997)
Witten, I.H., Bell, T.C.: The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression. IEEE Trans. Inform. Theory 37(4), 1085–1094 (1991)
Zavrel, J., Daelemans, W.: Memory-based learning: Using similarity for smoothing. In: Proceedings of the Joint ACL/EACL Conference (1997)
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Toutanova, K., Mitchell, M., Manning, C.D. (2003). Optimizing Local Probability Models for Statistical Parsing. In: Lavrač, N., Gamberger, D., Blockeel, H., Todorovski, L. (eds) Machine Learning: ECML 2003. ECML 2003. Lecture Notes in Computer Science(), vol 2837. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39857-8_37
DOI: https://doi.org/10.1007/978-3-540-39857-8_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20121-2
Online ISBN: 978-3-540-39857-8