
Optimizing Local Probability Models for Statistical Parsing

  • Kristina Toutanova
  • Mark Mitchell
  • Christopher D. Manning
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2837)

Abstract

This paper studies the properties and performance of models for estimating local probability distributions that are used as components of larger probabilistic systems, namely history-based generative parsing models. We report experimental results showing that memory-based learning outperforms many commonly used methods for this task (Witten-Bell, Jelinek-Mercer with fixed weights, decision trees, and log-linear models). However, we can connect these results to the commonly used general class of deleted interpolation models by showing that certain types of memory-based learning, including the kind that performed so well in our experiments, are instances of this class. In addition, we illustrate how the joint and conditional data likelihood achieved by such models can diverge from their accuracy, suggesting that smoothing based on directly optimizing accuracy might greatly improve performance.
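
For readers unfamiliar with the model classes named above, the following is a minimal sketch (in Python, not from the paper) of the two ideas being compared: a Jelinek-Mercer-style deleted interpolation estimate of a local conditional P(y | history) with fixed weights, and a memory-based (k-nearest-neighbour) estimate of the same quantity. The function names, the plain feature-overlap similarity, and the interpolation weights are illustrative assumptions, not the authors' implementation.

    from collections import Counter, defaultdict

    def train_counts(events):
        # Count (history, outcome) pairs at two context lengths. `events`
        # is an iterable of (history, outcome) pairs, where `history` is a
        # tuple of conditioning features (e.g. node label, head word);
        # the structure here is illustrative only.
        full, reduced, outcomes = defaultdict(Counter), defaultdict(Counter), Counter()
        for history, y in events:
            full[history][y] += 1
            reduced[history[:-1]][y] += 1  # back off by dropping the last feature
            outcomes[y] += 1
        return full, reduced, outcomes

    def jelinek_mercer(full, reduced, outcomes, history, y, lambdas=(0.6, 0.3, 0.1)):
        # Fixed-weight linear interpolation of three relative-frequency
        # estimates: P(y | full history), P(y | reduced history), and P(y).
        def relfreq(counter):
            total = sum(counter.values())
            return counter[y] / total if total else 0.0
        p_full = relfreq(full[history])
        p_reduced = relfreq(reduced[history[:-1]])
        p_unigram = outcomes[y] / sum(outcomes.values())
        l1, l2, l3 = lambdas
        return l1 * p_full + l2 * p_reduced + l3 * p_unigram

    def memory_based(instances, history, y, k=5):
        # Memory-based (k-NN) estimate of P(y | history): rank stored
        # training instances by feature overlap with the query history and
        # return the similarity-weighted relative frequency of outcome y
        # among the k most similar instances. Plain overlap stands in for
        # the feature-weighted similarity metrics used in practice.
        scored = [(sum(a == b for a, b in zip(h, history)), out)
                  for h, out in instances]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        top = scored[:k]
        total = sum(sim for sim, _ in top)
        return sum(sim for sim, out in top if out == y) / total if total else 0.0

In practice the interpolation weights would be fit on held-out data (deleted estimation) rather than fixed as above; the paper's observation is that suitable similarity-weighted k-NN estimators of this general shape can themselves be written as deleted interpolation models.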

Keywords

Parse Tree, Derivation Tree, Computational Linguistics, Word Error Rate, Conditional Likelihood

References

  1. Bikel, D.M., Miller, S., Schwartz, R., Weischedel, R.: Nymble: a high-performance learning name-finder. In: Proceedings of the Fifth Conference on Applied Natural Language Processing, pp. 194–201 (1997)
  2. Black, E., Jelinek, F., Lafferty, J., Magerman, D.M., Mercer, R., Roukos, S.: Towards history-based grammars: Using richer models for probabilistic parsing. In: Proceedings of the 31st Meeting of the Association for Computational Linguistics, pp. 31–37 (1993)
  3. Charniak, E.: A maximum-entropy-inspired parser. In: Proceedings of NAACL (2000)
  4. Chen, S.F., Goodman, J.: An empirical study of smoothing techniques for language modeling. In: Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, pp. 310–318 (1996)
  5. Collins, M.: Three generative, lexicalised models for statistical parsing. In: Proceedings of the 35th Meeting of the Association for Computational Linguistics and the 7th Conference of the European Chapter of the ACL, pp. 16–23 (1997)
  6. Daelemans, W.: Introduction to the special issue on memory-based language processing. Journal of Experimental and Theoretical Artificial Intelligence 11(3), 287–292 (1999)
  7. Daelemans, W., van den Bosch, A., Zavrel, J.: Forgetting exceptions is harmful in language learning. Machine Learning 34(1–3), 11–43 (1999)
  8. Dagan, I., Lee, L., Pereira, F.: Similarity-based models of cooccurrence probabilities. Machine Learning 34(1–3), 43–69 (1999)
  9. Friedman, J.: On bias, variance, 0/1-loss, and the curse-of-dimensionality. Data Mining and Knowledge Discovery 1(1) (1996)
  10. Goodman, J.T.: A bit of progress in language modeling: Extended version. Technical Report MSR-TR-2001-72, Microsoft Research (2001)
  11. Klein, D., Manning, C.D.: Accurate unlexicalized parsing. In: Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (2003)
  12. Lee, L.: Measures of distributional similarity. In: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pp. 25–32 (1999)
  13. Magerman, D.M.: Statistical decision-tree models for parsing. In: Proceedings of the 33rd Meeting of the Association for Computational Linguistics (1995)
  14. Oepen, S., Toutanova, K., Shieber, S., Manning, C., Flickinger, D., Brants, T.: The LinGO Redwoods treebank: Motivation and preliminary applications. In: Proceedings of the 19th International Conference on Computational Linguistics (COLING) (2002)
  15. Pollard, C., Sag, I.A.: Head-Driven Phrase Structure Grammar. University of Chicago Press, Chicago (1994)
  16. Ratnaparkhi, A.: A linear observed time statistical parser based on maximum entropy models. In: Proceedings of EMNLP, pp. 1–10 (1997)
  17. Witten, I.H., Bell, T.C.: The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory 37(4), 1085–1094 (1991)
  18. Zavrel, J., Daelemans, W.: Memory-based learning: Using similarity for smoothing. In: Proceedings of the Joint ACL/EACL Conference (1997)

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Kristina Toutanova (1)
  • Mark Mitchell (2)
  • Christopher D. Manning (1)

  1. Computer Science Department, Stanford University, Stanford, USA
  2. CSLI, Stanford University, Stanford, USA
