A statistical syntactic disambiguation program and what it learns

Ersan, Murat; Charniak, Eugene

doi:10.1007/3-540-60925-3_44

Conference paper
First Online: 01 January 2005

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1040)

Abstract

We describe a program that uses statistical information on word usage to perform syntactic disambiguation, and show that this information significantly improves performance. The bulk of the paper, however, attempts to answer the question: what did the program learn that would account for this improvement? We show that the program has learned many linguistically recognized forms of lexical information, particularly verb case frames and prepositional preferences for nouns and adjectives. We also show that, viewed simply as a learner of lexical information, the program is a success, performing slightly better than hand-crafted learning programs for the same tasks.

This research was supported in part by NSF grant IRI-9319516.

Author information

Authors and Affiliations

Department of Computer Science, Brown University, Providence, RI 02912-1910
Murat Ersan & Eugene Charniak

Editor information

Stefan Wermter, Ellen Riloff, Gabriele Scheler

About this paper

Cite this paper

Ersan, M., Charniak, E. (1996). A statistical syntactic disambiguation program and what it learns. In: Wermter, S., Riloff, E., Scheler, G. (eds) Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing. IJCAI 1995. Lecture Notes in Computer Science, vol 1040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60925-3_44

DOI: https://doi.org/10.1007/3-540-60925-3_44
Published: 07 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60925-4
Online ISBN: 978-3-540-49738-7
eBook Packages: Springer Book Archive
