A statistical syntactic disambiguation program and what it learns

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1040)

Abstract

We describe a program that uses statistical information on word usage to perform syntactic disambiguation, and show that this information significantly improves its performance. The bulk of the paper, however, addresses a different question: what did the program learn that would account for this improvement? We show that the program has learned many linguistically recognized forms of lexical information, particularly verb case frames and prepositional preferences for nouns and adjectives. We also show that, viewed simply as a learner of lexical information, the program is a success, performing slightly better than hand-crafted learning programs for the same tasks.
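As a hedged illustration only: the abstract does not describe the program's model, but one simple way "prepositional preferences" can be learned from word-usage statistics is to count how often each head word takes each preposition and then use those counts to resolve a prepositional-phrase attachment ambiguity (e.g., "put the book on the shelf"). The Python sketch below does this. It is not the program described in the paper; the toy training pairs, the function names (`train`, `prep_prob`, `attach`), and the add-alpha smoothing are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical toy data: (head word, preposition) pairs such as a parser
# might extract from an annotated corpus. Invented for illustration only.
PAIRS = (
    [("put", "on")] * 4 + [("put", "in")] * 2
    + [("book", "about")] * 3 + [("book", "on")] * 1
    + [("interest", "in")] * 5 + [("see", "with")] * 2
)
PREPS = {"on", "in", "about", "with"}


def train(pairs):
    """Count how often each head occurs and how often it takes each preposition."""
    head_counts = Counter(head for head, _ in pairs)
    pair_counts = Counter(pairs)
    return head_counts, pair_counts


def prep_prob(head, prep, head_counts, pair_counts, n_preps, alpha=0.5):
    """Add-alpha smoothed estimate of P(prep | head); unseen heads get a uniform estimate."""
    return (pair_counts[(head, prep)] + alpha) / (head_counts[head] + alpha * n_preps)


def attach(verb, noun, prep, model, n_preps):
    """Attach the PP to whichever head prefers the preposition more.

    Ties (e.g. both heads unseen) default to noun attachment; that default is
    an arbitrary choice for this sketch.
    """
    head_counts, pair_counts = model
    p_verb = prep_prob(verb, prep, head_counts, pair_counts, n_preps)
    p_noun = prep_prob(noun, prep, head_counts, pair_counts, n_preps)
    return "verb" if p_verb > p_noun else "noun"


if __name__ == "__main__":
    model = train(PAIRS)
    print(attach("put", "book", "on", model, len(PREPS)))        # "put ... on": verb
    print(attach("see", "interest", "in", model, len(PREPS)))    # "interest in": noun
```

Comparing smoothed conditional probabilities keeps the decision rule transparent: the learned lexical preferences themselves (the counts) are what drive the choice, which is the kind of information the abstract says the program acquires.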

This research was supported in part by NSF grant IRI-9319516.

Editor information

Stefan Wermter, Ellen Riloff, Gabriele Scheler

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ersan, M., Charniak, E. (1996). A statistical syntactic disambiguation program and what it learns. In: Wermter, S., Riloff, E., Scheler, G. (eds) Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing. IJCAI 1995. Lecture Notes in Computer Science, vol 1040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60925-3_44

  • DOI: https://doi.org/10.1007/3-540-60925-3_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-60925-4

  • Online ISBN: 978-3-540-49738-7

  • eBook Packages: Springer Book Archive
