Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1040)

Abstract

Corpora tagged with part-of-speech and phrase-structure information have been used both for exploratory data analysis and for unsupervised learning of language models. These corpora have proved to be invaluable resources for research activities such as training part-of-speech taggers, disambiguating word senses, detecting noun phrases, inducing selectional restrictions, extracting argument structure, and inducing probabilistic grammars.

In this paper, we present some new techniques that use parsed corpora not for inducing grammars but for circumventing parsing as much as possible. In particular, we describe how a corpus parsed with a wide-coverage Lexicalized Tree Adjoining Grammar (LTAG) is used for this purpose. The first technique exploits the fact that LTAGs represent dependency and constituency information in a uniform way. The second technique uses Explanation-Based Learning methodology to view parsing as Finite State Transduction. Both techniques exploit the central notions of LTAGs: lexicalization, an extended domain of locality, and the factoring of recursion from the domain over which dependencies are specified.

We would like to thank R. Chandrasekhar, Christine Doran, Mitch Marcus and Martha Palmer for their valuable comments.

Editor information

Stefan Wermter, Ellen Riloff, Gabriele Scheler

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Joshi, A.K., Srinivas, B. (1996). Using parsed corpora for circumventing parsing. In: Wermter, S., Riloff, E., Scheler, G. (eds) Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing. IJCAI 1995. Lecture Notes in Computer Science, vol 1040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60925-3_63

  • DOI: https://doi.org/10.1007/3-540-60925-3_63

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-60925-4

  • Online ISBN: 978-3-540-49738-7

  • eBook Packages: Springer Book Archive
