A Practical Chunker for Unrestricted Text

  • Conference paper
Natural Language Processing — NLP 2000 (NLP 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1835)

Abstract

In this paper we present a practical approach to text chunking for unrestricted Modern Greek text that is based on multiple-pass parsing. Two versions of this chunker are proposed: one based on a large lexicon and one based on minimal resources. In the latter case, morphological analysis relies exclusively on two small lexicons, one containing closed-class words and one containing common suffixes of Modern Greek words. We give comparative performance results on a corpus of unrestricted text and show that very good results can be obtained without the large and complicated resources. Moreover, the considerable time cost introduced by the large lexicon indicates that the minimal-resources chunker is the better choice for practical applications that require rapid response and can tolerate less-than-perfect parsing results.
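
To make the minimal-resources idea concrete, the following is a rough sketch rather than the authors' implementation. It assumes a tiny closed-class lexicon and a tiny suffix lexicon (both hypothetical, using transliterated toy entries), tags each word by lookup or longest matching suffix, and then applies two simple passes in sequence. The tag names, the pass definitions, and every function name here are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical toy resources (transliterated); NOT the lexicons used in the paper.
CLOSED_CLASS = {"o": "ART", "i": "ART", "to": "ART",
                "se": "PREP", "me": "PREP", "apo": "PREP", "kai": "CONJ"}
SUFFIXES = [("oume", "VERB"), ("ei", "VERB"),
            ("os", "NOUN"), ("i", "NOUN"), ("a", "NOUN")]

Item = Tuple[str, List[str]]  # (label, words covered by this item)


def tag(word: str) -> str:
    """Closed-class lookup first, then the longest matching suffix, else UNK."""
    w = word.lower()
    if w in CLOSED_CLASS:
        return CLOSED_CLASS[w]
    for suffix, label in sorted(SUFFIXES, key=lambda s: -len(s[0])):
        if w.endswith(suffix) and len(w) > len(suffix):
            return label
    return "UNK"


def np_pass(items: List[Item]) -> List[Item]:
    """Pass 1 (assumed): group an optional article plus one or more nouns into NP."""
    out, i = [], 0
    while i < len(items):
        j = i + 1 if items[i][0] == "ART" else i
        k = j
        while k < len(items) and items[k][0] == "NOUN":
            k += 1
        if k > j:  # at least one noun was found
            out.append(("NP", [w for _, ws in items[i:k] for w in ws]))
            i = k
        else:
            out.append(items[i])
            i += 1
    return out


def pp_pass(items: List[Item]) -> List[Item]:
    """Pass 2 (assumed): attach a preposition to the NP that follows it."""
    out, i = [], 0
    while i < len(items):
        if items[i][0] == "PREP" and i + 1 < len(items) and items[i + 1][0] == "NP":
            out.append(("PP", items[i][1] + items[i + 1][1]))
            i += 2
        else:
            out.append(items[i])
            i += 1
    return out


def chunk(sentence: str) -> List[Item]:
    """Tag each word, then run the passes one after another over the same sequence."""
    items: List[Item] = [(tag(w), [w]) for w in sentence.split()]
    for one_pass in (np_pass, pp_pass):
        items = one_pass(items)
    return items


print(chunk("o anthropos pigainei se to spiti"))
```

Each pass consumes the output of the previous one, so later passes can build on chunks already found. A real system would need far richer lexicons, ambiguity handling, and more passes than this toy shows.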



Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Stamatatos, E., Fakotakis, N., Kokkinakis, G. (2000). A Practical Chunker for Unrestricted Text. In: Christodoulakis, D.N. (ed.) Natural Language Processing — NLP 2000. NLP 2000. Lecture Notes in Computer Science, vol 1835. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45154-4_13

  • DOI: https://doi.org/10.1007/3-540-45154-4_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67605-8

  • Online ISBN: 978-3-540-45154-9

  • eBook Packages: Springer Book Archive
