Abstract
In this paper we present a practical approach to text chunking for unrestricted Modern Greek text that is based on multiple-pass parsing. Two versions of this chunker are proposed: one based on a large lexicon and one based on minimal resources. In the latter case, morphological analysis is performed using only two small lexicons, one of closed-class words and one of common suffixes of Modern Greek words. We give comparative performance results on a corpus of unrestricted text and show that very good results can be obtained without the large and complicated resources. Moreover, the considerable time cost introduced by the large lexicon indicates that the minimal-resources chunker is the best solution for a practical application that requires rapid response and can tolerate less-than-perfect parsing results.
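The minimal-resources idea can be illustrated with a small sketch: each word is first looked up in a closed-class lexicon, then tagged by its longest matching suffix, and the tagged sequence is grouped into chunks by a fixed series of passes, each pass building on the output of the previous one. This is a hypothetical illustration, not the authors' system. The lexicon entries, suffix rules, tag names (`ART`, `ADJ`, `NOUN`, ...), the three passes (NP, then PP, then VP) and the helper names `tag`, `chunk` and `show` are all assumptions made for the example.

```python
# Sketch of a minimal-resources, multiple-pass chunker.
# ASSUMPTION: the lexicon entries, suffix rules, tag names and the three
# passes below are hypothetical illustrations, not the paper's resources.
from typing import Callable, List, Tuple

Unit = Tuple[str, List[str]]  # (label, words covered by the unit)

# Hypothetical closed-class lexicon: exact word -> tag.
CLOSED_CLASS = {
    "ο": "ART", "η": "ART", "το": "ART", "τα": "ART",
    "και": "CONJ", "σε": "PREP", "με": "PREP", "από": "PREP", "για": "PREP",
}

# Hypothetical suffix lexicon: word ending -> guessed tag.
SUFFIXES = {
    "ουμε": "VERB", "ουν": "VERB", "ει": "VERB",
    "ικός": "ADJ", "ική": "ADJ", "ικό": "ADJ",
    "ος": "NOUN", "ση": "NOUN", "α": "NOUN",
}
# Try longer endings first so that e.g. "ικός" wins over "ος".
_ORDERED_SUFFIXES = sorted(SUFFIXES, key=len, reverse=True)


def tag(word: str) -> str:
    """Closed-class lookup first, then longest matching suffix, else UNK."""
    w = word.lower()
    if w in CLOSED_CLASS:
        return CLOSED_CLASS[w]
    for suffix in _ORDERED_SUFFIXES:
        if w.endswith(suffix):
            return SUFFIXES[suffix]
    return "UNK"


def np_pass(units: List[Unit]) -> List[Unit]:
    """Pass 1: group an optional article, any adjectives and a noun into NP."""
    out: List[Unit] = []
    i = 0
    while i < len(units):
        j = i
        if units[j][0] == "ART":
            j += 1
        while j < len(units) and units[j][0] == "ADJ":
            j += 1
        if j < len(units) and units[j][0] == "NOUN":
            out.append(("NP", [w for _, ws in units[i:j + 1] for w in ws]))
            i = j + 1
        else:
            out.append(units[i])
            i += 1
    return out


def pp_pass(units: List[Unit]) -> List[Unit]:
    """Pass 2: attach a preposition to the NP built by pass 1 that follows it."""
    out: List[Unit] = []
    i = 0
    while i < len(units):
        if units[i][0] == "PREP" and i + 1 < len(units) and units[i + 1][0] == "NP":
            out.append(("PP", units[i][1] + units[i + 1][1]))
            i += 2
        else:
            out.append(units[i])
            i += 1
    return out


def vp_pass(units: List[Unit]) -> List[Unit]:
    """Pass 3: turn each remaining verb into a one-word VP chunk."""
    return [("VP", ws) if label == "VERB" else (label, ws) for label, ws in units]


PASSES: List[Callable[[List[Unit]], List[Unit]]] = [np_pass, pp_pass, vp_pass]


def chunk(sentence: str) -> List[Unit]:
    """Tag each whitespace-separated token, then apply the passes in order."""
    units: List[Unit] = [(tag(w), [w]) for w in sentence.split()]
    for run_pass in PASSES:
        units = run_pass(units)
    return units


def show(units: List[Unit]) -> str:
    """Render chunks in bracketed form, e.g. [NP ...]."""
    return " ".join(f"[{label} {' '.join(ws)}]" for label, ws in units)


print(show(chunk("ο γλωσσικός τύπος υπάρχει σε κείμενα")))
# prints: [NP ο γλωσσικός τύπος] [VP υπάρχει] [PP σε κείμενα]
```

For the example sentence ("the linguistic type occurs in texts"), the order of the passes matters: the PP pass can only attach "σε" because the NP pass has already turned "κείμενα" into an NP. The suffix rules here are deliberately crude (a bare "α" ending, for instance, will mistag many words), which reflects the trade-off the abstract describes: less accurate morphological analysis in exchange for not needing a large lexicon.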
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Stamatatos, E., Fakotakis, N., Kokkinakis, G. (2000). A Practical Chunker for Unrestricted Text. In: Christodoulakis, D.N. (ed.) Natural Language Processing — NLP 2000. Lecture Notes in Computer Science, vol 1835. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45154-4_13
DOI: https://doi.org/10.1007/3-540-45154-4_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67605-8
Online ISBN: 978-3-540-45154-9
eBook Packages: Springer Book Archive