Abstract
This paper describes the application of the perceptron algorithm to the morphological disambiguation of Turkish text. Turkish has a productive derivational morphology. Because of the ambiguity caused by this complex morphology, a word may have multiple morphological parses, each with a different stem or sequence of morphemes. Our method is based on ranking with the perceptron algorithm, which has been successful in some NLP tasks in English. We use the statistical trigram-based model of previous work as a baseline to enumerate an n-best list of candidate morphological parse sequences for each sentence. We then apply the perceptron algorithm to rerank the n-best list using a set of 23 features. The perceptron trained for morphological disambiguation improves the accuracy of the baseline model from 93.61% to 96.80%. When we train the perceptron as a POS tagger, the accuracy is 98.27%. These Turkish morphological disambiguation and POS tagging results are the best reported so far.
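The reranking step can be illustrated with a minimal averaged-perceptron reranker over n-best lists. This is a sketch under stated assumptions, not the authors' implementation. Each candidate parse sequence is reduced to a sparse feature dictionary, and the feature names used here (`tag=Noun`, `tag=Verb`, `tag=Adj`) are hypothetical stand-ins, not the paper's 23 features. The training target for each sentence is assumed to be the index of the candidate that best matches the reference annotation, because the correct sequence may not appear in the n-best list. Weight averaging is also an assumption on our part. The helper names `score`, `rerank` and `train_averaged_perceptron` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Sparse feature vector phi(c) for one candidate parse sequence c.
# Feature names are hypothetical placeholders, not the paper's 23 features.
FeatureVector = Dict[str, float]


def score(weights: Dict[str, float], feats: FeatureVector) -> float:
    """Linear score w . phi(c), summed over the candidate's active features."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def rerank(weights: Dict[str, float], candidates: Sequence[FeatureVector]) -> int:
    """Index of the highest-scoring candidate in an n-best list (ties -> earliest)."""
    return max(range(len(candidates)), key=lambda i: score(weights, candidates[i]))


def train_averaged_perceptron(
    data: List[Tuple[List[FeatureVector], int]], epochs: int = 10
) -> Dict[str, float]:
    """Train on (n-best candidates, target index) pairs; return averaged weights."""
    weights: Dict[str, float] = defaultdict(float)
    totals: Dict[str, float] = defaultdict(float)  # running sums for averaging
    steps = 0
    for _ in range(epochs):
        for candidates, target in data:
            pred = rerank(weights, candidates)
            if pred != target:
                # Standard perceptron update: promote the target, demote the mistake.
                for name, value in candidates[target].items():
                    weights[name] += value
                for name, value in candidates[pred].items():
                    weights[name] -= value
            # Add the current weights to the running sum after every example.
            for name, value in weights.items():
                totals[name] += value
            steps += 1
    return {name: total / steps for name, total in totals.items()}


# Toy usage with hypothetical tag features: the target is index 1 in both lists.
toy = [
    ([{"tag=Verb": 1.0}, {"tag=Noun": 1.0}], 1),
    ([{"tag=Adj": 1.0}, {"tag=Noun": 1.0}], 1),
]
w = train_averaged_perceptron(toy, epochs=5)
print(rerank(w, toy[0][0]), rerank(w, toy[1][0]))  # → 1 1
```

Raw perceptron weights can oscillate when the training data are not separable. Averaging them over every training step is a common way to obtain a more stable reranker. In a fuller setup, the baseline trigram model's score for each candidate could also be supplied as one of the features.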
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sak, H., Güngör, T., Saraçlar, M. (2007). Morphological Disambiguation of Turkish Text with Perceptron Algorithm. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2007. Lecture Notes in Computer Science, vol 4394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70939-8_10
DOI: https://doi.org/10.1007/978-3-540-70939-8_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70938-1
Online ISBN: 978-3-540-70939-8
eBook Packages: Computer Science, Computer Science (R0)