Abstract
In previous work we introduced a hybrid approach, combining genetic algorithms (GA) and inductive logic programming (ILP), for learning stem-suffix segmentation rules from an unmarked list of words. Evaluating the method was made difficult by the lack of word corpora annotated with morphological segmentation. Here the hybrid approach is evaluated indirectly, on the task of tag prediction. Applying the approach to an annotated lexicon of word-tag pairs yields a pair of lexicons, one pairing stems with tags and one pairing suffixes with tags. The two lexicons are then used to predict the tags of unseen words in two ways: (1) using only the stem and suffix generated by the segmentation rules, and (2) using all matching combinations of stem and suffix present in the lexicons. The results show a high correlation between the constituents generated by the segmentation rules and the tags of the words in which they appear, thereby demonstrating the linguistic relevance of the segmentations produced by the hybrid approach.
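The two prediction modes described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_segment` is a hypothetical stand-in for the learned segmentation rules, the English words and tags are toy data rather than the lexicon used in the paper, and the function names are invented. The abstract does not say how stem tags and suffix tags are combined into a final prediction, so the sketch only returns both candidate sets and leaves that step open.

```python
from collections import defaultdict

# Hypothetical stand-in for the learned segmentation rules:
# strip the first matching suffix from a small hand-written list.
def toy_segment(word):
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[:-len(suffix)], suffix
    return word, ""

def build_lexicons(tagged_words, segment):
    """Build stem-tag and suffix-tag lexicons from (word, tag) pairs."""
    stem_tags, suffix_tags = defaultdict(set), defaultdict(set)
    for word, tag in tagged_words:
        stem, suffix = segment(word)
        stem_tags[stem].add(tag)
        suffix_tags[suffix].add(tag)
    return stem_tags, suffix_tags

def candidates_from_rule_split(word, segment, stem_tags, suffix_tags):
    """Mode (1): use only the stem and suffix given by the segmentation rules."""
    stem, suffix = segment(word)
    return set(stem_tags.get(stem, ())), set(suffix_tags.get(suffix, ()))

def candidates_from_all_splits(word, stem_tags, suffix_tags):
    """Mode (2): try every split whose stem and suffix are both in the lexicons."""
    matches = []
    for i in range(1, len(word) + 1):
        stem, suffix = word[:i], word[i:]
        if stem in stem_tags and suffix in suffix_tags:
            matches.append((stem, suffix, set(stem_tags[stem]), set(suffix_tags[suffix])))
    return matches

# Toy annotated lexicon (illustrative only, not data from the paper).
training = [("walked", "VBD"), ("walks", "VBZ"), ("talks", "VBZ"), ("talking", "VBG")]
stem_tags, suffix_tags = build_lexicons(training, toy_segment)

print(candidates_from_rule_split("talked", toy_segment, stem_tags, suffix_tags))
print(candidates_from_all_splits("talked", stem_tags, suffix_tags))
```

Mode (2) does not depend on the segmentation rules at prediction time, which is why `candidates_from_all_splits` takes only the word and the two lexicons.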
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kazakov, D., Manandhar, S., Erjavec, T. (1999). Learning Word Segmentation Rules for Tag Prediction. In: Džeroski, S., Flach, P. (eds) Inductive Logic Programming. ILP 1999. Lecture Notes in Computer Science, vol 1634. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48751-4_15
DOI: https://doi.org/10.1007/3-540-48751-4_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66109-2
Online ISBN: 978-3-540-48751-7
eBook Packages: Springer Book Archive