A hybrid approach to word segmentation

Kazakov, Dimitar; Manandhar, Suresh

doi:10.1007/BFb0027316

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1446)

Included in the following conference series:

International Conference on Inductive Logic Programming

Abstract

This article presents a combination of unsupervised and supervised learning techniques for generating word segmentation rules from a list of words. First, a bias for word segmentation is introduced, and a simple genetic algorithm is used to search for the segmentation with the best bias value. In the second phase, the segmentation obtained from the genetic algorithm is used as input to two inductive logic programming algorithms, namely FOIDL and CLOG. The result is a logic program that can be used to segment unseen words. The learnt program contains affixes that are characteristic of the given language and can be used in other morphology tasks.
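The abstract does not define the bias or the genetic-algorithm settings, so the Python sketch below is illustrative only and rests on stated assumptions. The bias is taken (hypothetically) to be the total length of the distinct prefixes and suffixes obtained by splitting each word once, and the GA uses tournament selection, uniform crossover, per-word mutation and elitism. The second phase is replaced by a much simpler stand-in, not FOIDL or CLOG: the suffixes found by the GA become an ordered decision list that is applied to unseen words. All function names (`bias`, `ga_segment`, `learn_suffix_list`, `segment`) are hypothetical.

```python
import random

# Hypothetical bias (an assumption, not taken from this page): the total number of
# characters in the set of distinct prefixes plus the set of distinct suffixes.
# Lower values mean a more compact lexicon.
def bias(words, splits):
    prefixes = {w[:s] for w, s in zip(words, splits)}
    suffixes = {w[s:] for w, s in zip(words, splits)}
    return sum(map(len, prefixes)) + sum(map(len, suffixes))


def mutate(splits, words, rng, rate):
    # Each split point is redrawn uniformly from 0..len(word) with probability `rate`.
    return [rng.randint(0, len(w)) if rng.random() < rate else s
            for w, s in zip(words, splits)]


def crossover(a, b, rng):
    # Uniform crossover: each word's split point is copied from one of the two parents.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def tournament(pop, scores, rng, k=3):
    # Pick k random individuals and return the one with the lowest bias.
    contenders = rng.sample(range(len(pop)), k)
    return pop[min(contenders, key=lambda i: scores[i])]


def ga_segment(words, pop_size=60, generations=200, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    # A chromosome holds one split point per word: word == word[:s] + word[s:].
    pop = [[rng.randint(0, len(w)) for w in words] for _ in range(pop_size)]
    pop[0] = [len(w) for w in words]  # include the unsegmented baseline
    for _ in range(generations):
        scores = [bias(words, p) for p in pop]
        elite = pop[min(range(pop_size), key=lambda i: scores[i])]
        children = [elite]  # elitism: the best bias in the population never gets worse
        while len(children) < pop_size:
            child = crossover(tournament(pop, scores, rng),
                              tournament(pop, scores, rng), rng)
            children.append(mutate(child, words, rng, mutation_rate))
        pop = children
    return min(pop, key=lambda p: bias(words, p))


# Simplified stand-in for the second phase (NOT FOIDL or CLOG): turn the
# non-empty suffixes into an ordered decision list, longest suffix first.
def learn_suffix_list(words, splits):
    suffixes = {w[s:] for w, s in zip(words, splits) if s < len(w)}
    return sorted(suffixes, key=lambda x: (-len(x), x))


def segment(word, suffix_list):
    # The first matching rule wins; a suffix may not consume the whole word.
    for suf in suffix_list:
        if word.endswith(suf) and len(suf) < len(word):
            return word[:-len(suf)], suf
    return word, ""


if __name__ == "__main__":
    words = ["walked", "walking", "walks", "talked", "talking", "talks",
             "jumped", "jumping", "jumps"]
    splits = ga_segment(words)
    print([(w[:s], w[s:]) for w, s in zip(words, splits)], bias(words, splits))
    rules = learn_suffix_list(words, splits)
    print(segment("parking", rules))
```

Under this assumed bias, the split walk|ed, walk|ing, walk|s (and likewise for talk and jump) scores 18, while leaving all nine words unsplit scores 54. Seeding the unsplit baseline and keeping the elite guarantees that the returned segmentation is never worse than that baseline; whether the GA reaches the most compact split depends on the population size, mutation rate and number of generations.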

References

H. Blockeel. Application of inductive logic programming to natural language processing. Master's thesis, Katholieke Universiteit Leuven, 1994.
E. Brill. Some advances in transformation-based part of speech tagging. In Proceedings of AAAI-94, pages 748–753. AAAI Press/MIT Press, 1994.
Mary Elaine Califf and Raymond J. Mooney. Advantages of decision lists and implicit negatives in inductive logic programming. Technical report, University of Texas at Austin, 1996.
James Cussens. Part-of-speech tagging using Progol. In Inductive Logic Programming: Proceedings of the 7th International Workshop (ILP-97), pages 93–108, 1997.
Sabine Deligne. Modèles de séquences de longueurs variables: Application au traitement du langage écrit et de la parole. PhD thesis, ENST Paris, France, 1996.
Bernard Fradin. L'approche à deux niveaux en morphologie computationnelle et les développements récents de la morphologie. Traitement automatique des langues, 35(2):9–48, 1994.
David E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.
Dimitar Kazakov. An inductive approach to natural language parser design. In Kemal Oflazer and Harold Somers, editors, Proceedings of Nemlap-2, pages 209–217, Ankara, Turkey, September 1996. Bilkent University.
Dimitar Kazakov. Unsupervised learning of naïve morphology with genetic algorithms. In W. Daelemans, A. van den Bosch, and A. Weijters, editors, Workshop Notes of the ECML/MLnet Workshop on Empirical Learning of Natural Language Processing Tasks, pages 105–112, Prague, April 1997.
Nada Lavrač and Sašo Džeroski. Inductive Logic Programming: Techniques and Applications. Ellis Horwood, Hemel Hempstead, Hertfordshire, England, 1994.
Suresh Manandhar, Sašo Džeroski, and Tomaž Erjavec. Learning Multilingual Morphology with CLOG. In The Eighth International Conference on Inductive Logic Programming (ILP'98), Madison, Wisconsin, USA, 1998.
Raymond J. Mooney and Mary Elaine Califf. Induction of first-order decision lists: Results on learning the past tense of English verbs. Journal of Artificial Intelligence Research, June 1995.
Vito Pirelli. Morphology, Analogy and Machine Translation. PhD thesis, Salford University, UK, 1993.
J.R. Quinlan. Learning logical definitions from relations. Machine Learning, 5:239–266, 1990.
Antal van den Bosch, Walter Daelemans, and Ton Weijters. Morphological analysis as classification: an inductive learning approach. In Kemal Oflazer and Harold Somers, editors, Proceedings of Nemlap-2, pages 79–89, Ankara, September 1996.
François Yvon. Prononcer par analogie: motivations, formalisations et évaluations. PhD thesis, ENST Paris, France, 1996.

Author information

Authors and Affiliations

University of York, YO10 5DD, Heslington, York, UK
Dimitar Kazakov & Suresh Manandhar

Editor information

David Page

About this paper

Cite this paper

Kazakov, D., Manandhar, S. (1998). A hybrid approach to word segmentation. In: Page, D. (eds) Inductive Logic Programming. ILP 1998. Lecture Notes in Computer Science, vol 1446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0027316

DOI: https://doi.org/10.1007/BFb0027316
Published: 18 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64738-6
Online ISBN: 978-3-540-69059-7
eBook Packages: Springer Book Archive