Learning to Identify Semitic Roots

Part of the book series: Text, Speech and Language Technology ((TLTB,volume 38))

Abstract

The morphology of Semitic languages is unique in that the major word-formation mechanism is an inherently non-concatenative process of interdigitation, whereby two morphemes, a root and a pattern, are interwoven. Identifying the root of a given word in a Semitic language is an important task, and in some cases a crucial part of morphological analysis. It is also a non-trivial task, which many humans find challenging. We present a machine learning approach to the problem of extracting the roots of Semitic words. Given the large number of potential roots (thousands), we address the problem as one of combining several classifiers, each predicting the value of one of the root's consonants. We show that when these predictors are combined by enforcing some fairly simple linguistic constraints, high accuracy can be achieved, which compares favorably with human performance on this task.
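The idea of combining per-consonant predictors under constraints can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the function names, the transliterated toy word, the invented scores, and the two constraints used here (root consonants must occur in the word in order; an optional lexicon of known roots) are all assumptions made for illustration only.

```python
from itertools import product


def is_subsequence(root, word):
    """True if the root's consonants occur in the word in the same order."""
    remaining = iter(word)
    return all(consonant in remaining for consonant in root)


def combine(word, position_scores, known_roots=None):
    """Return the highest-scoring root that satisfies the constraints.

    position_scores holds one dict per root position, mapping a consonant
    to a score. Each dict stands in for the output of an independently
    trained classifier (hypothetical values, not real model output).
    """
    best_root, best_score = None, float("-inf")
    for candidate in product(*(scores.keys() for scores in position_scores)):
        root = "".join(candidate)
        # Assumed constraint 1: consonants appear in the word, in order.
        if not is_subsequence(root, word):
            continue
        # Assumed constraint 2 (optional): the root is in a known lexicon.
        if known_roots is not None and root not in known_roots:
            continue
        score = 1.0
        for consonant, scores in zip(candidate, position_scores):
            score *= scores[consonant]
        if score > best_score:
            best_root, best_score = root, score
    return best_root


# Invented scores for a toy transliterated word.
word = "hktbti"
scores = [
    {"k": 0.6, "t": 0.4},    # first root consonant
    {"k": 0.5, "t": 0.45},   # second root consonant
    {"b": 0.8, "t": 0.2},    # third root consonant
]

# Taking each classifier's top choice independently ignores the constraints.
independent = "".join(max(s, key=s.get) for s in scores)
constrained = combine(word, scores)
print(independent, constrained)
```

In this toy case the independent choices yield a candidate that repeats a consonant the word contains only once, while the constrained search discards it and returns the best remaining candidate. Exhaustive enumeration is used only because the candidate sets are tiny; a realistic system would need a more careful search and a less simplistic constraint, since the consonants of some roots do not all surface in the word.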

Copyright information

© 2007 Springer

About this chapter

Cite this chapter

Daya, E., Roth, D., Wintner, S. (2007). Learning to Identify Semitic Roots. In: Soudi, A., Bosch, A.v., Neumann, G. (eds) Arabic Computational Morphology. Text, Speech and Language Technology, vol 38. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-6046-5_8
