Morphological Processing of Semitic Languages

Abstract

This chapter addresses morphological processing of Semitic languages. In light of the complex morphology and problematic orthography of many of the Semitic languages, the chapter begins with a recapitulation of the challenges these phenomena pose for computational applications. It then discusses the approaches suggested in the past to cope with these challenges. The bulk of the chapter discusses available solutions for morphological processing, including analysis, generation, and disambiguation, in a variety of Semitic languages. The concluding section discusses future research directions.

Notes

  1. Parts of this introduction are based on and adapted from [137].

  2. This section is adapted from [135].

References

  1. Adler M, Elhadad M (2006) An unsupervised morpheme-based HMM for Hebrew morphological disambiguation. In: Proceedings of the 21st international conference on computational linguistics and 44th annual meeting of the Association for Computational Linguistics, Sydney. Association for Computational Linguistics, pp 665–672. http://www.aclweb.org/anthology/P/P06/P06-1084

  2. Al-Haj H, Lavie A (2010) The impact of Arabic morphological segmentation on broad-coverage English-to-Arabic statistical machine translation. In: Proceedings of the conference of the Association for Machine Translation in the Americas (AMTA), Denver

  3. Alkuhlani S, Habash N (2011) A corpus for modeling morpho-syntactic agreement in Arabic: gender, number and rationality. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics: human language technologies, Portland. Association for Computational Linguistics, pp 357–362. http://www.aclweb.org/anthology/P11-2062

  4. Al-Shalabi R, Evens M (1998) A computational morphology system for Arabic. In: Rosner M (ed) Proceedings of the workshop on computational approaches to Semitic languages, COLING-ACL’98, Montreal, pp 66–72

  5. Al-Sughaiyer IA, Al-Kharashi IA (2004) Arabic morphological analysis techniques: a comprehensive survey. J Am Soc Inf Sci Technol 55(3):189–213

  6. Altantawy M, Habash N, Rambow O, Saleh I (2010) Morphological analysis and generation of Arabic nouns: a morphemic functional approach. In: Proceedings of the seventh international conference on language resources and evaluation (LREC), Valletta

  7. Altantawy M, Habash N, Rambow O (2011) Fast yet rich morphological analysis. In: Proceedings of the 9th international workshop on finite-state methods and natural language processing (FSMNLP 2011), Blois

  8. Amsalu S, Gibbon D (2005) A complete finite-state model for Amharic morphographemics. In: Yli-Jyrä A, Karttunen L, Karhumäki J (eds) FSMNLP. Lecture notes in computer science, vol 4002. Springer, Berlin/New York, pp 283–284

  9. Amsalu S, Gibbon D (2005) Finite state morphology of Amharic. In: Proceedings of RANLP, Borovets, pp 47–51

  10. Amtrup JW (2003) Morphology in machine translation systems: efficient integration of finite state transducers and feature structure descriptions. Mach Transl 18(3):217–238. doi:http://dx.doi.org/10.1007/s10590-004-2476-5

  11. Argaw AA, Asker L (2007) An Amharic stemmer: reducing words to their citation forms. In: Proceedings of the ACL-2007 workshop on computational approaches to Semitic languages, Prague

  12. Audebert C, Gaubert C, Jaccarini A (2009) Minimal resources for Arabic parsing: an interactive method for the construction of evolutive automata. In: Choukri K, Maegaard B (eds) Proceedings of the second international conference on Arabic language resources and tools, The MEDAR Consortium, Cairo

  13. Badr I, Zbib R, Glass J (2008) Segmentation for English-to-Arabic statistical machine translation. In: Proceedings of ACL-08: HLT, short papers, Columbus. Association for Computational Linguistics, pp 153–156. http://www.aclweb.org/anthology/P/P08/P08-2039

  14. Bar-Haim R, Sima’an K, Winter Y (2005) Choosing an optimal architecture for segmentation and POS-tagging of Modern Hebrew. In: Proceedings of the ACL workshop on computational approaches to Semitic languages, Ann Arbor. Association for Computational Linguistics, pp 39–46. http://www.aclweb.org/anthology/W/W05/W05-0706

  15. Bar-Haim R, Sima’an K, Winter Y (2008) Part-of-speech tagging of Modern Hebrew text. Nat Lang Eng 14(2):223–251

  16. Barthélemy F (1998) A morphological analyzer for Akkadian verbal forms with a model of phonetic transformations. In: Proceedings of the Coling-ACL 1998 workshop on computational approaches to Semitic languages, Montreal, pp 73–81

  17. Beesley KR (1996) Arabic finite-state morphological analysis and generation. In: Proceedings of COLING-96, the 16th international conference on computational linguistics, Copenhagen

  18. Beesley KR (1998) Arabic morphological analysis on the internet. In: Proceedings of the 6th international conference and exhibition on multi-lingual computing, Cambridge

  19. Beesley KR (1998) Arabic morphology using only finite-state operations. In: Rosner M (ed) Proceedings of the workshop on computational approaches to Semitic languages, COLING-ACL’98, Montreal, pp 50–57

  20. Beesley KR (1998) Constraining separated morphotactic dependencies in finite-state grammars. In: FSMNLP-98, Bilkent, pp 118–127

  21. Beesley KR, Karttunen L (2000) Finite-state non-concatenative morphotactics. In: Proceedings of the fifth workshop of the ACL special interest group in computational phonology, SIGPHON-2000, Luxembourg

  22. Beesley KR, Karttunen L (2003) Finite-state morphology: Xerox tools and techniques. CSLI, Stanford

  23. Belguith LH, Aloulou C, Ben Hamadou A (2008) MASPAR: De la segmentation à l’analyse syntaxique de textes arabes. Rev Inf Interact Intell I3 7(2):9–36

  24. Bentur E, Angel A, Segev D (1992) Computerized analysis of Hebrew words. Hebrew Linguist 36:33–38. (in Hebrew)

  25. Berri J, Zidoum H, Atif Y (2001) Web-based Arabic morphological analyzer. In: Gelbukh A (ed) CICLing 2001. Lecture notes in computer science, vol 2004. Springer, Berlin, pp 389–400

  26. Brants T (2000) TnT: a statistical part-of-speech tagger. In: Proceedings of the sixth conference on applied natural language processing, Seattle. Association for Computational Linguistics, pp 224–231. doi:10.3115/974147.974178, http://www.aclweb.org/anthology/A00-1031

  27. Buckwalter T (2004) Buckwalter Arabic morphological analyzer version 2.0. Linguistic Data Consortium, Philadelphia

  28. Buckwalter T (2004) Issues in Arabic orthography and morphology analysis. In: Farghaly A, Megerdoomian K (eds) COLING 2004 computational approaches to Arabic script-based languages, COLING, Geneva, pp 31–34

  29. Choueka Y (1966) Computers and grammar: mechanical analysis of Hebrew verbs. In: Proceedings of the annual conference of the Israeli Association for Information Processing, Rehovot, pp 49–66. (in Hebrew)

  30. Choueka Y (1972) Fast searching and retrieval techniques for large dictionaries and concordances. Heb Comput Linguist 6:12–32. (in Hebrew)

  31. Choueka Y (1980) Computerized full-text retrieval systems and research in the humanities: the Responsa project. Comput Humanit 14:153–169

  32. Choueka Y (1990) MLIM – a system for full, exact, on-line grammatical analysis of Modern Hebrew. In: Eizenberg Y (ed) Proceedings of the annual conference on computers in education, Tel Aviv, p 63. (in Hebrew)

  33. Choueka Y (1993) Response to “computerized analysis of Hebrew words”. Heb Linguist 37:87. (in Hebrew)

  34. Cohen D (1970) Essai d’une analyse automatique de l’arabe. In: Etudes de linguistique sémitique et arabe, De Gruyter, Germany, pp 49–78

  35. Cohen SB, Smith NA (2007) Joint morphological and syntactic disambiguation. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL), Prague. Association for Computational Linguistics, pp 208–217. http://www.aclweb.org/anthology/D/D07/D07-1022

  36. Cohen-Sygal Y, Wintner S (2006) Finite-state registered automata for non-concatenative morphology. Comput Linguist 32(1):49–82

  37. Collins M (2002) Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms. In: Proceedings of the ACL-02 conference on empirical methods in natural language processing, EMNLP ’02, Philadelphia, Vol 10. Association for Computational Linguistics, pp 1–8. doi:http://dx.doi.org/10.3115/1118693.1118694

  38. Daelemans W, van den Bosch A (2005) Memory-based language processing. Studies in natural language processing. Cambridge University Press, Cambridge

  39. Darwish K (2002) Building a shallow Arabic morphological analyzer in one day. In: Rosner M, Wintner S (eds) ACL’02 workshop on computational approaches to Semitic languages, Philadelphia, pp 47–54

  40. Daya E, Roth D, Wintner S (2007) Learning to identify Semitic roots. In: Soudi A, Neumann G, van den Bosch A (eds) Arabic computational morphology: knowledge-based and empirical methods, text, speech and language technology, vol 38. Springer, Dordrecht, pp 143–158

  41. Diab M (2007) Improved Arabic base phrase chunking with a new enriched POS tag set. In: Proceedings of the 2007 workshop on computational approaches to Semitic languages: common issues and resources, Prague, pp 89–96. http://www.aclweb.org/anthology/W/W07/W07-0812

  42. Diab M, Hacioglu K, Jurafsky D (2004) Automatic tagging of Arabic text: from raw text to base phrase chunks. In: Proceedings of HLT-NAACL 2004, Boston

  43. Dichy J, Farghaly A (2003) Roots and patterns vs. stems plus grammar-lexis specifications: on what basis should a multilingual lexical database centered on Arabic be built. In: Proceedings of the MT-Summit IX workshop on machine translation for Semitic languages, New Orleans, pp 1–8

  44. Duh K, Kirchhoff K (2005) POS tagging of dialectal Arabic: a minimally supervised approach. In: Proceedings of the ACL workshop on computational approaches to Semitic languages, Ann Arbor. Association for Computational Linguistics, pp 55–62. http://www.aclweb.org/anthology/W/W05/W05-0708

  45. El Kholy A, Habash N (2010) Orthographic and morphological processing for English-Arabic statistical machine translation. In: Actes de traitement automatique des langues naturelles (TALN), Montréal

  46. El Kholy A, Habash N (2010) Techniques for Arabic morphological detokenization and orthographic denormalization. In: Proceedings of LREC-2010, Valletta (Malta)

  47. Elming J, Habash N (2007) Combination of statistical word alignments based on multiple preprocessing schemes. In: Human language technologies 2007: the conference of the North American chapter of the Association for Computational Linguistics, Companion Volume, Short Papers, Rochester, pp 25–28. http://www.aclweb.org/anthology/N/N07/N07-2007

  48. Fissaha Adafre S (2005) Part of speech tagging for Amharic using conditional random fields. In: Proceedings of the ACL workshop on computational approaches to Semitic languages, Ann Arbor. Association for Computational Linguistics, pp 47–54. http://www.aclweb.org/anthology/W/W05/W05-0707

  49. Fissaha S, Haller J (2003) Amharic verb lexicon in the context of machine translation. In: Proceedings of the TALN workshop on natural language processing of minority languages, Batz-sur-Mer

  50. Forsberg M (2007) Three tools for language processing: BNF converter, functional morphology, and extract. PhD thesis, Göteborg University and Chalmers University of Technology

  51. Forsberg M, Ranta A (2004) Functional morphology. In: Proceedings of the ninth ACM SIGPLAN international conference on functional programming (ICFP’04), Snowbird. ACM, New York, pp 213–223

  52. Fraenkel AS (1976) All about the Responsa retrieval project – what you always wanted to know but were afraid to ask. Jurimetrics J 16(3):149–156

  53. Gadish R (ed) (2001) Klalei ha-Ktiv Hasar ha-Niqqud, 4th edn. Academy for the Hebrew Language, Jerusalem. (in Hebrew)

  54. Gambäck B, Olsson F, Argaw AA, Asker L (2009) An Amharic corpus for machine learning. In: Proceedings of the 6th world congress of African linguistics, Cologne

  55. Gambäck B, Olsson F, Argaw AA, Asker L (2009) Methods for Amharic part-of-speech tagging. In: Proceedings of the first workshop on language technologies for African languages, Athens. Association for Computational Linguistics, Stroudsburg, pp 104–111

  56. Gasser M (2009) Semitic morphological analysis and generation using finite state transducers with feature structures. In: Proceedings of the 12th conference of the European chapter of the ACL (EACL 2009), Athens. Association for Computational Linguistics, pp 309–317. http://www.aclweb.org/anthology/E09-1036

  57. Gasser M (2011) HornMorpho: a system for morphological processing of Amharic, Oromo, and Tigrinya, Bibliotheca Alexandrina, Alexandria, pp 94–99

  58. Giménez J, Màrquez L (2004) SVMTool: a general POS tagger generator based on support vector machines. In: Proceedings of 4th international conference on language resources and evaluation (LREC), Lisbon, pp 43–46

  59. Goldberg Y, Tsarfaty R (2008) A single generative model for joint morphological segmentation and syntactic parsing. In: Proceedings of ACL-08: HLT, Columbus. Association for Computational Linguistics, pp 371–379. http://www.aclweb.org/anthology/P/P08/P08-1043

  60. Goldstein L (1991) Generation and inflection of the possession inflection of Hebrew nouns. Master’s thesis, Technion, Haifa (in Hebrew)

  61. Habash N (2004) Large scale lexeme based Arabic morphological generation. In: Proceedings of traitement automatique du langage naturel (TALN-04), Fez

  62. Habash N (2007) Arabic morphological representations for machine translation. In: van den Bosch A, Soudi A (eds) Arabic computational morphology: knowledge-based and empirical methods. Springer, Dordrecht

  63. Habash N (2010) Introduction to Arabic natural language processing. Synthesis lectures on human language technologies. Morgan & Claypool, San Rafael. doi:http://dx.doi.org/10.2200/S00277ED1V01Y201008HLT010

  64. Habash N, Rambow O (2005) Arabic tokenization, part-of-speech tagging and morphological disambiguation in one fell swoop. In: Proceedings of the 43rd annual meeting of the Association for Computational Linguistics (ACL’05), University of Michigan. Association for Computational Linguistics, Ann Arbor, pp 573–580. http://www.aclweb.org/anthology/P/P05/P05-1071

  65. Habash N, Rambow O (2006) MAGEAD: a morphological analyzer and generator for the Arabic dialects. In: Proceedings of the 21st international conference on computational linguistics and 44th annual meeting of the Association for Computational Linguistics, Sydney. Association for Computational Linguistics, pp 681–688. http://www.aclweb.org/anthology/P/P06/P06-1086

  66. Habash N, Rambow O (2007) Arabic diacritization through full morphological tagging. In: Human language technologies 2007: the conference of the North American chapter of the association for computational linguistics; Companion Volume, Short Papers, Rochester. Association for Computational Linguistics, pp 53–56. http://www.aclweb.org/anthology/N/N07/N07-2014

  67. Habash N, Sadat F (2006) Arabic preprocessing schemes for statistical machine translation. In: Moore RC, Bilmes JA, Chu-Carroll J, Sanderson M (eds) HLT-NAACL, New York. The Association for Computational Linguistics

  68. Habash N, Rambow O, Kiraz G (2005) Morphological analysis and generation for Arabic dialects. In: Proceedings of the ACL workshop on computational approaches to Semitic languages, Ann Arbor. Association for Computational Linguistics, pp 17–24. http://www.aclweb.org/anthology/W/W05/W05-0703

  69. Habash N, Gabbard R, Rambow O, Kulick S, Marcus M (2007) Determining case in Arabic: learning complex linguistic behavior requires complex linguistic features. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL 2007), Prague

  70. Habash N, Rambow O, Roth R (2009) MADA+TOKAN: a toolkit for Arabic tokenization, diacritization, morphological disambiguation, POS tagging, stemming and lemmatization. In: Choukri K, Maegaard B (eds) Proceedings of the second international conference on Arabic language resources and tools, Cairo, The MEDAR Consortium

  71. Habash N, Diab M, Rambow O (2012) Conventional orthography for Dialectal Arabic. In: Proceedings of the language resources and evaluation conference (LREC), Istanbul

  72. Habash N, Eskander R, Hawwari A (2012) A morphological analyzer for Egyptian Arabic. In: Proceedings of the twelfth meeting of the special interest group on computational morphology and phonology, Montréal. Association for Computational Linguistics, pp 1–9. http://www.aclweb.org/anthology/W12-2301

  73. Haertel RA, McClanahan P, Ringger EK (2010) Automatic diacritization for low-resource languages using a hybrid word and consonant CMM. In: Human language technologies: the 2010 annual conference of the North American chapter of the Association for Computational Linguistics, HLT ’10, Stroudsburg. Association for Computational Linguistics, pp 519–527

  74. Hajič J (2000) Morphological tagging: Data vs. dictionaries. In: Proceedings of ANLP-NAACL conference, Seattle, pp 94–101

  75. Hajič J, Hladká B (1998) Tagging inflective languages: prediction of morphological categories for a rich, structured tagset. In: Proceedings of the 36th annual meeting of the Association for Computational Linguistics and 17th international conference on computational linguistics, Montreal. Association for Computational Linguistics, Stroudsburg, pp 483–490. doi:http://dx.doi.org/10.3115/980845.980927

  76. Harley HB (2006) English words: a linguistic introduction. The language library. Wiley-Blackwell, Malden

  77. Hetzron R (ed) (1997) The Semitic languages. Routledge, London/New York

  78. Hulden M (2009) Foma: a finite-state compiler and library. In: Proceedings of the demonstrations session at EACL 2009, Athens. Association for Computational Linguistics, pp 29–32. http://www.aclweb.org/anthology/E09-2008

  79. Hulden M (2009) Revisiting multi-tape automata for Semitic morphological analysis and generation. In: Proceedings of the EACL 2009 workshop on computational approaches to Semitic languages, Athens. Association for Computational Linguistics, pp 19–26. http://www.aclweb.org/anthology/W09-0803

  80. Itai A, Wintner S (2008) Language resources for Hebrew. Lang Resour Eval 42(1):75–98

  81. Johnson CD (1972) Formal aspects of phonological description. Mouton, The Hague

  82. Kammoun NC, Belguith LH, Mesfar S (2010) Arabic POS tagging based on NooJ grammars and the Arabic morphological analyzer MORPH2. In: Proceedings of NooJ 2010, Komotini

  83. Kaplan RM, Kay M (1994) Regular models of phonological rule systems. Comput Linguist 20(3):331–378

  84. Karttunen L, Beesley KR (2001) A short history of two-level morphology. In: Talk given at the ESSLLI workshop on finite state methods in natural language processing. http://www.helsinki.fi/esslli/evening/20years/twol-history.html

  85. Kataja L, Koskenniemi K (1988) Finite-state description of Semitic morphology: a case study of ancient Akkadian. In: COLING, Budapest, pp 313–315

  86. Kay M (1987) Nonconcatenative finite-state morphology. In: Proceedings of the third conference of the European chapter of the Association for Computational Linguistics, Copenhagen, pp 2–10

  87. Khoja S (2001) APT: Arabic part-of-speech tagger. In: Proceedings of the student workshop at the second meeting of the North American chapter of the Association for Computational Linguistics (NAACL2001), Pittsburgh

  88. Kiraz GA (2000) Multitiered nonlinear morphology using multitape finite automata: a case study on Syriac and Arabic. Comput Linguist 26(1):77–105

  89. Koskenniemi K (1983) Two-level morphology: a general computational model for word-form recognition and production. The Department of General Linguistics, University of Helsinki

  90. Lafferty J, McCallum A, Pereira F (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th international conference on machine learning (ICML-01), Williamstown, pp 282–289

  91. Lavie A, Itai A, Ornan U, Rimon M (1988) On the applicability of two-level morphology to the inflection of Hebrew verbs. In: Proceedings of the international conference of the ALLC, Jerusalem

  92. Lee J, Naradowsky J, Smith DA (2011) A discriminative model for joint morphological disambiguation and dependency parsing. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics: human language technologies, Portland. Association for Computational Linguistics, pp 885–894. http://www.aclweb.org/anthology/P11-1089

  93. Maamouri M, Bies A, Buckwalter T, Mekki W (2004) The Penn Arabic treebank: building a large-scale annotated Arabic corpus. In: NEMLAR conference on Arabic language resources and tools, Cairo, pp 102–109

  94. Macks A (2002) Parsing Akkadian verbs with Prolog. In: Proceedings of the ACL-02 workshop on computational approaches to Semitic languages, Philadelphia

  95. MacWhinney B (2000) The CHILDES project: tools for analyzing talk, 3rd edn. Lawrence Erlbaum Associates, Mahwah

  96. Magdy W, Darwish K (2006) Arabic OCR error correction using character segment correction, language modeling, and shallow morphology. In: Proceedings of the 2006 conference on empirical methods in natural language processing, Sydney. Association for Computational Linguistics, pp 408–414. http://www.aclweb.org/anthology/W/W06/W06-1648

  97. Mohamed E, Kübler S (2009) Diacritization for real-world Arabic texts. In: Proceedings of the international conference RANLP-2009, pp 251–257. http://www.aclweb.org/anthology/R09-1047

  98. Mohamed E, Kübler S (2010) Arabic part of speech tagging. In: Proceedings of the seventh conference on international language resources and evaluation (LREC’10), European Language Resources Association (ELRA), Valletta

  99. Mohamed E, Kübler S (2010) Is Arabic part of speech tagging feasible without word segmentation? In: Human language technologies: the 2010 annual conference of the North American chapter of the Association for Computational Linguistics, HLT’10, Los Angeles. Association for Computational Linguistics, Stroudsburg, pp 705–708. http://dl.acm.org/citation.cfm?id=1857999.1858104

  100. Nelken R, Shieber SM (2005) Arabic diacritization using weighted finite-state transducers. In: Proceedings of the ACL workshop on computational approaches to Semitic languages, Ann Arbor. Association for Computational Linguistics, pp 79–86. http://www.aclweb.org/anthology/W/W05/W05-0711

  101. Netzer Y, Adler M, Gabay D, Elhadad M (2007) Can you tag the modal? You should. In: Proceedings of the ACL-2007 workshop on computational approaches to Semitic languages, Prague

  102. Nir B, MacWhinney B, Wintner S (2010) A morphologically-analyzed CHILDES corpus of Hebrew. In: Proceedings of the seventh conference on international language resources and evaluation (LREC’10), Valletta. European Language Resources Association (ELRA), pp 1487–1490

  103. Ornan U (1985) Indexes and concordances in a phonemic Hebrew script. In: Proceedings of the ninth world congress of Jewish studies, World Union of Jewish Studies, Jerusalem, pp 101–108. (in Hebrew)

  104. Ornan U (1985) Vocalization by a computer: a linguistic lesson. In: Luria BZ (ed) Avraham Even-Shoshan book, Kiryat-Sefer, Jerusalem, pp 67–76. (in Hebrew)

  105. Ornan U (1986) Phonemic script: a central vehicle for processing natural language – the case of Hebrew. Technical report 88.181, IBM Research Center, Haifa

  106. Ornan U (1987) Computer processing of Hebrew texts based on an unambiguous script. Mishpatim 17(2):15–24. (in Hebrew)

  107. Ornan U, Katz M (1995) A new program for Hebrew index based on the Phonemic Script. Technical report LCL 94-7, Laboratory for Computational Linguistics, Technion, Haifa

  108. Ornan U, Kazatski W (1986) Analysis and synthesis processes in Hebrew morphology. In: Proceedings of the 21 national data processing conference, Israel. (in Hebrew)

  109. Owens J (1997) The Arabic grammatical tradition. In: Hetzron R (ed) The Semitic languages. Routledge, London/New York, chap 3, pp 46–58

  110. Pinkas G (1985) A linguistic system for information retrieval. Maase Hoshev 12:10–16. (in Hebrew)

  111. Ratnaparkhi A (1996) A maximum entropy model for part-of-speech tagging. In: Brill E, Church K (eds) Proceedings of the conference on empirical methods in natural language processing, Copenhagen. Association for Computational Linguistics, pp 133–142

  112. Roark B, Sproat RW (2007) Computational approaches to morphology and syntax. Oxford University Press, New York

  113. Roche E, Schabes Y (eds) (1997) Finite-state language processing. Language, speech and communication. MIT, Cambridge

  114. Roth D (1998) Learning to resolve natural language ambiguities: a unified approach. In: Proceedings of AAAI-98 and IAAI-98, Madison, pp 806–813

  115. Roth R, Rambow O, Habash N, Diab M, Rudin C (2008) Arabic morphological tagging, diacritization, and lemmatization using lexeme models and feature ranking. In: Proceedings of ACL-08: HLT, Short Papers, Columbus. Association for Computational Linguistics, pp 117–120. http://www.aclweb.org/anthology/P/P08/P08-2030

  116. Sadat F, Habash N (2006) Combination of Arabic preprocessing schemes for statistical machine translation. In: Proceedings of the 21st international conference on computational linguistics and 44th annual meeting of the Association for Computational Linguistics, Sydney. Association for Computational Linguistics, pp 1–8. http://www.aclweb.org/anthology/P/P06/P06-1001

  117. Schippers A (1997) The Hebrew grammatical tradition. In: Hetzron R (ed) The Semitic languages. Routledge, London/New York, chap 4, pp 59–65

  118. Shaalan K, Abo Bakr HM, Ziedan I (2009) A hybrid approach for building Arabic diacritizer. In: Proceedings of the EACL 2009 workshop on computational approaches to Semitic languages, Semitic’09, Athens. Association for Computational Linguistics, Stroudsburg, pp 27–35

  119. Shacham D, Wintner S (2007) Morphological disambiguation of Hebrew: a case study in classifier combination. In: Proceedings of EMNLP-CoNLL 2007, the conference on empirical methods in natural language processing and the conference on computational natural language learning, Prague. Association for Computational Linguistics

  120. Shany-Klein M (1990) Generation and analysis of Segolate noun inflection in Hebrew. Master’s thesis, Technion, Haifa. (in Hebrew)

  121. Shany-Klein M, Ornan U (1992) Analysis and generation of Hebrew Segolate nouns. In: Ornan U, Arieli G, Doron E (eds) Hebrew computational linguistics. Ministry of Science and Technology, Jerusalem, chap 4, pp 39–51. (in Hebrew)

  122. Shapira M, Choueka Y (1964) Mechanographic analysis of Hebrew morphology: possibilities and achievements. Leshonenu 28(4):354–372. (in Hebrew)

  123. Silberztein M (2004) NooJ: an object-oriented approach. In: Muller C, Royauté J, Silberztein M (eds) INTEX pour la linguistique et le traitement automatique des Langues, cahiers de la MSH Ledoux, Presses Universitaires de Franche-Comté, pp 359–369

  124. Smith NA, Smith DA, Tromble RW (2005) Context-based morphological disambiguation with random fields. In: Proceedings of human language technology conference and conference on empirical methods in natural language processing, Vancouver. Association for Computational Linguistics, Morristown, pp 475–482

  125. Smrž O (2007) ElixirFM: implementation of functional Arabic morphology. In: Proceedings of the 2007 workshop on computational approaches to Semitic languages: common issues and resources, Prague. Association for Computational Linguistics, Stroudsburg, pp 1–8

  126. Smrž O (2007) Functional Arabic morphology. Prague Bull Math Linguist 88:5–30

  127. Soudi A, van den Bosch A, Neumann G (2007) Arabic computational morphology: knowledge-based and empirical methods. Springer, Dordrecht

  128. Sproat RW (1992) Morphology and computation. MIT, Cambridge

  129. Tachbelie MY, Abate ST, Besacier L (2011) Part-of-speech tagging for under-resourced and morphologically rich languages – the case of Amharic, Bibliotheca Alexandrina, Alexandria, pp 50–55. http://aflat.org/files/HLTD201109.pdf

  130. Toutanova K, Manning CD (2000) Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. In: Proceedings of the 2000 joint SIGDAT conference on empirical methods in natural language processing and very large corpora, Morristown. Association for Computational Linguistics, pp 63–70. doi:http://dx.doi.org/10.3115/1117794.1117802

  131. Toutanova K, Klein D, Manning CD, Singer Y (2003) Feature-rich part-of-speech tagging with a cyclic dependency network. In: NAACL ’03: Proceedings of the 2003 conference of the North American chapter of the Association for Computational Linguistics on human language technology, Edmonton. Association for Computational Linguistics, Morristown, pp 173–180. doi:http://dx.doi.org/10.3115/1073445.1073478

  132. Tsarfaty R (2006) Integrated morphological and syntactic disambiguation for Modern Hebrew. In: Proceedings of the COLING/ACL 2006 student research workshop, Sydney. Association for Computational Linguistics, pp 49–54. http://www.aclweb.org/anthology/P/P06/P06-3009

  133. Tsuruoka Y, Tsujii J (2005) Bidirectional inference with the easiest-first strategy for tagging sequence data. In: Proceedings of the conference on human language technology and empirical methods in natural language processing, HLT’05, Vancouver. Association for Computational Linguistics, Stroudsburg, pp 467–474. doi:http://dx.doi.org/10.3115/1220575.1220634

  134. Tsuruoka Y, Tateishi Y, Kim JD, Ohta T, McNaught J, Ananiadou S, Tsujii J (2005) Developing a robust part-of-speech tagger for biomedical text. In: Bozanis P, Houstis EN (eds) Advances in informatics. LNCS, vol 3746. Springer, Berlin/Heidelberg, chap 36, pp 382–392. doi:10.1007/11573036_36

  135. Wintner S (2004) Hebrew computational linguistics: past and future. Artif Intell Rev 21(2):113–138. doi:http://dx.doi.org/10.1023/B:AIRE.0000020865.73561.bc

  136. Wintner S (2008) Strengths and weaknesses of finite-state technology: a case study in morphological grammar development. Nat Lang Eng 14(4):457–469. doi:http://dx.doi.org/10.1017/S1351324907004676

  137. Wintner S (2009) Language resources for Semitic languages: challenges and solutions. In: Nirenburg S (ed) Language engineering for lesser-studied languages. IOS, Amsterdam, pp 277–290

  138. Yona S, Wintner S (2008) A finite-state morphological grammar of Hebrew. Nat Lang Eng 14(2):173–190

  139. Zitouni I, Sorensen JS, Sarikaya R (2006) Maximum entropy based restoration of Arabic diacritics. In: Proceedings of the 21st international conference on computational linguistics and 44th annual meeting of the Association for Computational Linguistics, Sydney. Association for Computational Linguistics, pp 577–584. http://www.aclweb.org/anthology/P/P06/P06-1073

  140. Zwicky AM, Pullum GK (1983) Cliticization vs. inflection: English n’t. Language 59(3):502–513

Acknowledgements

I am tremendously grateful to Nizar Habash for his help and advice; it would have been hard to complete this chapter without them. All errors and misconceptions are, of course, solely my own.

Author information

Corresponding author

Correspondence to Shuly Wintner.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Wintner, S. (2014). Morphological Processing of Semitic Languages. In: Zitouni, I. (ed) Natural Language Processing of Semitic Languages. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45358-8_2

  • DOI: https://doi.org/10.1007/978-3-642-45358-8_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-45357-1

  • Online ISBN: 978-3-642-45358-8

  • eBook Packages: Computer Science (R0)
