Grammar-Lexis Relations in the Computational Morphology of Arabic

  • Joseph Dichy
  • Ali Farghaly
Part of the Text, Speech and Language Technology book series (TLTB, volume 38)

Abstract

Grammar-lexis rules and relations that ensure the correct insertion of major lexical entries (nouns, verbs and deverbals) play an essential part in the computational morphology of Arabic. This chapter draws on experience with the DIINAR.1 Arabic lexical resource and related software, and with the first version of the SYSTRAN Arabic-English MT system. It outlines previous approaches to the computational morphology of the language (Section 2): root and pattern (briefly recalled); lexeme-based; machine learning and statistical; stems based on roots and patterns; and finally the stem-based approach, which includes root and pattern as well as grammar-lexis information. The last of these, which best meets the requirements of machine translation and other high-level applications, is developed further in Section 3, which deals with the Arabic word-form and maps the rules and relations accounting for grammar-lexis relations that operate within the boundaries of that complex unit. In the Word-Formatives Grammar, rules and relations involving the lexical nucleus of the word-form play a crucial part and are formalised from a computational perspective. The stem either coincides with the nucleus or is its core, because lexical entries fall into two overall categories: in the first, stem and entry coincide; in the second, the lexical entry corresponds to a morphological compound encompassing the stem and a lexicalized extension (in most cases, a suffix that is part of the entry). Correct relations between the lexical nucleus and the other formatives included in the word-form are ensured through morphosyntactic specifiers associated with each entry of the lexical database. These relations, which have been included in the DIINAR.1 database, are both finite in number and exhaustive in coverage.
They also allow computational morphology and other applications to rely on a well-restricted set of generated lexica: only cliticized or affixed formatives that can actually be associated with a given lexical nucleus are added, and ‘illegal’ ones are ruled out. In the DIINAR.1 resource, the actual number of inflected word-forms is 7,774,938 (about nine times fewer than one would obtain through ‘blind’ generation). A comprehensive mapping of examples is given. Their compatibility with applications going beyond computational morphology is also outlined.
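
The contrast between ‘blind’ and restricted generation can be illustrated with a minimal Python sketch. It is not the DIINAR.1 implementation: the class names, the specifier labels ("verb", "transitive"), and the two transliterated toy entries are assumptions introduced here for illustration only. The sketch assumes that each lexical entry carries a set of morphosyntactic specifiers and that each formative states which specifiers the nucleus must carry before it can attach.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Entry:
    """A toy lexical nucleus with hypothetical morphosyntactic specifiers."""
    stem: str
    specifiers: frozenset


@dataclass(frozen=True)
class Formative:
    """A toy enclitic; `requires` lists specifiers the nucleus must carry."""
    form: str
    requires: frozenset


# Illustrative entries in simplified transliteration (assumed, not from DIINAR.1).
LEXICON = [
    Entry("kataba", frozenset({"verb", "transitive"})),  # 'he wrote'
    Entry("naama", frozenset({"verb"})),                 # 'he slept', intransitive
]

ENCLITICS = [
    Formative("", frozenset()),                  # no enclitic: always allowed
    Formative("hu", frozenset({"transitive"})),  # object pronoun 'him/it'
]


def blind_generation(lexicon, enclitics):
    """Attach every enclitic to every stem, legal or not."""
    return [e.stem + f.form for e, f in product(lexicon, enclitics)]


def restricted_generation(lexicon, enclitics):
    """Attach an enclitic only if the entry carries all required specifiers."""
    return [
        e.stem + f.form
        for e, f in product(lexicon, enclitics)
        if f.requires <= e.specifiers
    ]


if __name__ == "__main__":
    print(blind_generation(LEXICON, ENCLITICS))
    print(restricted_generation(LEXICON, ENCLITICS))
```

In this toy setting, blind generation yields four strings, including an object pronoun attached to the intransitive verb, while restricted generation drops that form. The subset test `f.requires <= e.specifiers` is only one possible way to encode compatibility; the chapter's own specifier inventory is richer than this sketch suggests.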

Keywords

Machine Translation; Lexical Entry; Nucleus Formative; Statistical Machine Translation; Parallel Corpus
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer 2007

Authors and Affiliations

  • Joseph Dichy (1)
  • Ali Farghaly (2)
  1. Université Lumière-Lyon 2, ICAR research lab (CNRS/Lyon 2), 69365 Lyon Cedex 07, France
  2. Oracle USA, Redwood Shores, USA