Abstract
This paper presents a cascade of morpho-syntactic tools for Arabic natural language processing. It begins by describing a large-coverage formalization of the Arabic lexicon. The resulting electronic dictionary, "El-DicAr" (Electronic Dictionary for Arabic), links inflectional, morphological, and syntactic-semantic information to a list of lemmas. Automated inflectional and derivational routines applied to each lemma produce over 3 million inflected forms. El-DicAr serves as the linguistic engine of the automatic analyzer, which is built from a lexical analysis module and a cascade of morpho-syntactic tools: a morphological analyzer, a spell-checker, a named entity recognition tool, an automatic annotator, and tools for linguistic research and contextual exploration. The morphological analyzer identifies the component morphemes of agglutinated forms using large-coverage morphological grammars. The spell-checker corrects the most frequent typographical errors. The lexical analysis module handles the different levels of vocalization found in written Arabic texts. Finally, the named entity recognition tool combines the results of morphological analysis with a set of rules represented as local grammars.
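As a rough illustration of what handling different levels of vocalization can involve, the sketch below matches an unvocalized or partially vocalized surface form against fully vocalized dictionary entries. It is not the El-DicAr or NooJ implementation: the function names, the tiny two-entry lexicon, and the matching rule (same consonant skeleton, and every diacritic written in the text must agree with the entry) are assumptions made for this example only.

```python
# Hypothetical sketch: matching partially vocalized Arabic text against
# fully vocalized lexicon entries. Not the method used by El-DicAr.

# Arabic diacritics from fathatan (U+064B) to sukun (U+0652).
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def strip_diacritics(word):
    """Return the consonant skeleton of a word (all diacritics removed)."""
    return "".join(ch for ch in word if ch not in DIACRITICS)


def split_marks(word):
    """Group each base letter with the diacritics that follow it."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS and pairs:
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def compatible(surface, entry):
    """True if every diacritic written in `surface` also appears on the
    same letter of the fully vocalized `entry`."""
    s, e = split_marks(surface), split_marks(entry)
    if len(s) != len(e):
        return False
    for (s_base, s_marks), (e_base, e_marks) in zip(s, e):
        if s_base != e_base:
            return False
        if not set(s_marks) <= set(e_marks):
            return False
    return True


def candidates(surface, lexicon):
    """Return the lexicon entries compatible with a surface form."""
    return [entry for entry in lexicon if compatible(surface, entry)]


# Toy lexicon (assumed for illustration): two vocalizations of k-t-b.
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kataba, "he wrote"
KUTIBA = "\u0643\u064F\u062A\u0650\u0628\u064E"  # kutiba, "it was written"
LEXICON = [KATABA, KUTIBA]

unvocalized = "\u0643\u062A\u0628"        # no diacritics: ambiguous
partial = "\u0643\u064F\u062A\u0628"      # damma on the first letter only

print(len(candidates(unvocalized, LEXICON)))  # both entries remain
print(candidates(partial, LEXICON) == [KUTIBA])
```

The unvocalized form stays ambiguous between both entries, while a single written diacritic is enough to rule one of them out. A real analyzer would index the lexicon by consonant skeleton instead of scanning it linearly.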
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mesfar, S. (2010). Towards a Cascade of Morpho-syntactic Tools for Arabic Natural Language Processing. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2010. Lecture Notes in Computer Science, vol 6008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12116-6_13
DOI: https://doi.org/10.1007/978-3-642-12116-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12115-9
Online ISBN: 978-3-642-12116-6
eBook Packages: Computer Science, Computer Science (R0)