Towards a Cascade of Morpho-syntactic Tools for Arabic Natural Language Processing

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6008)

Abstract

This paper presents a cascade of morpho-syntactic tools for Arabic natural language processing. It begins by describing a large-coverage formalization of the Arabic lexicon. The resulting electronic dictionary, "El-DicAr" (Electronic Dictionary for Arabic), links inflectional, morphological, and syntactic-semantic information to a list of lemmas. Automated inflectional and derivational routines applied to each lemma produce over 3 million inflected forms. El-DicAr serves as the linguistic engine of the automatic analyzer, which is built from a lexical analysis module and a cascade of morpho-syntactic tools: a morphological analyzer, a spell-checker, a named entity recognition tool, an automatic annotator, and tools for linguistic research and contextual exploration. The morphological analyzer identifies the component morphemes of agglutinative forms using large-coverage morphological grammars. The spell-checker corrects the most frequent typographical errors. The lexical analysis module handles the different levels of vocalization found in written Arabic texts. Finally, the named entity recognition tool combines the morphological analysis results with a set of rules represented as local grammars.
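
To make the idea of identifying the component morphemes of an agglutinative form more concrete, here is a minimal sketch in Python. It is not the paper's method: the abstract describes large-coverage morphological grammars, whereas this sketch uses a brute-force lookup over affix lists. The lexicon, the clitic lists, and the `segment` function are all hypothetical and invented for illustration. Words are written in an unvocalized Buckwalter-style transliteration (for example `ktAb` for "book"), which is also an assumption of this example.

```python
from itertools import product

# Toy data invented for illustration; NOT taken from El-DicAr.
LEXICON = {"ktAb": "N", "qlm": "N"}              # stem -> part of speech
PROCLITICS = ["", "w", "b", "wb", "Al", "wAl"]   # w 'and', b 'with', Al 'the'
ENCLITICS = ["", "h", "hA", "hm"]                # h 'his', hA 'her', hm 'their'


def segment(token):
    """Return every (proclitic, stem, enclitic) split whose stem is in LEXICON."""
    analyses = []
    for pro, enc in product(PROCLITICS, ENCLITICS):
        # The token must start with the candidate proclitic sequence.
        if not token.startswith(pro):
            continue
        rest = token[len(pro):]
        # The remainder must end with the candidate enclitic (if any).
        if enc and not rest.endswith(enc):
            continue
        stem = rest[:len(rest) - len(enc)] if enc else rest
        # Keep the split only when the stem is a known dictionary entry.
        if stem in LEXICON:
            analyses.append((pro, stem, enc))
    return analyses


if __name__ == "__main__":
    for word in ["wbktAbh", "Alqlm", "xyz"]:
        print(word, segment(word))
```

Under these assumptions, `segment("wbktAbh")` splits the form into the proclitic sequence `wb`, the stem `ktAb`, and the enclitic `h`, while an unknown string yields an empty list. A real analyzer would also need to handle vocalization variants, spelling changes at morpheme boundaries, and constraints on which clitics may combine, none of which this sketch attempts.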


Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mesfar, S. (2010). Towards a Cascade of Morpho-syntactic Tools for Arabic Natural Language Processing. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2010. Lecture Notes in Computer Science, vol 6008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12116-6_13

  • DOI: https://doi.org/10.1007/978-3-642-12116-6_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12115-9

  • Online ISBN: 978-3-642-12116-6

  • eBook Packages: Computer Science, Computer Science (R0)
