A Purely Surface-Oriented Approach to Handling Arabic Morphology

  • Yousuf Aboamer
  • Marcus Kracht
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11668)


In this paper, we introduce a completely lexicalist approach to Arabic morphology. This purely surface-oriented treatment is part of a comprehensive mathematical approach that integrates Arabic syntax and semantics, using overt morphological features in the string-to-meaning translation. The basic motivation of our approach is to combine semantic representations with formal descriptions of morphological units. That is, the lexicon is a collection of signs; each sign \(\delta\) is a triple \(\delta = \langle E, C, M\rangle\), where E is the exponent, C is the combinatorics and M is the meaning of the sign. Here, we are concerned only with the exponents, i.e. the components of a morphosemantic lexicon (for a fragment of Arabic). To remain surface-oriented, we allow for discontinuity in the constituents: constituents are sequences of strings, which can only be concatenated or duplicated; no rule can delete, add or modify any string. Arabic morphology is well known for its complexity and richness. Word formation in Arabic poses real challenges because words are derived from roots, which bear the core meaning of their derivatives, by inserting vowels and possibly other consonants. The units in the sequences are so-called glued strings rather than plain strings. A glued string is a string that has left and right context conditions. Ideally, morphs are combined in a definite and exceptionless linear way, as happens in many languages (e.g. the English plural). Arabic word formation is more complex: it is not just a sequential concatenation of morphs placed next to each other; rather, the constituents are discontinuous. Vowels and further consonants are inserted between, before and after the root consonants, resulting in what we call a “fractured glued string”, i.e. a sequence of glued strings combined in diverse ways: forward concatenation, backward concatenation, forward wrapping, reduction, forward transfixation and, going beyond multiple context-free grammars (MCFGs), also reduplication.
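The notions of a glued string and of forward transfixation can be illustrated with a minimal sketch. It is not the formalization used in the paper: the class name GluedString, the encoding of context conditions as regular expressions, and the functions concat and transfix are all assumptions made for illustration. The example uses the well-known root k-t-b with the vowel pattern a-a-a, giving kataba ('he wrote').

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GluedString:
    """A string with left and right context conditions (hypothetical encoding).

    `left` is a regex that the preceding material must end with;
    `right` is a regex that the following material must start with.
    An empty regex imposes no condition.
    """
    left: str
    text: str
    right: str


def concat(a: GluedString, b: GluedString) -> GluedString:
    """Forward concatenation: a followed by b, if both context conditions hold.

    Nothing is deleted or modified; the result is exactly a.text + b.text.
    """
    right_ok = re.match(a.right, b.text) is not None
    left_ok = re.search(f"(?:{b.left})$", a.text) is not None
    if not (right_ok and left_ok):
        raise ValueError(f"context conditions block {a.text!r} + {b.text!r}")
    return GluedString(a.left, a.text + b.text, b.right)


# A fractured glued string is modelled here as a tuple of glued strings.
Fractured = Tuple[GluedString, ...]


def transfix(root: Fractured, pattern: Fractured) -> GluedString:
    """Forward transfixation (sketch): interleave as r1 p1 r2 p2 ... rn pn."""
    if len(root) != len(pattern):
        raise ValueError("root and pattern must have the same number of parts")
    parts = [g for pair in zip(root, pattern) for g in pair]
    result = parts[0]
    for g in parts[1:]:
        result = concat(result, g)
    return result


# Root consonants k-t-b, with no context conditions (an assumption).
root = tuple(GluedString("", c, "") for c in "ktb")
# A pattern vowel that must be preceded by a non-vowel (an assumption).
vowel = GluedString(left="[^aiu]", text="a", right="")
pattern = (vowel, vowel, vowel)

word = transfix(root, pattern)
print(word.text)  # kataba
```

Because the vowel carries a left context condition, an attempt such as concat(GluedString("", "a", ""), vowel) raises ValueError in this sketch: the combination is blocked by the condition, not repaired by deleting or altering a string.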


Keywords: Discontinuity · Arabic morphology · Surface orientation · Morphosemantics



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Bielefeld University, Bielefeld, Germany
