A Survey and Comparative Study of Arabic NLP Architectures

Intelligent Natural Language Processing: Trends and Applications

Part of the book series: Studies in Computational Intelligence (SCI, volume 740)

Abstract

Arabic Natural Language Processing (ANLP) has made significant progress in recent years. As a result, several ANLP tools and applications have been developed, such as tokenizers, part-of-speech taggers, morphological analyzers, and syntactic parsers. However, most of these tools are heterogeneous and can hardly be reused in other projects without modifying their source code. This problem is common to all languages, which is why advanced language-independent NLP architectures such as GATE (Cunningham et al., ACL, 2002) [1] and UIMA (Apache UIMA Manuals and Guides, 2015) [2] have emerged. These architectures have significantly changed the way NLP applications are designed and developed: they provide homogeneous structures for applications, better reusability, and faster deployment. In this article, we present a comparative study of NLP architectures in order to determine which ones can suitably handle the Arabic language and its specificities.

Notes

  1. https://www.britishcouncil.org/voices-magazine/surprising-facts-about-arabic-language
  2. http://www.lrec-conf.org/lrec2010/
  3. https://uima.apache.org/
  4. http://www-01.ibm.com/software/ecm/content-analytics/uima.html
  5. https://gate.ac.uk/
  6. https://opennlp.apache.org/
  7. http://www.nooj-association.org/
  8. http://www.nltk.org
  9. https://www.gutenberg.org/
  10. http://www.alias-i.com/lingpipe/
  11. http://alias-i.com/
  12. https://github.com/aymara/lima
  13. http://www.kalisteo.eu/en/index.htm
  14. https://www.microsoft.com/en-us/research/project/arabic-toolkit-service-atks/
  15. http://innovation.columbia.edu/technologies/cu14012_arabic-language-disambiguation-for-natural-language-processing-applications
  16. https://sites.google.com/site/mahajalthobaiti/resources
  17. http://arabic.emi.ac.ma/safar/
  18. http://arabic.emi.ac.ma:8080/SafarWeb_V2/
  19. http://searchservervirtualization.techtarget.com/definition/platform
  20. http://whatis.techtarget.com/definition/framework

References

  1. Cunningham, H., Maynard, D., Bontcheva, K., Tablan, V.: A framework and graphical development environment for robust NLP tools and applications. In: ACL (2002)

  2. Apache UIMA Manuals and Guides. https://uima.apache.org/d/uimaj-current/index.html. Last Accessed 11 Nov 2015

  3. Internet World Users By Language: Top 10 Languages. http://www.internetworldstats.com/stats7.htm. Last Accessed 11 Nov 2015

  4. Althobaiti, M., Kruschwitz, U., Poesio, M.: AraNLP: a Java-based library for the processing of Arabic text. In: Proceedings of the 9th Language Resources and Evaluation Conference (LREC), Reykjavik (2014)

  5. Pasha, A., Al-Badrashiny, M., Diab, M., El Kholy, A., Eskander, R., Habash, N., Pooleery, M., Rambow, O., Roth, R.M.: MADAMIRA: a fast, comprehensive tool for morphological analysis and disambiguation of Arabic. In: LREC’14, Reykjavik (2014)

  6. Shaalan, K.: A survey of Arabic named entity recognition and classification. Comput. Linguist. 40(2), 469–510 (2014)

  7. Prasanth, Y., Nakul, S.: Integrating natural language processing and software engineering. Int. J. Softw. Eng. Appl. 9(11), 127–136 (2015)

  8. Leidner, J.L.: Current issues in software engineering for natural language processing. In: Proceedings of the HLT-NAACL 2003 Workshop on Software Engineering and Architecture of Language Technology Systems, Stroudsburg (2003)

  9. Bikel, D.M., Zitouni, I.: Combining natural language processing engines. In: Multilingual Natural Language Processing Applications: From Theory to Practice, pp. 523–542. IBM Press (2012)

  10. Besançon, R., De Chalendar, G., Ferret, O., Gara, F., Mesnard, O., Laïb, M., Semmar, N.: LIMA: a multilingual framework for linguistic analysis and linguistic resources development and evaluation. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), Valletta (2010)

  11. Alias-i, LingPipe. http://alias-i.com/lingpipe. Last Accessed 01 Mar 2017

  12. Apache. OpenNLP. https://opennlp.apache.org/. Last Accessed 01 Mar 2017

  13. NLTK. Natural language toolkit. http://www.nltk.org/. Last Accessed 01 Mar 2017

  14. Silberztein, M.: NooJ: a linguistic development environment. http://www.nooj-association.org/. Last Accessed 01 Mar 2017

  15. Microsoft. Arabic toolkit service (ATKS). https://www.microsoft.com/en-us/research/project/arabic-toolkit-service-atks/. Last Accessed 01 Mar 2017

  16. Diab, M., Habash, N., Rambow, O.: Arabic language disambiguation for natural language processing applications. http://innovation.columbia.edu/technologies/cu14012_arabic-language-disambiguation-for-natural-language-processing-applications. Last Accessed 01 Mar 2017

  17. Jaafar, Y., Bouzoubaa, K.: SAFAR: software architecture for Arabic language processing. http://arabic.emi.ac.ma/safar/. Last Accessed 01 Mar 2017

  18. Ferrucci, D., Lally, A.: Building an example application with the unstructured information management architecture. IBM Syst. J. 43(3), 455–475 (2004)

  19. Wilcock, G.: Introduction to linguistic annotation and text analytics. Synth. Lect. Hum. Lang. Technol. 2(1), 1–159 (2009)

  20. Cunningham, H., Maynard, D., Bontcheva, K., et al.: Text Processing with GATE. Gateway Press, CA (2011)

  21. Cunningham, H., Maynard, D., Bontcheva, K., et al.: Developing language processing components with GATE version 8 (a User Guide). https://gate.ac.uk/sale/tao/split.html. Last Accessed 11 Nov 2015

  22. Ingersoll, G.S., Morton, T.S., Farris, A.L.: Taming text: how to find, organize, and manipulate it. Manning Publications Co. (2013)

  23. Buyko, E., Wermter, J., Poprat, M., Hahn, U.: Automatically adapting an NLP core engine to the biology domain. In: Proceedings of the Joint BioLINK-Bio-Ontologies Meeting. A Joint Meeting of the ISMB Special Interest Group on Bio-Ontologies and the BioLINK Special Interest Group on Text Data Mining in Association with ISMB (2006)

  24. Silberztein, M.: Complex annotations with NooJ. In: International NooJ Conference, Barcelona (2007)

  25. Silberztein, M., Váradi, T., Tadic, M.: Open source multi-platform NooJ for NLP. In: COLING (Demos), Mumbai (2012)

  26. Bird, S., Klein, E., Loper, E., Baldridge, J.: Multidisciplinary instruction with the natural language toolkit. In: Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics (2008)

  27. Perkins, J.: Python 3 Text Processing with NLTK 3 Cookbook. Packt Publishing Ltd (2014)

  28. Bird, S., Klein, E., Loper, E.: Natural language processing with Python. O’Reilly Media, Inc. (2009)

  29. Habash, N., Rambow, O., Roth, R.: MADA+TOKAN: a toolkit for Arabic tokenization, diacritization, morphological disambiguation, POS tagging, stemming and lemmatization. In: Proceedings of the 2nd International Conference on Arabic Language Resources and Tools (MEDAR), Cairo (2009)

  30. Diab, M.: Second generation AMIRA tools for Arabic processing: fast and robust tokenization, POS tagging, and base phrase chunking. In: 2nd International Conference on Arabic Language Resources and Tools (2009)

  31. Souteh, Y., Bouzoubaa, K.: SAFAR platform and its morphological layer. In: Proceedings of the Eleventh Conference on Language Engineering (ESOLEC’2011), Cairo, Egypt (2011)

  32. Jaafar, Y., Bouzoubaa, K.: Arabic natural language processing from software engineering to complex pipelines. In: Conference on Intelligent Text Processing and Computational Linguistics (CICLing’2015), Cairo, Egypt (2015)

  33. Alkhalil Morpho Sys (2013). http://sourceforge.net/projects/alkhalil/. Last Accessed 23 Apr 2015

  34. Buckwalter, T.: Buckwalter Arabic Morphological Analyzer Version 1.0 (2002)

  35. Khoja, S., Garside, R.: Stemming Arabic Text. Lancaster, UK, Computing Department, Lancaster University (1999)

  36. Larkey, L.S., Ballesteros, L., Connell, M.E.: Light stemming for Arabic information retrieval. In: Arabic Computational Morphology: Knowledge-Based and Empirical Methods, pp. 221–243. Springer, Netherlands (2007)

  37. Algasaier, H.: The ISRI Arabic stemmer. http://www.nltk.org/_modules/nltk/stem/isri.html. Last Accessed 11 Nov 2015

  38. Motaz, S.: Arabic computational linguistics. http://sourceforge.net/projects/ar-text-mining/. Last Accessed 11 Nov 2015

  39. Zerrouki, T.: Tashaphyne 0.2. https://pypi.python.org/pypi/Tashaphyne. Last Accessed 11 Nov 2015

  40. Jaafar, Y., Namly, D., Bouzoubaa, K., Yousfi, A.: Enhancing Arabic stemming process using resources and benchmarking tools. J. King Saud Univ. Comput. Inf. Sci. (2016)

  41. Green, S., Manning, C.D.: Better Arabic parsing: baselines, evaluations, and analysis. In: The 23rd International Conference on Computational Linguistics (COLING 2010), Beijing (2010)

  42. Buckwalter, T.: Arabic transliteration/encoding chart. http://languagelog.ldc.upenn.edu/myl/ldc/morph/buckwalter.html. Last Accessed 12 Nov 2015

  43. Jaafar, Y., Bouzoubaa, K.: Benchmark of Arabic morphological analyzers: challenges and solutions. In: 9th International Conference on Intelligent Systems: Theories and Applications (SITA’14), Rabat, Morocco (2014)

  44. Namly, D., Bouzoubaa, K., Tahir, Y., Khamar, H.: Development of Arabic particles lexicon using the LMF framework. In: Colloque pour les Etudiants Chercheurs en Traitement Automatique du Langage Naturel et ses applications (CEC-TAL 2015), Sousse, Tunisia (2015)

Acknowledgements

We would like to thank Professor Mohamed Issam Kabbaj (Mohammadia School of Engineers, Mohammed V University in Rabat, Morocco) for his feedback on the work presented in Sect. 4.2.

Author information

Corresponding author

Correspondence to Younes Jaafar.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Jaafar, Y., Bouzoubaa, K. (2018). A Survey and Comparative Study of Arabic NLP Architectures. In: Shaalan, K., Hassanien, A., Tolba, F. (eds) Intelligent Natural Language Processing: Trends and Applications. Studies in Computational Intelligence, vol 740. Springer, Cham. https://doi.org/10.1007/978-3-319-67056-0_28

  • DOI: https://doi.org/10.1007/978-3-319-67056-0_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67055-3

  • Online ISBN: 978-3-319-67056-0

  • eBook Packages: Engineering, Engineering (R0)
