Abstract
Arabic Natural Language Processing (ANLP) has made significant progress in recent years. As a result, several ANLP tools and applications have been developed, such as tokenizers, part-of-speech taggers, morphological analyzers, and syntactic parsers. However, most of these tools are heterogeneous and can hardly be reused in other projects without modifying their source code. This problem is common to all languages, which is why advanced language-independent NLP architectures have emerged, such as GATE (Cunningham et al. ACL, 2002) [1] and UIMA (Apache UIMA Manuals and Guides, 2015) [2]. These architectures have significantly changed the way NLP applications are designed and developed. They provide homogeneous structures for applications, better reusability, and faster deployment. In this article, we present a comparative study of NLP architectures in order to determine which ones are best suited to the Arabic language and its specificities.
References
Cunningham, H., Maynard, D., Bontcheva, K., Tablan, V.: A framework and graphical development environment for robust NLP tools and applications. In: ACL (2002)
Apache UIMA Manuals and Guides. https://uima.apache.org/d/uimaj-current/index.html. Last Accessed 11 Nov 2015
Internet World Users By Language: Top 10 Languages. http://www.internetworldstats.com/stats7.htm. Last Accessed 11 Nov 2015
Althobaiti, M., Kruschwitz, U., Poesio, M.: AraNLP: a Java-based library for the processing of Arabic text. In: Proceedings of the 9th Language Resources and Evaluation Conference (LREC), Reykjavik (2014)
Pasha, A., Al-Badrashiny, M., Diab, M., El Kholy, A., Eskander, R., Habash, N., Pooleery, M., Rambow, O., Roth, R.M.: MADAMIRA: a fast, comprehensive tool for morphological analysis and disambiguation of Arabic. In: LREC’14, Reykjavik (2014)
Shaalan, K.: A survey of Arabic named entity recognition and classification. Comput. Linguist. 40(2), 469–510 (2014)
Prasanth, Y., Nakul, S.: Integrating natural language processing and software engineering. Int. J. Softw. Eng. Appl. 9(11), 127–136 (2015)
Leidner, J.L.: Current issues in software engineering for natural language processing. In: Proceedings of the HLT-NAACL 2003 Workshop on Software Engineering and Architecture of Language Technology Systems, Stroudsburg (2003)
Bikel, D.M., Zitouni, I.: Combining natural language processing engines. In: Multilingual Natural Language Processing Applications: From Theory to Practice, pp. 523–542. IBM Press (2012)
Besançon, R., De Chalendar, G., Ferret, O., Gara, F., Mesnard, O., Laïb, M., Semmar, N.: LIMA: a multilingual framework for linguistic analysis and linguistic resources development and evaluation. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), Valletta (2010)
Alias-i, LingPipe. http://alias-i.com/lingpipe. Last Accessed 01 Mar 2017
Apache. OpenNLP. https://opennlp.apache.org/. Last Accessed 01 Mar 2017
NLTK. Natural language toolkit. http://www.nltk.org/. Last Accessed 01 Mar 2017
Silberztein, M.: NooJ: a linguistic development environment. http://www.nooj-association.org/. Last Accessed 01 Mar 2017
Microsoft. Arabic toolkit service (ATKS). https://www.microsoft.com/en-us/research/project/arabic-toolkit-service-atks/. Last Accessed 01 Mar 2017
Diab, M., Habash, N., Rambow, O.: Arabic language disambiguation for natural language processing applications. http://innovation.columbia.edu/technologies/cu14012_arabic-language-disambiguation-for-natural-language-processing-applications. Last Accessed 01 Mar 2017
Jaafar, Y., Bouzoubaa, K.: SAFAR: software architecture for Arabic language processing. http://arabic.emi.ac.ma/safar/. Last Accessed 01 Mar 2017
Ferrucci, D., Lally, A.: Building an example application with the unstructured information management architecture. IBM Syst. J. 43(3), 455–475 (2004)
Wilcock, G.: Introduction to linguistic annotation and text analytics. Synth. Lect. Hum. Lang. Technol. 2(1), 1–159 (2009)
Cunningham, H., Maynard, D., Bontcheva, K., et al.: Text Processing with GATE. Gateway Press, CA (2011)
Cunningham, H., Maynard, D., Bontcheva, K., et al.: Developing language processing components with GATE version 8 (a User Guide). https://gate.ac.uk/sale/tao/split.html. Last Accessed 11 Nov 2015
Ingersoll, G.S., Morton, T.S., Farris, A.L.: Taming text: how to find, organize, and manipulate it. Manning Publications Co. (2013)
Buyko, E., Wermter, J., Poprat, M., Hahn, U.: Automatically adapting an NLP core engine to the biology domain. In: Proceedings of the Joint BioLINK-Bio-Ontologies Meeting. A Joint Meeting of the ISMB Special Interest Group on Bio-Ontologies and the BioLINK Special Interest Group on Text Data Mining in Association with ISMB (2006)
Silberztein, M.: Complex annotations with NooJ. In: International NooJ Conference, Barcelone (2007)
Silberztein, M., Váradi, T., Tadic, M.: Open source multi-platform NooJ for NLP. In: COLING (Demos), Mumbai (2012)
Bird, S., Klein, E., Loper, E., Baldridge, J.: Multidisciplinary instruction with the natural language toolkit. In: Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics (2008)
Perkins, J.: Python 3 Text Processing with NLTK 3 Cookbook. Packt Publishing Ltd (2014)
Bird, S., Klein, E., Loper, E.: Natural language processing with Python. O’Reilly Media, Inc. (2009)
Habash, N., Rambow, O., Roth, R.: MADA+TOKAN: a toolkit for Arabic tokenization, diacritization, morphological disambiguation, POS tagging, stemming and lemmatization. In: Proceedings of the 2nd International Conference on Arabic Language Resources and Tools (MEDAR), Cairo (2009)
Diab, M.: Second generation AMIRA tools for Arabic processing: fast and robust tokenization, POS tagging, and base phrase chunking. In: 2nd International Conference on Arabic Language Resources and Tools (2009)
Souteh, Y., Bouzoubaa, K.: SAFAR platform and its morphological layer. In: Proceedings of the Eleventh Conference on Language Engineering (ESOLEC’2011), Cairo, Egypt (2011)
Jaafar, Y., Bouzoubaa, K.: Arabic natural language processing from software engineering to complex pipelines. In: Conference on Intelligent Text Processing and Computational Linguistics (CICLing’2015), Cairo, Egypt (2015)
Alkhalil Morpho Sys (2013). http://sourceforge.net/projects/alkhalil/. Last Accessed 23 Apr 2015
Buckwalter, T.: Buckwalter Arabic Morphological Analyzer Version 1.0 (2002)
Khoja, S., Garside, R.: Stemming Arabic Text. Lancaster, UK, Computing Department, Lancaster University (1999)
Larkey, L.S., Ballesteros, L., Connell, M.E.: Light stemming for Arabic information retrieval. In: Arabic Computational Morphology: Knowledge-Based and Empirical Methods, pp. 221–243. Springer, Netherlands (2007)
Algasaier, H.: The ISRI Arabic stemmer. http://www.nltk.org/_modules/nltk/stem/isri.html. Last Accessed 11 Nov 2015
Motaz, S.: Arabic computational linguistics. http://sourceforge.net/projects/ar-text-mining/. Last Accessed 11 Nov 2015
Zerrouki, T.: Tashaphyne 0.2. https://pypi.python.org/pypi/Tashaphyne. Last Accessed 11 Nov 2015
Jaafar, Y., Namly, D., Bouzoubaa, K., Yousfi, A.: Enhancing Arabic stemming process using resources and benchmarking tools. J. King Saud Univ. - Comput. Inf. Sci. (2016)
Green, S., Manning, C.D.: Better Arabic parsing: baselines, evaluations, and analysis. In: The 23rd International Conference on Computational Linguistics (COLING 2010), Beijing (2010)
Buckwalter, T.: Arabic transliteration/encoding chart. http://languagelog.ldc.upenn.edu/myl/ldc/morph/buckwalter.html. Last Accessed 12 Nov 2015
Jaafar, Y., Bouzoubaa, K.: Benchmark of Arabic morphological analyzers: challenges and solutions. In: 9th International Conference on Intelligent Systems: Theories and Applications (SITA’14), Rabat, Morocco (2014)
Namly, D., Bouzoubaa, K., Tahir, Y., Khamar, H.: Development of Arabic particles lexicon using the LMF framework. In: Colloque pour les Etudiants Chercheurs en Traitement Automatique du Langage Naturel et ses applications (CEC-TAL 2015), Sousse, Tunisia (2015)
Acknowledgements
We would like to thank Professor Mohamed Issam Kabbaj (Mohammadia School of Engineers, Mohammed V University in Rabat, Morocco) for his feedback on the work presented in Sect. 4.2.
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Jaafar, Y., Bouzoubaa, K. (2018). A Survey and Comparative Study of Arabic NLP Architectures. In: Shaalan, K., Hassanien, A., Tolba, F. (eds) Intelligent Natural Language Processing: Trends and Applications. Studies in Computational Intelligence, vol 740. Springer, Cham. https://doi.org/10.1007/978-3-319-67056-0_28
DOI: https://doi.org/10.1007/978-3-319-67056-0_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67055-3
Online ISBN: 978-3-319-67056-0
eBook Packages: Engineering, Engineering (R0)