Abstract
This corpus-based study examines how well recent morphological analyzers handle the ambiguities that appear in Egyptian Arabic electronic texts written on social media. It evaluates the automatic annotation of the Egyptian Arabic Penn Treebank (ARZ ATB) produced with CALIMA, the Columbian Arabic diaLectal Morphological Analyzer. The corpus was collected by the Linguistic Data Consortium as part of the BOLT project, which aims to develop technology that enables English speakers to retrieve and understand information from informal foreign-language sources, including chat, text messaging, and spoken conversations. To obtain more reliable results, the study concentrates on the noun category. A gold standard was built from the 1723 most frequent nominal word types among the 6543 word types of 16226 words selected randomly from the ARZ ATB corpus; the collected morphemes totaled 2798. Recall, precision, F-score, and accuracy of the tool were calculated: recall was 89%, precision 94.5%, F-score 93.7%, and accuracy 93%. The errors were classified to reveal the main morphological ambiguities that the tool could not handle because of the way the written form of the Egyptian dialect has developed on social media. According to the results, the orthographic variation found in Egyptian Arabic reflects the lack of an authoritative writing system governing the use of the dialect in its written form. Gathering and describing the main orthographic variations is therefore imperative for handling the ambiguities revealed in the study.
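For readers unfamiliar with the four evaluation measures named above, the following is a minimal sketch of how they are commonly derived from comparing a tool's output with a gold standard. The abstract does not state the counting scheme or the F-score variant used in the study, so the balanced F1 formula, the `Counts` and `evaluate` names, and the example counts are all illustrative assumptions, not the authors' procedure or data.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Counts:
    """Hypothetical confusion counts from comparing tool output to a gold standard."""
    tp: int  # analyses the tool produced that match the gold standard
    fp: int  # analyses the tool produced that the gold standard rejects
    fn: int  # gold-standard analyses the tool failed to produce
    tn: int  # cases correctly left unassigned by the tool


def evaluate(c: Counts) -> Dict[str, float]:
    """Return precision, recall, balanced F1, and accuracy for the given counts."""
    predicted = c.tp + c.fp
    relevant = c.tp + c.fn
    total = c.tp + c.fp + c.fn + c.tn

    precision = c.tp / predicted if predicted else 0.0
    recall = c.tp / relevant if relevant else 0.0
    # Balanced F1 (harmonic mean); an assumption, the study may use another variant.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (c.tp + c.tn) / total if total else 0.0

    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Made-up counts for illustration only; they are not the study's data.
    scores = evaluate(Counts(tp=90, fp=5, fn=10, tn=95))
    for name, value in scores.items():
        print(f"{name}: {value:.3f}")
```

The zero-denominator guards keep the sketch usable on empty or degenerate samples; how such cases should be scored in a real evaluation is a design decision the abstract does not address.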
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Marzouk, R., Kareh, S.E. (2018). An Evaluation of the Morphological Analysis of Egyptian Arabic TreeBank. In: Shaalan, K., Hassanien, A., Tolba, F. (eds) Intelligent Natural Language Processing: Trends and Applications. Studies in Computational Intelligence, vol 740. Springer, Cham. https://doi.org/10.1007/978-3-319-67056-0_30
DOI: https://doi.org/10.1007/978-3-319-67056-0_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67055-3
Online ISBN: 978-3-319-67056-0
eBook Packages: Engineering, Engineering (R0)