An Evaluation of the Morphological Analysis of Egyptian Arabic TreeBank

  • Chapter
Intelligent Natural Language Processing: Trends and Applications

Part of the book series: Studies in Computational Intelligence (SCI, volume 740)

Abstract

This research is a corpus-based study that aims to reveal how well recent morphological analyzers handle the ambiguities that appear in Egyptian Arabic electronic texts written on social media. The research evaluates the automatic annotation of the Egyptian Arabic Penn Treebank (ARZ ATB) using CALIMA, the Columbian Arabic diaLectal Morphological Analyzer. The corpus was collected by the Linguistic Data Consortium as part of the BOLT project, which aims to develop technology that enables English speakers to retrieve and understand information from informal foreign-language sources, including chat, text messaging, and spoken conversations. To reach better results, the research concentrates on the noun category. For the evaluation, a gold standard was built from the 1723 most frequent nominal word types, drawn from 6543 word types in 16226 words selected randomly from the ARZ ATB corpus. The total number of collected morphemes was 2798. Recall, precision, F-score, and accuracy of the tool were calculated: recall was 89%, precision 94.5%, F-score 93.7%, and accuracy 93%. The errors were classified to reveal the main morphological ambiguities that the tool could not handle owing to the development of the written form of the Egyptian dialect in social media. According to the results, the orthographic variations that appear in Egyptian Arabic reflect the lack of a standard writing system governing the use of the dialect in its written form. Thus, gathering and describing the main orthographic variations is imperative for handling the ambiguities revealed in the study.
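
The four measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions over true/false positives and negatives; the abstract does not state how the chapter counts these or which F-score weighting it uses. The `Counts` class, the function names, and the example counts are hypothetical and are not taken from the chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Hypothetical confusion counts for a morpheme-level evaluation."""
    tp: int  # tool analyses that agree with the gold standard
    fp: int  # tool analyses that the gold standard rejects
    fn: int  # gold-standard analyses the tool failed to produce
    tn: int  # incorrect analyses the tool correctly did not produce


def precision(c: Counts) -> float:
    """Share of the tool's analyses that are correct."""
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Counts) -> float:
    """Share of the gold-standard analyses that the tool found."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f_beta(p: float, r: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (beta=1 gives F1)."""
    b2 = beta * beta
    denom = b2 * p + r
    return (1 + b2) * p * r / denom if denom else 0.0


def accuracy(c: Counts) -> float:
    """Share of all decisions (positive and negative) that are correct."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


if __name__ == "__main__":
    # Illustrative counts only; they are NOT the chapter's data.
    example = Counts(tp=90, fp=10, fn=10, tn=90)
    p, r = precision(example), recall(example)
    print(f"precision={p:.3f} recall={r:.3f} "
          f"F1={f_beta(p, r):.3f} accuracy={accuracy(example):.3f}")
```

Under this balanced F1 definition, the reported precision (94.5%) and recall (89%) give `f_beta(0.945, 0.89)` of about 91.7%. The abstract reports 93.7%, so the chapter may use a different weighting or averaging scheme that the abstract does not specify.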

Author information

Corresponding author

Correspondence to Reham Marzouk.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Marzouk, R., Kareh, S.E. (2018). An Evaluation of the Morphological Analysis of Egyptian Arabic TreeBank. In: Shaalan, K., Hassanien, A., Tolba, F. (eds) Intelligent Natural Language Processing: Trends and Applications. Studies in Computational Intelligence, vol 740. Springer, Cham. https://doi.org/10.1007/978-3-319-67056-0_30

  • DOI: https://doi.org/10.1007/978-3-319-67056-0_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67055-3

  • Online ISBN: 978-3-319-67056-0

  • eBook Packages: Engineering, Engineering (R0)
