Statistical Arabic Grammar Analyzer

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9041)

Abstract

Grammar analysis is considered one of the more complex tasks in Natural Language Processing (NLP), since it determines the relations between the words in a sentence. This paper proposes a system to automate the grammar analysis of Arabic sentences (Sentence Grammar Analysis, <ErAb Aljml). The task is divided into three sub-tasks: determining the grammatical tag, the case, and the sign of each token at the sentence level. A dataset has been compiled for this task, and a statistical system that assigns an appropriate tag, case, and sign to each token has been implemented. Experiments show that the proposed system achieves 89.74% token accuracy and 63.56% overall sentence accuracy, and that it has the potential to be further improved.
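The two figures reported in the abstract measure different things, and a small sketch may help show why sentence accuracy is so much lower than token accuracy. The Python below is illustrative only and is not the paper's evaluation code. It assumes that a token counts as correct only when its tag, case, and sign all match the reference, and that a sentence counts as correct only when every one of its tokens is correct. The label names and the toy data are hypothetical.

```python
from typing import List, Tuple

# One analysis per token: (grammatical tag, case, sign).
# The label values used below are hypothetical placeholders.
Analysis = Tuple[str, str, str]
Sentence = List[Analysis]


def token_accuracy(gold: List[Sentence], pred: List[Sentence]) -> float:
    """Fraction of tokens whose (tag, case, sign) triple matches the reference."""
    total = 0
    correct = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences differ in length")
        for gold_tok, pred_tok in zip(gold_sent, pred_sent):
            total += 1
            if gold_tok == pred_tok:
                correct += 1
    return correct / total if total else 0.0


def sentence_accuracy(gold: List[Sentence], pred: List[Sentence]) -> float:
    """Fraction of sentences in which every token is analyzed correctly."""
    if not gold:
        return 0.0
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)


# Toy data (assumed, not from the paper): two sentences, five tokens in total.
gold = [
    [("verb", "indeclinable", "fatha"),
     ("subject", "nominative", "damma"),
     ("object", "accusative", "fatha")],
    [("subject", "nominative", "damma"),
     ("predicate", "nominative", "damma")],
]
pred = [
    # First sentence: every token matches the reference.
    [("verb", "indeclinable", "fatha"),
     ("subject", "nominative", "damma"),
     ("object", "accusative", "fatha")],
    # Second sentence: the last token has the wrong sign.
    [("subject", "nominative", "damma"),
     ("predicate", "nominative", "fatha")],
]

print(token_accuracy(gold, pred))     # 0.8 (4 of 5 tokens)
print(sentence_accuracy(gold, pred))  # 0.5 (1 of 2 sentences)
```

Under these assumptions a single wrong sign is enough to make the whole sentence count as incorrect, which is consistent with the sentence-level figure being well below the token-level one.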

Author information

Corresponding author

Correspondence to Michael Nawar Ibrahim.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ibrahim, M.N. (2015). Statistical Arabic Grammar Analyzer. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9041. Springer, Cham. https://doi.org/10.1007/978-3-319-18111-0_15

  • DOI: https://doi.org/10.1007/978-3-319-18111-0_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18110-3

  • Online ISBN: 978-3-319-18111-0

  • eBook Packages: Computer Science, Computer Science (R0)