Abstract
Grammar analysis is considered one of the more complex tasks in Natural Language Processing (NLP), since it determines the relations between the words in a sentence. This paper proposes a system that automates the grammar analysis of Arabic sentences (Sentence Grammar Analysis, <ErAb Aljml). The task is divided into three sub-tasks: determining the grammatical tag, the case, and the sign of each token at the sentence level. A dataset was compiled for the task, and a statistical system that assigns an appropriate tag, case, and sign to each token was implemented. In experiments, the system achieves 89.74% token accuracy and 63.56% overall sentence accuracy, and it has the potential to be improved further.
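The two reported figures can be illustrated with a minimal sketch. The abstract does not spell out the scoring rules, so the code below assumes that a token counts as correct only when its tag, case, and sign all match the gold annotation, and that a sentence counts as correct only when every one of its tokens is correct. The function name and the label values are hypothetical and are not taken from the paper's tag set or dataset.

```python
from typing import List, Tuple

# One label per token: (grammatical tag, case, sign).
# The concrete values used below are illustrative placeholders only.
Label = Tuple[str, str, str]
Sentence = List[Label]


def token_and_sentence_accuracy(
    gold: List[Sentence], pred: List[Sentence]
) -> Tuple[float, float]:
    """Return (token accuracy, sentence accuracy) under the stated assumptions."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")

    total_tokens = 0
    correct_tokens = 0
    correct_sentences = 0

    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("each sentence pair must have the same number of tokens")
        # Assumption: a token is correct only if all three labels match.
        matches = sum(g == p for g, p in zip(gold_sent, pred_sent))
        total_tokens += len(gold_sent)
        correct_tokens += matches
        # Assumption: a sentence is correct only if every token is correct.
        if matches == len(gold_sent):
            correct_sentences += 1

    if total_tokens == 0:
        raise ValueError("no tokens to evaluate")

    return correct_tokens / total_tokens, correct_sentences / len(gold)


# Toy data: two sentences, five tokens, one wrong sign in the second sentence.
gold = [
    [("verb", "indeclinable", "fatha"), ("subject", "nominative", "damma")],
    [
        ("verb", "indeclinable", "fatha"),
        ("subject", "nominative", "damma"),
        ("object", "accusative", "fatha"),
    ],
]
pred = [
    [("verb", "indeclinable", "fatha"), ("subject", "nominative", "damma")],
    [
        ("verb", "indeclinable", "fatha"),
        ("subject", "nominative", "damma"),
        ("object", "accusative", "kasra"),
    ],
]

token_acc, sentence_acc = token_and_sentence_accuracy(gold, pred)
```

Under these assumptions a single wrong label fails the whole sentence, which is one plausible reason a sentence-level score sits well below the token-level score.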
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ibrahim, M.N. (2015). Statistical Arabic Grammar Analyzer. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9041. Springer, Cham. https://doi.org/10.1007/978-3-319-18111-0_15
DOI: https://doi.org/10.1007/978-3-319-18111-0_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18110-3
Online ISBN: 978-3-319-18111-0
eBook Packages: Computer Science, Computer Science (R0)