Towards an automatic extraction of synonyms for Quranic Arabic WordNet
In this paper, we develop a model for the automatic extraction of synonyms and use it to construct our Quranic Arabic WordNet (QAWN), which is built on traditional Arabic dictionaries. We rely on three resources. First, the Boundary Annotated Quran Corpus, which contains Quran words together with their part-of-speech, root and other related information. Second, lexicon resources, which were used to collect a set of derived words for Quranic words. Third, traditional Arabic dictionaries, which were used to extract the meanings of words while distinguishing their different senses. The objective of this work is to link Quranic words of similar meaning in order to generate synonym sets (synsets). To accomplish that, we weighted terms by term frequency and inverse document frequency (TF-IDF) in a vector space model, and then computed cosine similarities between Quranic words based on the textual definitions extracted from traditional Arabic dictionaries. Words with the highest similarity were grouped together to form a synset. Our QAWN consists of 6918 synsets constructed from about 8400 unique word senses, with an average of 5 senses per word. In our experimental evaluation, the average recall of the baseline system was 7.01 %, whereas the average recall of QAWN was 34.13 %, an improvement of about 27 percentage points in the recall of semantic search for Quran concepts.
Keywords: Quranic WordNet; Arabic dictionaries; Cosine similarity; Vector space model; Synonymy; Semantic relations
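The similarity step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the English glosses, the whitespace tokenizer, the particular TF-IDF weighting, the threshold of 0.1 and the function names (`tfidf_vectors`, `cosine`, `nearest_synonym`) are all assumptions made for illustration. The sketch treats each word's dictionary definition as a document, builds TF-IDF vectors, and links each word to the word whose definition is most similar under cosine similarity.

```python
import math
from collections import Counter


def tfidf_vectors(definitions):
    """Map each word to a sparse TF-IDF vector built from its definition text."""
    # Naive whitespace tokenization (an assumption; Arabic text would need
    # proper normalization and stemming).
    tokens = {word: text.lower().split() for word, text in definitions.items()}
    n_docs = len(tokens)
    # Document frequency: number of definitions that contain each term.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    vectors = {}
    for word, toks in tokens.items():
        tf = Counter(toks)
        vectors[word] = {
            term: (count / len(toks)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_synonym(vectors, threshold=0.1):
    """For each word, return its most similar other word, or None if no
    other word exceeds the (hypothetical) similarity threshold."""
    result = {}
    for word, vec in vectors.items():
        best_word, best_sim = None, threshold
        for other, other_vec in vectors.items():
            if other == word:
                continue
            sim = cosine(vec, other_vec)
            if sim > best_sim:
                best_word, best_sim = other, sim
        result[word] = best_word
    return result


# Hypothetical English glosses standing in for Arabic dictionary definitions.
definitions = {
    "garden": "a place planted with trees and flowing rivers",
    "paradise": "an abode of reward planted with trees and rivers",
    "fire": "a burning flame that gives heat and light",
    "blaze": "a fierce burning flame of great heat",
}

vectors = tfidf_vectors(definitions)
nearest = nearest_synonym(vectors)
for word, match in nearest.items():
    print(word, "->", match)
```

Linking each word only to its single nearest neighbour is a simplification; how the highest-similarity words are actually grouped into synsets is not specified in the abstract.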