International Journal of Speech Technology, Volume 19, Issue 2, pp 177–189

Towards an automatic extraction of synonyms for Quranic Arabic WordNet

  • Manal AlMaayah
  • Majdi Sawalha
  • Mohammad A. M. Abushariah
Article

Abstract

In this paper, we develop a model for the automatic extraction of synonyms and use it to construct our Quranic Arabic WordNet (QAWN), which is based on traditional Arabic dictionaries. The work relies on three resources. First, the Boundary Annotated Quran Corpus, which contains the Quran words with their part-of-speech, root, and other related information. Second, lexicon resources, which were used to collect a set of derived words for each Quranic word. Third, traditional Arabic dictionaries, which were used to extract the meanings of words while distinguishing their different senses. The objective is to link Quranic words of similar meaning in order to generate synonym sets (synsets). To accomplish this, we represented the textual definitions extracted from the traditional Arabic dictionaries in a vector space model weighted by term frequency and inverse document frequency, and then computed cosine similarities between Quranic words based on those definitions. Words of highest similarity were grouped together to form a synset. Our QAWN consists of 6918 synsets constructed from about 8400 unique word senses, with an average of 5 senses per word. In our experimental evaluation, the average recall of the baseline system was 7.01 %, whereas the average recall with QAWN was 34.13 %, an improvement of about 27 percentage points in the recall of semantic search for Quran concepts.
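The similarity step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the English glosses stand in for the Arabic dictionary definitions, the whitespace tokenizer, the function names, the greedy grouping rule, and the threshold of 0.15 are all assumptions made for illustration only.

```python
import math
from collections import Counter


def tfidf_vectors(definitions):
    """Build a sparse TF-IDF vector for each word from its gloss text."""
    tokenized = {word: text.split() for word, text in definitions.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many glosses does each term occur?
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    vectors = {}
    for word, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[word] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_synsets(definitions, threshold=0.15):
    """Greedy grouping (an assumption): each word joins the existing group
    containing its most similar member, if that similarity reaches the
    threshold; otherwise it starts a new group."""
    vectors = tfidf_vectors(definitions)
    synsets = []
    for word in definitions:
        best_group, best_sim = None, threshold
        for group in synsets:
            sim = max(cosine(vectors[word], vectors[m]) for m in group)
            if sim >= best_sim:
                best_group, best_sim = group, sim
        if best_group is None:
            synsets.append([word])
        else:
            best_group.append(word)
    return synsets


# Hypothetical toy glosses, not taken from the paper or any dictionary.
example_definitions = {
    "path": "a way or road to walk on",
    "road": "a way or path for travel",
    "mercy": "kindness and forgiveness shown to others",
    "compassion": "kindness and care shown to others",
}

if __name__ == "__main__":
    for synset in build_synsets(example_definitions):
        print(synset)
```

On real data the glosses would need Arabic-aware tokenization and normalization, and the threshold would have to be tuned against a gold standard rather than fixed by hand.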

Keywords

Quranic WordNet; Arabic dictionaries; Cosine similarity; Vector space model; Synonymy; Semantic relations

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Manal AlMaayah (1)
  • Majdi Sawalha (1)
  • Mohammad A. M. Abushariah (1)
  1. Computer Information Systems Department, King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan
