Word Representations in Vector Space and their Applications for Arabic

  • Mohamed A. Zahran
  • Ahmed Magooda
  • Ashraf Y. Mahgoub
  • Hazem Raafat
  • Mohsen Rashwan
  • Amir Atyia
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9041)

Abstract

Considerable work has gone into giving the individual words of a language vector space representations that capture the language's semantic and syntactic properties. In this paper, we compare different techniques for building vector space representations for Arabic and test the resulting models through intrinsic and extrinsic evaluations. Intrinsic evaluation assesses the quality of the models using benchmark semantic and syntactic datasets, while extrinsic evaluation assesses their quality by their impact on two Natural Language Processing applications: Information Retrieval and Short Answer Grading. Finally, we map the Arabic vector space to its English counterpart using a cosine error regression neural network and show that it outperforms standard mean square error regression neural networks on this task.
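
To make the contrast between the two regression objectives concrete, the following is a minimal sketch, not the implementation used in the paper. It assumes the cosine error is defined as one minus the cosine similarity between a predicted vector and its target; the abstract does not give the exact formula, and the function names and toy vectors are hypothetical.

```python
import numpy as np

def mse_loss(pred, target):
    """Mean squared error averaged over all vector components."""
    return float(np.mean((pred - target) ** 2))

def cosine_error(pred, target, eps=1e-12):
    """Mean of (1 - cosine similarity) over rows.

    Assumed definition; eps guards against division by zero.
    """
    dot = np.sum(pred * target, axis=1)
    norms = np.linalg.norm(pred, axis=1) * np.linalg.norm(target, axis=1)
    return float(np.mean(1.0 - dot / (norms + eps)))

# Toy "English" target vectors (hypothetical values, for illustration only).
target = np.array([[1.0, 0.0], [0.0, 2.0]])

# Predictions pointing in the right direction but with the wrong length.
scaled = 3.0 * target

# Predictions with the right length but orthogonal direction.
rotated = np.array([[0.0, 1.0], [2.0, 0.0]])

print("scaled : mse =", mse_loss(scaled, target),
      " cosine error =", cosine_error(scaled, target))
print("rotated: mse =", mse_loss(rotated, target),
      " cosine error =", cosine_error(rotated, target))
```

In this toy setting the mean square error penalizes the correctly oriented but rescaled predictions more heavily than the orthogonal ones, while the cosine error ignores vector length and penalizes only the change in direction. Since nearest-neighbour lookups in embedding spaces are commonly done with cosine similarity, this is one plausible reason a direction-based objective could suit the mapping task.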

Keywords

Word Representations; Word Vectors; Word Embeddings; Arabic Natural Language Processing; Arabic Information Retrieval; Arabic Short Answer Grading; Arabic-English vector space mapping; Cosine regression neural network

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Mohamed A. Zahran (1, corresponding author)
  • Ahmed Magooda (1)
  • Ashraf Y. Mahgoub (1)
  • Hazem Raafat (2)
  • Mohsen Rashwan (3)
  • Amir Atyia (1)

  1. Computer Engineering Department, Cairo University, Giza, Egypt
  2. Computer Science Department, Kuwait University, Kuwait, Kuwait
  3. Electronics and Communications Department, Cairo University, Giza, Egypt