Word Representations in Vector Space and their Applications for Arabic

Zahran, Mohamed A.; Magooda, Ahmed; Mahgoub, Ashraf Y.; Raafat, Hazem; Rashwan, Mohsen; Atyia, Amir

doi:10.1007/978-3-319-18111-0_32

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9041)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Much work has been done to give the individual words of a language adequate representations in vector space, so that these representations capture the language's semantic and syntactic properties. In this paper, we compare different techniques for building vector space representations for Arabic and test the resulting models through intrinsic and extrinsic evaluations. Intrinsic evaluation assesses model quality using benchmark semantic and syntactic datasets, while extrinsic evaluation assesses model quality by its impact on two Natural Language Processing applications: information retrieval and short answer grading. Finally, we map the Arabic vector space to its English counterpart using a cosine error regression neural network and show that it outperforms standard mean square error regression neural networks in this task.
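The difference between the two regression objectives named in the abstract can be illustrated with a small sketch. The abstract does not define the cosine error, so the form used here (one minus cosine similarity) is an assumption, and the function names and toy vectors are hypothetical; this is not the authors' network or training code.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_error(pred, target):
    """Assumed form: 1 - cosine similarity (0 when directions match)."""
    norm = math.sqrt(dot(pred, pred)) * math.sqrt(dot(target, target))
    if norm == 0.0:
        raise ValueError("cosine error is undefined for a zero vector")
    return 1.0 - dot(pred, target) / norm


def mean_square_error(pred, target):
    """Average squared difference between corresponding components."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)


# Toy 3-dimensional vectors standing in for a target English embedding
# and two candidate outputs of a mapping from the Arabic space.
target = [1.0, 2.0, 2.0]
same_direction = [2.0, 4.0, 4.0]   # target scaled by 2
other_direction = [2.0, 2.0, 1.0]  # same length as target, different direction

for name, pred in [("same_direction", same_direction),
                   ("other_direction", other_direction)]:
    print(name,
          "cosine error:", round(cosine_error(pred, target), 3),
          "MSE:", round(mean_square_error(pred, target), 3))
```

On these toy vectors the two objectives rank the candidates in opposite order: the cosine error ignores vector length and prefers the scaled copy of the target, while the mean square error penalizes the length mismatch and prefers the candidate that points elsewhere. This may matter when mapped vectors are later compared by cosine similarity, though the abstract itself gives no explanation for the reported improvement.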

Author information

Authors and Affiliations

Computer Engineering Department, Cairo University, Giza, Egypt
Mohamed A. Zahran, Ahmed Magooda, Ashraf Y. Mahgoub & Amir Atyia
Computer Science Department, Kuwait University, Kuwait, Kuwait
Hazem Raafat
Electronics and Communications Department, Cairo University, Giza, Egypt
Mohsen Rashwan

Corresponding author

Correspondence to Mohamed A. Zahran.

Editor information

Editors and Affiliations

Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico DF, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Zahran, M.A., Magooda, A., Mahgoub, A.Y., Raafat, H., Rashwan, M., Atyia, A. (2015). Word Representations in Vector Space and their Applications for Arabic. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9041. Springer, Cham. https://doi.org/10.1007/978-3-319-18111-0_32

DOI: https://doi.org/10.1007/978-3-319-18111-0_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18110-3
Online ISBN: 978-3-319-18111-0
eBook Packages: Computer Science, Computer Science (R0)
