Abstract
Text mining uses a range of techniques to analyze data and to extract knowledge, information, and relations. In this work, we aim to extract terms related to specific keywords. In the first step, we extract Arabic keywords from news article titles using the TF-IDF term-weighting measure. In the next step, we extract the related terms from both titles and main texts, using the Word2Vec model as a word embedding technique. To evaluate the proposed approach, we compute precision values based on the extracted terms that are present in Wikipedia articles. The experimental results show that terms extracted from the articles' main texts achieve better precision than terms extracted from titles, and that the international news category has the highest precision value.
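The keyword-extraction and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which TF-IDF variant, tokenizer, or preprocessing was used, so the code assumes whitespace tokenization, length-normalized term frequency, and an unsmoothed logarithmic IDF. The function names `tfidf_keywords` and `precision` and the toy inputs are hypothetical, and the reference set simply stands in for the terms found in Wikipedia articles. The Word2Vec step is omitted because it requires a trained embedding model.

```python
import math
from collections import Counter


def tfidf_keywords(titles, top_k=3):
    """Return the top_k highest-weighted terms of each title (plain TF-IDF).

    Assumption: whitespace tokenization and idf = log(N / df); the paper's
    exact weighting scheme is not given in the abstract.
    """
    docs = [title.split() for title in titles]
    n_docs = len(docs)
    # Document frequency: number of titles that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


def precision(extracted_terms, reference_terms):
    """Fraction of extracted terms that also appear in the reference set."""
    if not extracted_terms:
        return 0.0
    hits = sum(1 for term in extracted_terms if term in reference_terms)
    return hits / len(extracted_terms)


# Hypothetical toy data: placeholder tokens instead of real Arabic titles.
toy_titles = ["a b c", "a b d", "a e f"]
toy_keywords = tfidf_keywords(toy_titles, top_k=2)
toy_precision = precision(["x", "y", "z", "w"], {"x", "z"})
```

In this sketch, a term that occurs in every title (here `a`) receives a weight of zero, so it is never ranked above a rarer term. Precision is computed per list of extracted terms; averaging it per news category would be one way to compare categories.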
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Chouigui, A., Ben Khiroun, O., Elayeb, B. (2019). Related Terms Extraction from Arabic News Corpus Using Word Embedding. In: Debruyne, C., Panetto, H., Guédria, W., Bollen, P., Ciuciu, I., Meersman, R. (eds) On the Move to Meaningful Internet Systems: OTM 2018 Workshops. OTM 2018. Lecture Notes in Computer Science, vol 11231. Springer, Cham. https://doi.org/10.1007/978-3-030-11683-5_26
DOI: https://doi.org/10.1007/978-3-030-11683-5_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11682-8
Online ISBN: 978-3-030-11683-5
eBook Packages: Computer Science, Computer Science (R0)