Related Terms Extraction from Arabic News Corpus Using Word Embedding

  • Conference paper
On the Move to Meaningful Internet Systems: OTM 2018 Workshops (OTM 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11231)

Abstract

Text mining uses a range of techniques to analyze data and to extract knowledge, information, and relations. In this work, we aim to extract related terms for specific keywords. In the first step, we extract Arabic keywords from news article titles using the TF-IDF term-weighting measure. In the next step, we extract the related terms, from both titles and main texts, using the Word2Vec model as a word embedding technique. To evaluate the proposed approach, we compute precision values based on the extracted terms that are present in Wikipedia articles. In the experiments, terms extracted from the articles' main texts yield better results than those extracted from titles, and the international news category has the highest precision value.
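
The first and last steps of this pipeline can be pictured with a minimal Python sketch. It is not the authors' implementation: the function names, the toy English titles, and the reference set standing in for Wikipedia are all assumptions made for illustration, the TF-IDF variant is one common formulation among several, and the Word2Vec step is omitted because it requires an external library and a trained model.

```python
import math
from collections import Counter


def tfidf_keywords(titles, top_k=3):
    """Rank terms by their highest TF-IDF score across tokenized titles."""
    n_docs = len(titles)
    # Document frequency: number of titles that contain each term.
    df = Counter(term for title in titles for term in set(title))
    best = {}
    for title in titles:
        counts = Counter(title)
        for term, count in counts.items():
            tf = count / len(title)             # relative term frequency
            idf = math.log(n_docs / df[term])   # plain, unsmoothed IDF
            best[term] = max(best.get(term, 0.0), tf * idf)
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


def precision(extracted, reference):
    """Fraction of extracted terms that also appear in the reference set."""
    if not extracted:
        return 0.0
    hits = sum(1 for term in extracted if term in reference)
    return hits / len(extracted)


# Toy, made-up tokenized titles (English placeholders for readability).
titles = [
    ["election", "results", "announced"],
    ["election", "campaign", "begins"],
    ["football", "final", "results"],
]
keywords = tfidf_keywords(titles, top_k=3)

# Hypothetical reference vocabulary standing in for terms found in Wikipedia.
reference = {"election", "football", "campaign"}
score = precision(["election", "football", "weather"], reference)
```

With this unsmoothed weighting, terms that occur in a single title outrank terms shared across titles, which is the usual motivation for applying TF-IDF to keyword selection.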

Notes

  1. http://www.arabicnlp.pro/.

  2. https://deeplearning4j.org.

  3. https://antcorpus.github.io/.

  4. https://ar.wikipedia.org.

Corresponding author

Correspondence to Amina Chouigui.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Chouigui, A., Ben Khiroun, O., Elayeb, B. (2019). Related Terms Extraction from Arabic News Corpus Using Word Embedding. In: Debruyne, C., Panetto, H., Guédria, W., Bollen, P., Ciuciu, I., Meersman, R. (eds) On the Move to Meaningful Internet Systems: OTM 2018 Workshops. OTM 2018. Lecture Notes in Computer Science, vol 11231. Springer, Cham. https://doi.org/10.1007/978-3-030-11683-5_26

  • DOI: https://doi.org/10.1007/978-3-030-11683-5_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-11682-8

  • Online ISBN: 978-3-030-11683-5

  • eBook Packages: Computer Science, Computer Science (R0)
