
AKEA: An Arabic Keyphrase Extraction Algorithm

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 533)

Abstract

Keyphrase extraction is a critical step in many natural language processing and information retrieval applications. In this paper, we introduce AKEA, a keyphrase extraction algorithm for single Arabic documents. AKEA is unsupervised: it requires no training of any kind to perform its task. It relies on heuristics that combine linguistic patterns based on Part-Of-Speech (POS) tags, statistical knowledge, and the internal structural pattern of terms (i.e., word occurrence). We use Arabic Wikipedia to improve the ranking (or significance) of candidate keyphrases by adding a confidence score when a candidate exists as an indexed Wikipedia concept. Experimental results show that, on average, AKEA achieves the highest precision and the highest F-measure of the compared algorithms, indicating that its results are more accurate.
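The abstract names AKEA's ingredients (POS-based candidate patterns, statistical scoring, a Wikipedia confidence bonus, and precision/F-measure evaluation) without giving their exact form. The Python sketch below shows one way such a pipeline could fit together; it is not the authors' implementation. Everything specific in it is an assumption for illustration: the Penn/Bies-style tag sets, the "noun followed by nouns/adjectives" pattern, `max_len`, the frequency-plus-first-position base score, the fixed additive `wiki_bonus`, and `wiki_titles`, a hypothetical stand-in for a lookup against indexed Arabic Wikipedia concepts. POS tagging and stopword lists are assumed to be supplied by the caller.

```python
from collections import Counter

# Hypothetical tag sets, assuming a Penn/Bies-style Arabic tagger
# ("NN", "DTNN", "JJ", ...); AKEA's actual POS patterns are not given here.
NOUN_TAGS = {"NN", "NNS", "NNP", "DTNN", "DTNNS", "DTNNP"}
ADJ_TAGS = {"JJ", "DTJJ"}
NOMINAL_TAGS = NOUN_TAGS | ADJ_TAGS


def extract_candidates(tagged, stopwords, max_len=3):
    """Return (phrase, start_index) pairs: noun-initial runs of nouns/adjectives."""
    candidates = []
    for i, (word, tag) in enumerate(tagged):
        if tag not in NOUN_TAGS or word in stopwords:
            continue
        words = []
        for w, t in tagged[i:i + max_len]:
            if w in stopwords or t not in NOMINAL_TAGS:
                break  # a stopword or non-nominal tag ends the phrase
            words.append(w)
            candidates.append((" ".join(words), i))
    return candidates


def rank_candidates(tagged, stopwords, wiki_titles, wiki_bonus=1.0, top_k=5):
    """Score candidates by frequency and first position, plus a Wikipedia bonus."""
    candidates = extract_candidates(tagged, stopwords)
    if not candidates:
        return []
    counts = Counter(phrase for phrase, _ in candidates)
    first_pos = {}
    for phrase, pos in candidates:  # candidates are in document order
        first_pos.setdefault(phrase, pos)
    max_count = max(counts.values())
    n_tokens = len(tagged)
    scores = {}
    for phrase, count in counts.items():
        frequency = count / max_count                   # normalized to (0, 1]
        position = 1.0 - first_pos[phrase] / n_tokens   # earlier => higher
        bonus = wiki_bonus if phrase in wiki_titles else 0.0  # assumed additive
        scores[phrase] = frequency + position + bonus
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked[:top_k]


def precision_recall_f1(predicted, gold):
    """Standard set-based precision, recall, and F-measure (F1)."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy pre-tagged input; tags are illustrative, not real tagger output.
    tagged = [
        ("الذكاء", "DTNN"), ("الاصطناعي", "DTJJ"), ("في", "IN"),
        ("معالجة", "NN"), ("اللغة", "DTNN"), ("العربية", "DTJJ"),
        ("و", "CC"), ("الذكاء", "DTNN"), ("الاصطناعي", "DTJJ"),
    ]
    stopwords = {"في", "و"}
    wiki_titles = {"الذكاء الاصطناعي", "معالجة اللغة العربية"}
    top = rank_candidates(tagged, stopwords, wiki_titles, top_k=3)
    print(top)  # → ['الذكاء الاصطناعي', 'معالجة اللغة العربية', 'الذكاء']
    gold = ["الذكاء الاصطناعي", "معالجة اللغة العربية", "اللغة العربية"]
    print(precision_recall_f1(top, gold))
```

In this toy run, the Wikipedia bonus lifts "معالجة اللغة العربية" above the unigram "الذكاء", even though the unigram is more frequent and appears earlier; this illustrates how an external concept check can re-rank purely statistical candidates. The relative weights of the three score components are arbitrary here and would need tuning against annotated data.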


Author information

Correspondence to Eslam Amer.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Amer, E., Foad, K. (2017). AKEA: An Arabic Keyphrase Extraction Algorithm. In: Hassanien, A., Shaalan, K., Gaber, T., Azar, A., Tolba, M. (eds) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016. AISI 2016. Advances in Intelligent Systems and Computing, vol 533. Springer, Cham. https://doi.org/10.1007/978-3-319-48308-5_14

  • DOI: https://doi.org/10.1007/978-3-319-48308-5_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48307-8

  • Online ISBN: 978-3-319-48308-5

  • eBook Packages: Engineering, Engineering (R0)
