Part of the book series: Studies in Computational Intelligence (SCI, volume 740)

Abstract

Keyword extraction has attracted growing interest in the era of information explosion. Its use in document categorization, indexing, and classification has drawn particular attention to graph-based keyword extraction. This research examines how several factors affect the results of a graph-based keyword extraction approach applied to a scientific dataset. The study applies a new model that processes Medline scientific abstracts, produces graphs from them, and extracts 3-graphlets and 4-graphlets from those graphs. The experiment first constructs a dataset that records, for each abstract together with its class, the keywords and their occurrences in the proposed graphlet patterns. A supervised Naïve Bayes classifier is then applied to assign each word a probability of being a keyword, and the performance of the graph-based keyword extraction approach is evaluated. The model achieved significantly better results than the Term Frequency/Inverse Document Frequency (TF/IDF) baseline. The experimental results demonstrate that graphs and graphlet patterns can be used effectively in keyword extraction tasks.
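The abstract does not spell out how the graphs are built or how graphlet occurrences become features. As a minimal sketch under stated assumptions, the Python below links words that are adjacent in an abstract (a co-occurrence window of 2, chosen here for illustration) and counts, for each word, the connected 3-node and 4-node induced subgraphs (graphlets) it belongs to. The function names and graphlet labels are hypothetical, and tokenization, stop-word removal, and stemming are omitted.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(tokens, window=2):
    """Undirected word graph linking words that occur within `window` positions.

    Hypothetical construction: the chapter's exact graph settings are not given here.
    """
    adj = {t: set() for t in tokens}
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + window]:
            if v != w:  # skip self-loops caused by repeated words
                adj[w].add(v)
                adj[v].add(w)
    return adj


# Connected 4-node graphlets, identified by their sorted degree sequence.
FOUR_NODE = {
    (1, 1, 2, 2): "path4",
    (1, 1, 1, 3): "star4",
    (2, 2, 2, 2): "cycle4",
    (1, 2, 2, 3): "tailed_triangle",
    (2, 2, 3, 3): "diamond",
    (3, 3, 3, 3): "clique4",
}


def graphlet_type(nodes, adj):
    """Name the induced subgraph on `nodes` (3 or 4 words), or None if disconnected."""
    edges = [(a, b) for a, b in combinations(nodes, 2) if b in adj[a]]
    if len(nodes) == 3:
        return {2: "path3", 3: "triangle"}.get(len(edges))
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    # Disconnected 4-node subgraphs (e.g. a triangle plus an isolated word)
    # have degree sequences that are absent from FOUR_NODE, so they map to None.
    return FOUR_NODE.get(tuple(sorted(deg[n] for n in nodes)))


def graphlet_features(adj):
    """Count, per word, the connected 3- and 4-node graphlets that contain it.

    Brute force over node subsets, so only practical for abstract-sized graphs.
    """
    feats = {w: Counter() for w in adj}
    nodes = sorted(adj)
    for k in (3, 4):
        for combo in combinations(nodes, k):
            kind = graphlet_type(combo, adj)
            if kind:
                for w in combo:
                    feats[w][kind] += 1
    return feats


tokens = "graph based keyword extraction uses graph patterns".split()
features = graphlet_features(cooccurrence_graph(tokens))
print(dict(features["graph"]))
```

Enumerating every 4-node subset grows roughly with the fourth power of the number of distinct words, which is tolerable for a single abstract but would call for a dedicated graphlet-counting algorithm on larger graphs.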
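The classification step could look like the following sketch, which trains a multinomial Naïve Bayes model on per-word count features (for example, graphlet occurrences) and returns the posterior probability that a word is a keyword. The chapter does not state which Naïve Bayes variant or smoothing it uses; the multinomial form, Laplace smoothing with `alpha=1.0`, the class name `MultinomialNaiveBayes`, and the toy training data are all assumptions for illustration.

```python
import math
from collections import Counter


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over sparse count features with Laplace smoothing.

    Assumed variant: the source does not specify the Naive Bayes flavour used.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, X, y):
        """X: list of {feature: count} dicts; y: labels (e.g. True = keyword)."""
        self.classes = sorted(set(y))
        self.vocab = sorted({f for x in X for f in x})
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            totals = Counter()
            for x in rows:
                totals.update(x)  # sum feature counts within class c
            denom = sum(totals.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {
                f: math.log((totals[f] + self.alpha) / denom) for f in self.vocab
            }
        return self

    def predict_proba(self, x):
        """Posterior P(class | x); features unseen during training are ignored."""
        scores = {
            c: self.log_prior[c]
            + sum(n * self.log_lik[c][f] for f, n in x.items() if f in self.log_lik[c])
            for c in self.classes
        }
        top = max(scores.values())  # subtract the max before exp for numerical stability
        weights = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(weights.values())
        return {c: w / z for c, w in weights.items()}


# Hypothetical toy data: graphlet counts for two keywords and two non-keywords.
X = [{"triangle": 3, "star4": 1}, {"triangle": 2}, {"path3": 4}, {"path3": 3, "star4": 1}]
y = [True, True, False, False]
model = MultinomialNaiveBayes().fit(X, y)
print(model.predict_proba({"triangle": 2, "path3": 1}))
```

Working in log space keeps products of many small likelihoods from underflowing; the probabilities are only recovered at the end by normalizing the exponentiated scores.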
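For comparison, a TF/IDF baseline ranks each abstract's words by term frequency weighted by inverse document frequency. Several TF/IDF variants exist and the abstract does not say which one served as the baseline, so this sketch assumes tf = count / document length and idf = log(N / df); `tfidf_keywords` and the sample abstracts are hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=5):
    """Rank each tokenized document's words by tf * idf.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each word once per doc
    ranked = []
    for doc in docs:
        if not doc:
            ranked.append([])
            continue
        tf = Counter(doc)
        scores = {w: (c / len(doc)) * math.log(n / df[w]) for w, c in tf.items()}
        # Highest score first; ties broken alphabetically for a stable order.
        ranked.append(sorted(scores, key=lambda w: (-scores[w], w))[:top_k])
    return ranked


abstracts = [
    "graph based keyword extraction uses graph patterns".split(),
    "naive bayes assigns a keyword probability to each word".split(),
]
print(tfidf_keywords(abstracts, top_k=3))
```

Words that appear in every document get idf = log(1) = 0 under this variant, so they sink to the bottom of every ranking regardless of how often they occur.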

Author information

Correspondence to Omar Alqaryouti.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Alqaryouti, O., Khwileh, H., Farouk, T., Nabhan, A., Shaalan, K. (2018). Graph-Based Keyword Extraction. In: Shaalan, K., Hassanien, A., Tolba, F. (eds) Intelligent Natural Language Processing: Trends and Applications. Studies in Computational Intelligence, vol 740. Springer, Cham. https://doi.org/10.1007/978-3-319-67056-0_9

  • DOI: https://doi.org/10.1007/978-3-319-67056-0_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67055-3

  • Online ISBN: 978-3-319-67056-0

  • eBook Packages: Engineering, Engineering (R0)
