Abstract
Keyword extraction has attracted growing interest in the era of information explosion. Its use in document categorization, indexing and classification has drawn attention to graph-based keyword extraction. This research examines how several factors affect the results of a graph-based keyword extraction approach applied to a scientific dataset. The study applies a new model that processes Medline scientific abstracts, produces graphs and extracts 3-graphlets and 4-graphlets from those graphs. The experiment first produces a dataset that records the keywords and their occurrences in the proposed graphlet patterns for each abstract, along with its class. A supervised Naïve Bayes classifier is then applied to assign each word a probability of being a keyword, and finally the performance of the graph-based keyword extraction approach is evaluated. The model achieved significant results compared with the Term Frequency/Inverse Document Frequency (TF/IDF) baseline. The experimental results demonstrate that graphs and graphlet patterns can be used for keyword extraction tasks.
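To make the idea of graphlet-based word features concrete, the following is a minimal sketch and not the chapter's actual model. It assumes a simple sliding-window co-occurrence graph and counts only 3-node graphlets (open paths and triangles) per word; the function names, the `window` parameter and the feature labels are hypothetical choices made for illustration. The 4-graphlets and the Naïve Bayes step described in the abstract are omitted.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(tokens, window=1):
    """Build an undirected graph as an adjacency map.

    Assumption: two distinct words are linked when they occur at most
    `window` positions apart in the token sequence.
    """
    adj = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:  # skip self-loops
                adj[word].add(other)
                adj[other].add(word)
    return adj


def count_3_graphlets(adj):
    """Count, for every node, the 3-node graphlets it takes part in.

    Hypothetical feature labels:
      triangle    - the node is part of a closed triple
      path_center - the node is the middle of an open 3-node path
      path_end    - the node is an endpoint of an open 3-node path
    """
    counts = {n: {"triangle": 0, "path_center": 0, "path_end": 0} for n in adj}
    for center in adj:
        # Every pair of neighbours forms a 3-node graphlet with `center`.
        for a, b in combinations(sorted(adj[center]), 2):
            if b in adj[a]:
                # Closed triple: each of its nodes is visited once as `center`.
                counts[center]["triangle"] += 1
            else:
                # Open path a - center - b has exactly one middle node.
                counts[center]["path_center"] += 1
                counts[a]["path_end"] += 1
                counts[b]["path_end"] += 1
    return counts


if __name__ == "__main__":
    tokens = "graph based keyword extraction uses graph patterns".split()
    features = count_3_graphlets(build_cooccurrence_graph(tokens))
    for word, row in sorted(features.items()):
        print(word, row)
```

Per-word counts of this kind could serve as the feature columns of a labelled dataset, with one row per candidate word, which a supervised classifier would then consume.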
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Alqaryouti, O., Khwileh, H., Farouk, T., Nabhan, A., Shaalan, K. (2018). Graph-Based Keyword Extraction. In: Shaalan, K., Hassanien, A., Tolba, F. (eds) Intelligent Natural Language Processing: Trends and Applications. Studies in Computational Intelligence, vol 740. Springer, Cham. https://doi.org/10.1007/978-3-319-67056-0_9
DOI: https://doi.org/10.1007/978-3-319-67056-0_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67055-3
Online ISBN: 978-3-319-67056-0
eBook Packages: Engineering, Engineering (R0)