Most Important First – Keyphrase Scoring for Improved Ranking in Settings With Limited Keyphrases

Witt, Nils; Milz, Tobias; Seifert, Christin

doi:10.1007/978-3-030-01771-2_24

Nils Witt¹⁷,
Tobias Milz¹⁸ &
Christin Seifert¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 11198))

Included in the following conference series:

International Conference on Discovery Science

857 Accesses
1 Citations

Abstract

Automatic keyphrase extraction attempts to capture keywords that accurately and extensively describe the document while being comprehensive at the same time. Unsupervised algorithms for extractive keyphrase extraction, i.e. those that filter the keyphrases from the text without external knowledge, generally suffer from low precision and low recall. In this paper, we propose a scoring of the extracted keyphrases as post-processing to rerank the list of extracted phrases in order to improve precision and recall particularly for the top phrases. The approach is based on the tf-idf score of the keyphrases and is agnostic of the underlying method used for the initial extraction of the keyphrases. Experiments show an increase of up to 14% at 5 keyphrases in the F1-metric on the most difficult corpus out of 4 corpora. We also show that this increase is mostly due to an increase on documents with very low F1-scores. Thus, our scoring and aggregation approach seems to be a promising way for robust, unsupervised keyphrase extraction with a special focus on the most important keyphrases.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
Throughout the document we will use the unifying term keyphrase to refer to keywords as well as keyphrases as defined in the Introduction.
2.
https://doi.org/10.5281/zenodo.1435518.
3.
https://www.nltk.org/.
4.
https://pypi.org/project/rake-nltk/.
5.
https://github.com/jerrygaoLondon/jgtextrank.

References

Augenstein, I., Das, M., Riedel, S., Vikraman, L., McCallum, A.: Semeval 2017 task 10: Scienceie-extracting keyphrases and relations from scientific publications. arXiv preprint arXiv:1704.02853 (2017)
Bougouin, A., Boudin, F., Daille, B.: Topicrank: graph-based topic ranking for keyphrase extraction, pp. 543–551, Oct 2013
Google Scholar
Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
MathSciNet MATH Google Scholar
Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Computer. Netw. ISDN Syst. 30(1–7), 107–117 (1998)
Article Google Scholar
Danilevsky, M., Wang, C., Desai, N., Ren, X., Guo, J., Han, J.: Automatic construction and ranking of topical keyphrases on collections of short documents. In: Proceedings of the 2014 SIAM International Conference on Data Mining, pp. 398–406. SIAM (2014)
Chapter Google Scholar
Frank, E., Paynter, G.W., Witten, I.H., Gutwin, C., Nevill-Manning, C.G.: Domain-specific keyphrase extraction. In: Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, IJCAI 1999, pp. 668–673. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1999)
Google Scholar
Grineva, M., Grinev, M., Lizorkin, D.: Extracting key terms from noisy and multitheme documents. In: Proceedings of the 18th International Conference on World Wide Web, WWW 2009, pp. 661–670. ACM, New York, NY, USA (2009)
Google Scholar
Hasan, K.S., Ng, V.: Conundrums in unsupervised keyphrase extraction: making sense of the state-of-the-art. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, COLING 2010, pp. 365–373. Association for Computational Linguistics, Stroudsburg, PA, USA (2010)
Google Scholar
Hasan, K.S., Ng, V.: Automatic keyphrase extraction: a survey of the state of the art. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 1262–1273 (2014)
Google Scholar
Hulth, A.: Improved automatic keyword extraction given more linguistic knowledge. In: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, EMNLP 2003, pp. 216–223. Association for Computational Linguistics, Stroudsburg, PA, USA (2003)
Google Scholar
Liu, F., Pennell, D., Liu, F., Liu, Y.: Unsupervised approaches for automatic keyword extraction using meeting transcripts. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2009, pp. 620–628. Association for Computational Linguistics, Stroudsburg, PA, USA (2009)
Google Scholar
Liu, Z., Chen, X., Zheng, Y., Sun, M.: Automatic keyphrase extraction by bridging vocabulary gap. In: Proceedings of the Fifteenth Conference on Computational Natural Language Learning, pp. 135–144. Association for Computational Linguistics (2011)
Google Scholar
Liu, Z., Huang, W., Zheng, Y., Sun, M.: Automatic keyphrase extraction via topic decomposition. In: Proceedings of the conference on empirical methods in natural language processing, pp. 366–376. Association for Computational Linguistics (2010)
Google Scholar
Liu, Z., Li, P., Zheng, Y., Sun, M.: Clustering to find exemplar terms for keyphrase extraction. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1, EMNLP 2009, pp. 257–266. Association for Computational Linguistics, Stroudsburg, PA, USA (2009)
Google Scholar
Mani, I.: Advances in Automatic Text Summarization. MIT Press, Cambridge (1999)
Google Scholar
Medelyan, O., Frank, E., Witten, I.H.: Human-competitive tagging using automatic keyphrase extraction. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3, EMNLP 2009, pp. 1318–1327. Association for Computational Linguistics, Stroudsburg, PA, USA (2009)
Google Scholar
Meng, R., Zhao, S., Han, S., He, D., Brusilovsky, P., Chi, Y.: Deep keyphrase generation. arXiv preprint arXiv:1704.06879 (2017)
Mihalcea, R., Tarau, P.: Textrank: bringing order into texts. In: Proceedings of Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain (2004)
Google Scholar
Miller, G.A.: The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychol. Rev. 63(2), 81 (1956)
Article Google Scholar
Ren, X., El-Kishky, A., Wang, C., Tao, F., Voss, C.R., Han, J.: Clustype: effective entity recognition and typing by relation phrase-based clustering. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 995–1004. ACM (2015)
Google Scholar
Rose, S., Engel, D., Cramer, N., Cowley, W.: Automatic keyword extraction from individual documents, pp. 1–20. Wiley, Chichester (2010)
Google Scholar
Turney, P.: Learning to extract keyphrases from text, Jan 1999
Google Scholar
Wan, X., Xiao, J.: Collabrank: towards a collaborative approach to single- document keyphrase extraction. In: Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pp. 969–976. Coling 2008 Organizing Committee, Manchester, UK, August 2008
Google Scholar
Wan, X., Xiao, J.: Single document keyphrase extraction using neighborhood knowledge. In: Proceedings of the 23rd National Conference on Artificial Intel- ligence - Volume 2, AAAI 2008, pp. 855–860. AAAI Press (2008)
Google Scholar
Wan, X., Xiao, J.: Single document keyphrase extraction using neighborhood knowledge. AAAI 8, 855–860 (2008)
Google Scholar
Witten, I.H., Paynter, G.W., Frank, E., Gutwin, C., Nevill-Manning, C.G.: Kea: practical automatic keyphrase extraction. In: Proceedings of the Fourth ACM Conference on Digital Libraries, pp. 254–255. ACM, New York, NY, USA (1999)
Google Scholar
Zhang, Y., Fang, Y., Weidong, X.: Deep keyphrase generation with a convolutional sequence to sequence model. In: 2017 4th International Conference on Systems and Informatics (ICSAI), pp. 1477–1485. IEEE (2017)
Google Scholar

Download references

Author information

Authors and Affiliations

ZBW - Leibniz Information Centre for Economics, Kiel, Germany
Nils Witt
University of Passau, Passau, Germany
Tobias Milz
University of Twente, Enschede, The Netherlands
Christin Seifert

Authors

Nils Witt
View author publications
You can also search for this author in PubMed Google Scholar
Tobias Milz
View author publications
You can also search for this author in PubMed Google Scholar
Christin Seifert
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Nils Witt , Tobias Milz or Christin Seifert .

Editor information

Editors and Affiliations

Goldsmiths University of London, London, UK
Larisa Soldatova
Eindhoven University of Technology, Eindhoven, The Netherlands
Joaquin Vanschoren
University of Cyprus, Nicosia, Cyprus
George Papadopoulos
Università degli Studi di Bari Aldo Moro, Bari, Italy
Michelangelo Ceci

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Witt, N., Milz, T., Seifert, C. (2018). Most Important First – Keyphrase Scoring for Improved Ranking in Settings With Limited Keyphrases. In: Soldatova, L., Vanschoren, J., Papadopoulos, G., Ceci, M. (eds) Discovery Science. DS 2018. Lecture Notes in Computer Science(), vol 11198. Springer, Cham. https://doi.org/10.1007/978-3-030-01771-2_24

Download citation

DOI: https://doi.org/10.1007/978-3-030-01771-2_24
Published: 07 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01770-5
Online ISBN: 978-3-030-01771-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics