Most Important First – Keyphrase Scoring for Improved Ranking in Settings With Limited Keyphrases

  • Nils Witt
  • Tobias Milz
  • Christin Seifert
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11198)

Abstract

Automatic keyphrase extraction attempts to capture keyphrases that describe a document accurately and comprehensively. Unsupervised algorithms for extractive keyphrase extraction, i.e., those that select keyphrases from the text without external knowledge, generally suffer from low precision and low recall. In this paper, we propose a post-processing step that scores the extracted keyphrases and reranks the list in order to improve precision and recall, particularly for the top-ranked phrases. The scoring is based on the tf-idf scores of the keyphrases and is agnostic of the method used for the initial extraction. Experiments on four corpora show an increase in F1 of up to 14% at five keyphrases on the most difficult corpus. We also show that this increase comes mostly from documents with very low F1 scores. Thus, our scoring and aggregation approach appears to be a promising way toward robust, unsupervised keyphrase extraction with a special focus on the most important keyphrases.
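The abstract states only that the post-processing score is based on tf-idf and is independent of the extractor that produced the candidates. The Python sketch below illustrates what such an extractor-agnostic tf-idf reranker and the F1@k evaluation could look like. The phrase score (mean tf-idf of the phrase's words), the smoothed idf formula, the toy corpus, and all function names (`idf_table`, `phrase_tfidf`, `rerank`, `f1_at_k`) are illustrative assumptions, not the authors' scoring or aggregation method.

```python
import math
from collections import Counter


def idf_table(corpus_tokens):
    """Smoothed idf, log((1 + N) / (1 + df)) + 1, for every token in a tokenized corpus."""
    n_docs = len(corpus_tokens)
    df = Counter()
    for doc in corpus_tokens:
        df.update(set(doc))  # count each token at most once per document
    return {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}


def phrase_tfidf(phrase, doc_tokens, idf):
    """Hypothetical phrase score: mean tf-idf of the phrase's words in one document."""
    words = phrase.lower().split()
    if not words or not doc_tokens:
        return 0.0
    tf = Counter(doc_tokens)
    scores = [tf[w] / len(doc_tokens) * idf.get(w, 0.0) for w in words]
    return sum(scores) / len(scores)


def rerank(candidates, doc_tokens, idf):
    """Reorder candidates from any extractor by tf-idf score, highest first.

    sorted() is stable, so ties keep the extractor's original order.
    """
    return sorted(candidates, key=lambda p: phrase_tfidf(p, doc_tokens, idf), reverse=True)


def f1_at_k(predicted, gold, k=5):
    """F1 of the top-k predicted phrases against the gold phrases (exact, case-insensitive)."""
    top = {p.lower() for p in predicted[:k]}
    gold_set = {g.lower() for g in gold}
    hits = len(top & gold_set)
    if hits == 0:
        return 0.0
    precision = hits / len(top)
    recall = hits / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy usage: three tokenized documents; the first one is being scored.
corpus = [
    "keyphrase extraction ranks candidate phrases keyphrase scoring".split(),
    "graph ranking for web search".split(),
    "topic models for text".split(),
]
idf = idf_table(corpus)
candidates = ["candidate phrases", "ranks", "keyphrase scoring"]  # hypothetical extractor output
gold = ["keyphrase scoring", "keyphrase extraction"]
ranked = rerank(candidates, corpus[0], idf)
print(ranked)  # ['keyphrase scoring', 'candidate phrases', 'ranks']
print(f1_at_k(candidates, gold, k=1), f1_at_k(ranked, gold, k=1))
```

In this toy case the reranking moves a gold phrase to the top, so F1@1 rises from 0 to about 0.67. Because the reranker only reorders the candidate list it receives, it can sit behind any extractor, which mirrors the extractor-agnostic property the abstract describes.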

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. ZBW - Leibniz Information Centre for Economics, Kiel, Germany
  2. University of Passau, Passau, Germany
  3. University of Twente, Enschede, The Netherlands