Semantic Measures for Keywords Extraction

  • Davide Colla
  • Enrico Mensa
  • Daniele P. Radicioni
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10640)


In this paper we introduce a minimalist hypothesis for keywords extraction: keywords can be extracted from text documents by considering the concepts underlying document terms. Furthermore, central concepts are identified as those most closely related to the concepts in the title. Specifically, we propose five metrics, diverse in nature, to compute the centrality of concepts in the document body with respect to those in the title. Finally, we report on an experiment over a popular data set of human-annotated news articles; the results confirm the soundness of our hypothesis.
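As a rough illustration of the hypothesis stated above, the following minimal sketch ranks body terms by how related their concepts are to the title concepts. Everything in it is an assumption made for illustration: the toy concept table `TOY_CONCEPTS`, the Jaccard overlap used as a relatedness score, and the function name `rank_by_title_relatedness` are hypothetical, and the score is not one of the five metrics proposed in the paper.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical toy "concept" table: each term maps to a set of descriptive
# features. It stands in for a real lexical resource and is NOT from the paper.
TOY_CONCEPTS: Dict[str, Set[str]] = {
    "election": {"politics", "vote", "government"},
    "senate": {"politics", "government", "law"},
    "ballot": {"politics", "vote", "paper"},
    "weather": {"rain", "sky", "temperature"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two feature sets (illustrative relatedness score)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_title_relatedness(
    title_terms: List[str],
    body_terms: List[str],
    concepts: Dict[str, Set[str]] = TOY_CONCEPTS,
    sim: Callable[[Set[str], Set[str]], float] = jaccard,
) -> List[Tuple[str, float]]:
    """Score each distinct body term by its best match against the title terms."""
    scores: Dict[str, float] = {}
    for term in set(body_terms):
        features = concepts.get(term, set())
        title_scores = [sim(features, concepts.get(t, set())) for t in title_terms]
        scores[term] = max(title_scores, default=0.0)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


ranking = rank_by_title_relatedness(
    ["election"], ["senate", "ballot", "weather", "ballot"]
)
for term, score in ranking:
    print(f"{term}: {score:.2f}")
```

In this toy setting the terms sharing features with the title term rank above the unrelated one; any other relatedness function with the same signature could be passed as `sim`.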


Keywords extraction; Natural language semantics; Conceptual similarity; Word similarity; Lexical resources



We wish to thank Simone Donetti and the Technical Staff of the Computer Science Department of the University of Turin for their support in the setup and administration of the computer system used in the experimentation.

The authors are also grateful to the anonymous reviewers for their valuable comments and suggestions.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Davide Colla (1)
  • Enrico Mensa (1)
  • Daniele P. Radicioni (1)
  1. Dipartimento di Informatica, Università di Torino, Turin, Italy
