Using Word Embeddings to Enhance Keyword Identification for Scientific Publications

  • Rui Wang
  • Wei LiuEmail author
  • Chris McDonald
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9093)


Automatic keyword identification is a desirable but difficult task. It requires considerations of not only the extraction of important words or phrases from a text, but also the generation of abstractive ones that do not appear in the text. In this paper, we propose an approach that uses word embedding vectors as an external knowledge base for both keyword extraction and generation. Our evaluation shows that our approach outperforms many baseline algorithms, and is comparable to the state-of-the-art algorithm on our chosen dataset. In addition, we also introduce a new approach for evaluating the task of keyword extraction, that overcomes a common problem of overly strict matching criteria. We show that using word embedding vectors is a simpler, yet effective, method for both keyword extraction and generation.


Keyword extraction Keyword generation Word embeddings 


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Jones, K.S.: A statistical interpretation of term specificity and its application in retrieval. Journal of documentation 28(1), 11–21 (1972)CrossRefGoogle Scholar
  2. 2.
    Mihalcea, R., Tarau, P.: TextRank: Bringing Order into Texts. In: Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain (2004)Google Scholar
  3. 3.
    Wan, X., Xiao, J.: Single document keyphrase extraction using neighborhood knowledge. AAAI 8, 855–860 (2008)Google Scholar
  4. 4.
    Matsuo, Y., Ishizuka, M.: Keyword extraction from a single document using word co-occurrence statistical information. International Journal on Artificial Intelligence Tools 13(01), 157–169 (2004)CrossRefGoogle Scholar
  5. 5.
    Hasan, K.S., Ng, V.: Automatic keyphrase extraction: A survey of the state of the art. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1262–1273 (2014)Google Scholar
  6. 6.
    Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, pp. 604–613. ACM (1998)Google Scholar
  7. 7.
    Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. The Journal of Machine Learning Research 12, 2493–2537 (2011)zbMATHGoogle Scholar
  8. 8.
    Mnih, A., Hinton, G.: Three new graphical models for statistical language modelling. In: Proceedings of the 24th International Conference on Machine Learning, pp. 641–648. ACM (2007)Google Scholar
  9. 9.
    Turian, J., Ratinov, L., Bengio, Y.: Word representations: a simple and general method for semi-supervised learning. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, pp. 384–394 (2010)Google Scholar
  10. 10.
    Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Computer networks and ISDN systems 30(1), 107–117 (1998)CrossRefGoogle Scholar
  11. 11.
    Liu, Z., Li, P., Zheng, Y., Sun, M.: Clustering to find exemplar terms for keyphrase extraction. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1-Volume 1, Association for Computational Linguistics, pp. 257–266 (2009)Google Scholar
  12. 12.
    Grineva, M., Grinev, M., Lizorkin, D.: Extracting key terms from noisy and multitheme documents. In: Proceedings of the 18th International Conference on World Wide Web, pp. 661–670. ACM (2009)Google Scholar
  13. 13.
    Wang, J., Liu, J., Wang, C.: Keyword extraction based on pagerank. In: Zhou, Z.-H., Li, H., Yang, Q. (eds.) PAKDD 2007. LNCS (LNAI), vol. 4426, pp. 857–864. Springer, Heidelberg (2007) CrossRefGoogle Scholar
  14. 14.
    Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)Google Scholar
  15. 15.
    Liu, Z., Chen, X., Zheng, Y., Sun, M.: Automatic keyphrase extraction by bridging vocabulary gap. In: Proceedings of the Fifteenth Conference on Computational Natural Language Learning, Association for Computational Linguistics, pp. 135–144 (2011)Google Scholar
  16. 16.
    Brown, P.F., Pietra, V.J.D., Pietra, S.A.D., Mercer, R.L.: The mathematics of statistical machine translation: Parameter estimation. Computational linguistics 19(2), 263–311 (1993)Google Scholar
  17. 17.
    Klein, D., Manning, C.D.: Accurate unlexicalized parsing. In: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics-Volume 1, Association for Computational Linguistics, pp. 423–430 (2003)Google Scholar
  18. 18.
    Grolmusz, V.: A note on the pagerank of undirected graphs (2012). arXiv preprint arXiv:1205.1960
  19. 19.
    Xing, W., Ghorbani, A.: Weighted pagerank algorithm. In: 2004 Proceedings of the Second Annual Conference on Communication Networks and Services Research, pp. 305–314. IEEE (2004)Google Scholar
  20. 20.
    Rose, S., Engel, D., Cramer, N., Cowley, W.: Automatic keyword extraction from individual documents. Text Mining (2010)Google Scholar
  21. 21.
    Liu, Z., Huang, W., Zheng, Y., Sun, M.: Automatic keyphrase extraction via topic decomposition. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, pp. 366–376 (2010)Google Scholar
  22. 22.
    Hasan, K.S., Ng, V.: Conundrums in unsupervised keyphrase extraction: making sense of the state-of-the-art. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, Association for Computational Linguistics, pp. 365–373 (2010)Google Scholar

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. 1.Department of Computer Science and Software EngineeringThe University of Western AustraliaCrawleyAustralia

Personalised recommendations