Extracting Keyphrases Using Heterogeneous Word Relations

  • Conference paper
Databases Theory and Applications (ADC 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10538)

Abstract

Extracting keyphrases from documents to provide a quick and insightful summary is an interesting and important task that has attracted considerable research effort. Most existing methods can be categorized as co-occurrence-based, statistics-based, or semantics-based. Co-occurrence-based methods do not fully consider word relations other than co-occurrence. Statistics-based methods inevitably introduce unrelated noise because they rely on an external text corpus, while semantics-based methods depend heavily on the semantic meanings of words. In this paper, we propose a novel graph-based approach that extracts keyphrases by considering heterogeneous latent word relations (co-occurrence and semantics). The random walk model underlying the proposed approach is made feasible and reasonable by exploiting nearest-neighbor documents. Extensive experiments on real data show that our method outperforms state-of-the-art methods.
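To make the general idea concrete, the following is a minimal sketch of a random walk over a word graph whose edges blend two relation types. It is not the model proposed in the paper, whose details the abstract does not give: it substitutes a plain weighted PageRank over a linear mix of co-occurrence counts and embedding cosine similarity, and it omits nearest-neighbor documents entirely. The function names and the parameters `window`, `alpha`, `damping`, and `iters` are hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations
import math


def cooccurrence_weights(tokens, window=3):
    """Count symmetric co-occurrences of each word with the next window-1 tokens."""
    weights = defaultdict(float)
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + window]:
            if v != w:
                weights[(w, v)] += 1.0
                weights[(v, w)] += 1.0
    return dict(weights)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_weights(vocab, vectors):
    """Symmetric edges weighted by positive cosine similarity of word vectors."""
    weights = {}
    for a, b in combinations(sorted(vocab), 2):
        if a in vectors and b in vectors:
            s = cosine(vectors[a], vectors[b])
            if s > 0:
                weights[(a, b)] = s
                weights[(b, a)] = s
    return weights


def normalize(weights):
    """Scale weights so the largest is 1, making the two relations comparable."""
    top = max(weights.values(), default=0.0)
    return {e: w / top for e, w in weights.items()} if top else {}


def rank_words(tokens, vectors, alpha=0.5, damping=0.85, iters=50):
    """Weighted PageRank over a blend of the two relations (assumed mixing rule)."""
    vocab = sorted(set(tokens))
    n = len(vocab)
    combined = defaultdict(float)
    for edge, w in normalize(cooccurrence_weights(tokens)).items():
        combined[edge] += alpha * w
    for edge, w in normalize(semantic_weights(vocab, vectors)).items():
        combined[edge] += (1 - alpha) * w
    out_sum = defaultdict(float)
    for (a, _), w in combined.items():
        out_sum[a] += w
    score = {w: 1.0 / n for w in vocab}
    for _ in range(iters):
        # Mass held by words with no outgoing edges is spread uniformly.
        dangling = sum(score[w] for w in vocab if out_sum[w] == 0)
        new = {w: (1 - damping) / n + damping * dangling / n for w in vocab}
        for (a, b), w in combined.items():
            new[b] += damping * score[a] * w / out_sum[a]
        score = new
    return score


if __name__ == "__main__":
    toy_tokens = "graph ranking keyphrase graph extraction keyphrase graph".split()
    toy_vectors = {  # made-up 2-d vectors standing in for real word embeddings
        "graph": [1.0, 0.1], "ranking": [0.9, 0.3],
        "keyphrase": [0.2, 1.0], "extraction": [0.3, 0.9],
    }
    for word, s in sorted(rank_words(toy_tokens, toy_vectors).items(),
                          key=lambda kv: -kv[1]):
        print(f"{word}\t{s:.3f}")
```

In a fuller pipeline the word scores would then be aggregated over candidate phrases; that step is left out here because the abstract does not describe how it is done.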

Notes

  1. https://code.google.com/archive/p/word2vec/.

  2. http://www-nlpir.nist.gov/projects/duc/past_duc/duc2001/data.html.

  3. We use the original version of KEA provided on the public website created by its author: http://www.nzdl.org/Kea/download.html.

Acknowledgements

This work is supported by the Research Grants Council of Hong Kong SAR (No. 14221716), the Natural Science Foundation of Jiangsu Province (BK20171447), and Nanjing University of Posts and Telecommunications (NY215045).

Author information

Corresponding author

Correspondence to Wei Shi.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Shi, W., Liu, Z., Zheng, W., Yu, J.X. (2017). Extracting Keyphrases Using Heterogeneous Word Relations. In: Huang, Z., Xiao, X., Cao, X. (eds) Databases Theory and Applications. ADC 2017. Lecture Notes in Computer Science, vol 10538. Springer, Cham. https://doi.org/10.1007/978-3-319-68155-9_13

  • DOI: https://doi.org/10.1007/978-3-319-68155-9_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68154-2

  • Online ISBN: 978-3-319-68155-9

  • eBook Packages: Computer Science, Computer Science (R0)