Extracting Keyphrases Using Heterogeneous Word Relations

Shi, Wei; Liu, Zheng; Zheng, Weiguo; Yu, Jeffrey Xu

doi:10.1007/978-3-319-68155-9_13

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10538)

Included in the following conference series:

Australasian Database Conference

Abstract

Extracting keyphrases from documents to provide a quick and insightful summary is an interesting and important task that has attracted considerable research effort. Most existing methods can be categorized as co-occurrence-based, statistics-based, or semantics-based. Co-occurrence-based methods do not fully consider word relations other than co-occurrence. Statistics-based methods inevitably introduce unrelated noise because they rely on an external text corpus, while semantics-based methods depend heavily on the semantic meanings of words. In this paper, we propose a novel graph-based approach that extracts keyphrases by considering heterogeneous latent word relations (co-occurrence and semantics). The random walk model underlying the proposed approach is made possible and reasonable by exploiting nearest neighbor documents. Extensive experiments over real data show that our method outperforms the state-of-the-art methods.
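The abstract does not say how the word graph is built or how the two relations are combined, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a sliding-window co-occurrence count, a hand-supplied table of semantic similarities (hypothetical values standing in for something like word-embedding similarity), a simple linear mix controlled by a made-up parameter `alpha`, and a PageRank-style random walk. All function names are illustrative, and the use of nearest neighbor documents is left out.

```python
from collections import defaultdict


def cooccurrence_weights(tokens, window=2):
    """Count how often two distinct words occur within `window` positions.

    Edges are stored in both directions so the graph is symmetric.
    """
    weights = defaultdict(float)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            v = tokens[j]
            if v != w:
                weights[(w, v)] += 1.0
                weights[(v, w)] += 1.0
    return dict(weights)


def combine(cooc, semantic, alpha=0.5):
    """Mix two edge-weight maps linearly (an assumption of this sketch).

    Each map is scaled by its largest weight first so the two relations
    are on a comparable scale; zero-weight edges are dropped.
    """
    def normalise(m):
        top = max(m.values(), default=0.0)
        return {k: v / top for k, v in m.items()} if top > 0 else {}

    c, s = normalise(cooc), normalise(semantic)
    mixed = {}
    for edge in set(c) | set(s):
        weight = alpha * c.get(edge, 0.0) + (1 - alpha) * s.get(edge, 0.0)
        if weight > 0:
            mixed[edge] = weight
    return mixed


def random_walk_scores(edges, damping=0.85, iterations=50):
    """PageRank-style power iteration over weighted directed edges.

    Assumes every node has at least one outgoing edge, which holds when
    the edge map is symmetric.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_weight = defaultdict(float)
    for (u, _), w in edges.items():
        out_weight[u] += w
    scores = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        nxt = {n: (1 - damping) / len(nodes) for n in nodes}
        for (u, v), w in edges.items():
            nxt[v] += damping * scores[u] * w / out_weight[u]
        scores = nxt
    return scores


# Toy input; the similarity values below are invented for illustration.
tokens = ("graph based keyphrase extraction ranks words "
          "with a random walk over a word graph").split()
semantic = {("graph", "walk"): 0.6, ("walk", "graph"): 0.6}

edges = combine(cooccurrence_weights(tokens), semantic, alpha=0.5)
scores = random_walk_scores(edges)
top_words = sorted(scores, key=scores.get, reverse=True)[:5]
print(top_words)
```

In this sketch, `alpha=1.0` reduces the graph to a pure co-occurrence graph, while lower values let semantically related but non-adjacent words reinforce each other during the walk.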

Notes

1. https://code.google.com/archive/p/word2vec/.
2. http://www-nlpir.nist.gov/projects/duc/past_duc/duc2001/data.html.
3. We use the original version of KEA provided on the public website created by its author: http://www.nzdl.org/Kea/download.html.

Acknowledgements

This work is supported by the Research Grant Council of Hong Kong SAR (No. 14221716), the Natural Science Foundation of Jiangsu Province (BK20171447), and Nanjing University of Posts and Telecommunications (NY215045).

Author information

Authors and Affiliations

The Chinese University of Hong Kong, Hong Kong, China
Wei Shi, Weiguo Zheng & Jeffrey Xu Yu
Nanjing University of Posts and Telecommunications, Nanjing, China
Zheng Liu

Corresponding author

Correspondence to Wei Shi.

Editor information

Editors and Affiliations

University of Queensland, Brisbane, Queensland, Australia
Zi Huang
Nanyang Technological University, Singapore, Singapore
Xiaokui Xiao
University of New South Wales, Sydney, New South Wales, Australia
Xin Cao

About this paper

Cite this paper

Shi, W., Liu, Z., Zheng, W., Yu, J.X. (2017). Extracting Keyphrases Using Heterogeneous Word Relations. In: Huang, Z., Xiao, X., Cao, X. (eds) Databases Theory and Applications. ADC 2017. Lecture Notes in Computer Science, vol 10538. Springer, Cham. https://doi.org/10.1007/978-3-319-68155-9_13

DOI: https://doi.org/10.1007/978-3-319-68155-9_13
Published: 20 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68154-2
Online ISBN: 978-3-319-68155-9
eBook Packages: Computer Science, Computer Science (R0)
