Chinese News Keyword Extraction Algorithm Based on TextRank and Word-Sentence Collaboration

  • Qing Guo
  • Ao Xiong
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 905)

Abstract

TextRank tends to select high-frequency words as the keywords of a text, yet some low-frequency words may also be keywords. To address this problem, a keyword extraction algorithm based on TextRank is proposed. The algorithm takes sentence importance into account and extracts keywords through word-sentence collaboration. Two text networks are built. In the first, the nodes are words, and the diffusion of two words is defined to measure the correlation between them. In the second, the nodes are sentences, and the BM25 algorithm is used to measure the correlation between sentences. A sentence-word matrix is then constructed to extract the keywords of the text. Experiments on a Chinese news corpus show that the proposed algorithm outperforms TextRank in precision, recall, and F1-measure.
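
The sentence-network half of this idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the diffusion measure or the exact sentence-word matrix formulation, so the sketch covers only BM25 sentence similarity and a TextRank-style ranking, and the final step (a word's score is the sum of the scores of the sentences containing it) is a stand-in assumption. The function names, the parameters `k1`, `b`, and `d`, and the smoothed IDF variant are all choices made here for illustration. Sentences are assumed to be already segmented into word lists.

```python
import math
from collections import Counter


def bm25_matrix(sentences, k1=1.5, b=0.75):
    """Pairwise BM25 scores; entry [i][j] uses sentence i as the query
    and sentence j as the document. k1 and b are assumed defaults."""
    n = len(sentences)
    avg_len = sum(len(s) for s in sentences) / n
    df = Counter(w for s in sentences for w in set(s))
    # Smoothed IDF variant (assumption): the +1 keeps every value positive.
    idf = {w: math.log(1 + (n - f + 0.5) / (f + 0.5)) for w, f in df.items()}
    tfs = [Counter(s) for s in sentences]
    sim = [[0.0] * n for _ in range(n)]
    for i, query in enumerate(sentences):
        for j, doc in enumerate(sentences):
            if i == j:
                continue  # no self-loops in the sentence network
            norm = k1 * (1 - b + b * len(doc) / avg_len)
            sim[i][j] = sum(
                idf[w] * tfs[j][w] * (k1 + 1) / (tfs[j][w] + norm)
                for w in set(query) if w in tfs[j]
            )
    return sim


def rank(weights, d=0.85, iters=50):
    """TextRank-style weighted ranking; weights[j][i] is the edge j -> i."""
    n = len(weights)
    out = [sum(row) for row in weights]  # total outgoing weight per node
    scores = [1.0] * n
    for _ in range(iters):
        scores = [
            (1 - d) + d * sum(weights[j][i] / out[j] * scores[j]
                              for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores


def word_scores_from_sentences(sentences, sentence_scores):
    """Illustrative assumption only: a word inherits the summed score of
    every sentence it appears in (a binary sentence-word matrix)."""
    scores = Counter()
    for sent, s in zip(sentences, sentence_scores):
        for w in set(sent):
            scores[w] += s
    return dict(scores)


if __name__ == "__main__":
    # Hypothetical pre-segmented sentences.
    docs = [["经济", "增长", "政策"], ["经济", "增长", "数据"], ["天气", "晴朗"]]
    sent_scores = rank(bm25_matrix(docs))
    ranked = sorted(word_scores_from_sentences(docs, sent_scores).items(),
                    key=lambda kv: -kv[1])
    print(ranked)
```

Under this stand-in rule, a word that occurs only once can still rank above a word from an isolated sentence if its sentence is strongly connected to the rest of the text, which is the kind of effect the word-sentence collaboration is meant to capture.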

Keywords

Keyword extraction; TextRank; BM25; Word-sentence collaboration

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing, China