Novel Word Features for Keyword Extraction

  • Yiqun Chen
  • Jian Yin
  • Weiheng Zhu
  • Shiding Qiu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9098)


Keyword extraction plays an increasingly important role in text-related research. Applications that rely on feature word selection include text mining and information retrieval. This paper introduces novel word features for keyword extraction. The new features are derived from background knowledge supplied by patent data. Given a document, the method first generates a query from the key facts present in the document and uses it to search the patent data for files closely related to the document's contents. From the resulting set of patent files, the inventor, assignee, and citation information in each file is used to mine hidden knowledge and relationships between patent files. This related knowledge is then imported to extend the background knowledge base, from which the novel word features are derived. Because they reflect the document's background knowledge, the new features indicate the importance of individual words in the input document and complement the traditional word features derived from the document's explicit information. Keyword extraction can then be treated as a classification problem, and a Support Vector Machine (SVM) is used to extract the keywords. Experiments on two different data sets show that our method improves keyword extraction performance.
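The idea of pairing traditional word features with a background-knowledge feature can be illustrated with a minimal Python sketch. It is not the authors' feature set, which the abstract does not specify. The function name `word_features`, the three features chosen (term frequency, relative first position, and the share of retrieved background files containing the word), and the toy texts are all assumptions made for illustration. In a full pipeline, vectors like these would be labeled keyword or non-keyword and passed to an SVM classifier.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_features(document, background_docs):
    """Build one feature vector per distinct word in `document`.

    Hypothetical features, chosen for illustration only:
      [0] term frequency in the document (traditional feature)
      [1] relative position of the first occurrence (traditional feature)
      [2] fraction of background files that contain the word
          (a stand-in for a background-knowledge feature)
    """
    tokens = tokenize(document)
    counts = Counter(tokens)
    total = len(tokens)
    background_sets = [set(tokenize(d)) for d in background_docs]

    features = {}
    for position, word in enumerate(tokens):
        if word in features:
            continue  # keep only the first occurrence's position
        tf = counts[word] / total
        first_pos = position / total
        if background_sets:
            bg_share = sum(word in s for s in background_sets) / len(background_sets)
        else:
            bg_share = 0.0
        features[word] = [tf, first_pos, bg_share]
    return features


# Toy input: one document plus two stand-in "retrieved patent files".
doc = "solar panel converts light; the panel stores energy"
retrieved = ["a solar panel assembly", "panel mounting bracket"]
for word, vector in word_features(doc, retrieved).items():
    print(word, vector)
```

In this toy input, "panel" appears in both background files while "the" appears in neither, so the third feature separates them even though both are short, common-looking tokens.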


Keyword extraction, Patent data, Information retrieval





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Yiqun Chen (1, 2)
  • Jian Yin (1, corresponding author)
  • Weiheng Zhu (3)
  • Shiding Qiu (3)
  1. Department of Computer Science, Sun Yat-sen University, Guangzhou, China
  2. Department of Computer Science, Guangdong University of Education, Guangzhou, China
  3. College of Information Science Technology, Jinan University, Guangzhou, China
