Constructing Personal Knowledge Base: Automatic Key-Phrase Extraction from Multiple-Domain Web Pages

  • Yin-Fu Huang
  • Cin-Siang Ciou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7104)


In this paper, we propose a general framework that automatically extracts key-phrases from a collection of web pages on a specific topic, with the help of The Free Dictionary, and then constructs a personal knowledge base. Both the base features and the visual features of a web page are used to calculate the weight of each candidate phrase. The system extracts the top p% of key-phrases for each web page based on these two kinds of features and then generates a term set by taking the union of the per-page results. Next, the system builds the relationships between terms in the term set by referencing The Free Dictionary and generates a list of terms sorted by weight. With the top q terms, as specified by the user, a semantic graph can be constructed to present part of the personal knowledge base, showing the relationships between terms from the same domain. Finally, the experimental results show that the key-phrases generated by the proposed extractor are of good quality and acceptable to human readers.
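The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the phrase weights are given as plain numbers (the abstract does not state how base and visual features are combined), a term that appears on several pages keeps its highest weight (an assumption), and the `related` dictionary is a hypothetical stand-in for a lookup against The Free Dictionary. All function names are invented for this example.

```python
import math
from itertools import combinations


def top_percent(weights, p):
    """Return the top p% of one page's phrases, by descending weight."""
    k = max(1, math.ceil(len(weights) * p / 100))
    ranked = sorted(weights, key=lambda t: (-weights[t], t))
    return ranked[:k]


def build_term_set(pages, p):
    """Union the per-page top-p% phrases.

    Assumption: a term seen on several pages keeps its highest weight.
    """
    terms = {}
    for weights in pages:
        for phrase in top_percent(weights, p):
            terms[phrase] = max(terms.get(phrase, 0.0), weights[phrase])
    return terms


def semantic_graph(terms, related, q):
    """Pick the top-q terms and link every pair the lookup reports as related."""
    top = sorted(terms, key=lambda t: (-terms[t], t))[:q]
    edges = {
        (a, b)
        for a, b in combinations(sorted(top), 2)
        if b in related.get(a, set()) or a in related.get(b, set())
    }
    return top, edges


# Toy input: one dict of candidate-phrase weights per web page.
pages = [
    {"semantic graph": 0.9, "web page": 0.4, "term": 0.2, "click": 0.1},
    {"key-phrase extraction": 0.8, "semantic graph": 0.7, "menu": 0.1, "home": 0.05},
]
# Hypothetical relation table standing in for The Free Dictionary.
related = {"semantic graph": {"key-phrase extraction"}}

terms = build_term_set(pages, p=50)
top, edges = semantic_graph(terms, related, q=3)
print(top)
print(edges)
```

With p = 50 each toy page contributes its two heaviest phrases, and the graph keeps only the edge that the relation table supports. How relationships are actually derived from the dictionary is not specified in the abstract, so the symmetric lookup here is only one possible reading.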


key-phrase extraction; semantic graph; learning mechanism; term correlation; POS





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Yin-Fu Huang (1)
  • Cin-Siang Ciou (1)
  1. National Yunlin University of Science and Technology, Yunlin, Taiwan
