Constructing Personal Knowledge Base: Automatic Key-Phrase Extraction from Multiple-Domain Web Pages
In this paper, we propose a general framework that automatically extracts key-phrases from a collection of web pages on a specific topic, with the help of The Free Dictionary, and then constructs a personal knowledge base. Both the base feature and the visual feature of a web page are used to calculate the weight of each candidate phrase. The system extracts the top p% of key-phrases from each web page based on these two features and then generates a term set by taking the union of the per-page results. Next, the system builds the relationships between terms in the term set by referencing The Free Dictionary, and then generates a list of terms sorted by weight. With the top q terms, where q is specified by the user, a semantic graph can be constructed to present part of a personal knowledge base, showing the relationships between terms from the same domain. Finally, the experimental results show that the key-phrases generated by the proposed extractor are of good quality and acceptable to human readers.
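The selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the sample phrases and weights, and the `related` pairs are all hypothetical. It assumes each page already has one combined weight per candidate phrase (the abstract does not say how the base and visual features are combined), that a term appearing on several pages keeps its highest weight, and that term relationships are supplied as a plain list of pairs standing in for a lookup in The Free Dictionary.

```python
import math


def top_phrases(weights, p):
    """Return the top p% of one page's candidate phrases, ranked by weight."""
    ranked = sorted(weights, key=lambda t: (-weights[t], t))
    # Assumption: round up, so a non-empty page contributes at least one phrase.
    k = math.ceil(len(ranked) * p / 100)
    return ranked[:k]


def build_term_set(pages, p):
    """Union the per-page top-p% phrases into one term set.

    Assumption: a term found on several pages keeps its highest weight.
    """
    term_weights = {}
    for weights in pages:
        for term in top_phrases(weights, p):
            term_weights[term] = max(term_weights.get(term, 0.0), weights[term])
    return term_weights


def semantic_graph(term_weights, related, q):
    """Keep the q heaviest terms and the relations that link two of them."""
    top = sorted(term_weights, key=lambda t: (-term_weights[t], t))[:q]
    keep = set(top)
    edges = sorted((a, b) for a, b in related if a in keep and b in keep)
    return top, edges


# Hypothetical input: two pages with precomputed phrase weights.
pages = [
    {"semantic web": 0.9, "ontology": 0.7, "home page": 0.1, "click here": 0.05},
    {"ontology": 0.8, "rdf": 0.6, "site map": 0.2, "contact": 0.1},
]
# Hypothetical term relations, standing in for a dictionary lookup.
related = [("ontology", "semantic web"), ("ontology", "rdf"), ("rdf", "site map")]

term_weights = build_term_set(pages, p=50)
nodes, edges = semantic_graph(term_weights, related, q=2)
print(nodes)  # ['semantic web', 'ontology']
print(edges)  # [('ontology', 'semantic web')]
```

With p = 50 each four-phrase page contributes its two heaviest phrases, so low-weight navigation text such as "click here" never reaches the term set. Raising q to 3 would add "rdf" as a node along with its link to "ontology".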
Keywords: key-phrase extraction; semantic graph; learning mechanism; term correlation; POS