Exploring Words with Semantic Correlations from Chinese Wikipedia
In this paper, we study semantic correlation between Chinese words based on Wikipedia documents. We first generate a corpus of about 50,000 structured documents from Wikipedia pages. Then, considering hyperlinks, text overlaps, and word frequency, we extract about 300,000 word pairs with semantic correlations from these documents. We roughly measure the degree of semantic correlation and find groups of tightly correlated words by self-clustering.
Keywords: Word Frequency, Word Pair, Semantic Relatedness, Candidate Group, Chinese Word
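The pipeline summarized in the abstract (hyperlinks, text overlap, and word frequency as evidence for a word pair, followed by grouping) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy corpus `DOCS`, the three signal functions, the weights, the threshold, and the use of connected components as a simple stand-in for the self-clustering step. The text is assumed to be already segmented into words.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: title -> (outgoing link titles, pre-segmented tokens).
DOCS = {
    "tea": ({"china", "water"}, ["tea", "leaf", "water", "china", "drink", "china"]),
    "china": ({"tea"}, ["china", "country", "asia", "tea", "culture"]),
    "water": (set(), ["water", "liquid", "drink"]),
    "piano": (set(), ["piano", "music", "keyboard"]),
}


def link_signal(a, b):
    """1.0 if either page links to the other, else 0.0."""
    return 1.0 if b in DOCS[a][0] or a in DOCS[b][0] else 0.0


def overlap_signal(a, b):
    """Jaccard overlap between the vocabularies of the two pages."""
    ta, tb = set(DOCS[a][1]), set(DOCS[b][1])
    return len(ta & tb) / len(ta | tb)


def frequency_signal(a, b):
    """Relative frequency of each title in the other page's text, averaged."""
    fa = Counter(DOCS[a][1])[b] / len(DOCS[a][1])
    fb = Counter(DOCS[b][1])[a] / len(DOCS[b][1])
    return (fa + fb) / 2


def score(a, b, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of the three signals; the weights are arbitrary assumptions."""
    w_link, w_overlap, w_freq = weights
    return (w_link * link_signal(a, b)
            + w_overlap * overlap_signal(a, b)
            + w_freq * frequency_signal(a, b))


def correlated_pairs(threshold=0.3):
    """Keep the word pairs whose combined score reaches the threshold."""
    pairs = {}
    for a, b in combinations(sorted(DOCS), 2):
        s = score(a, b)
        if s >= threshold:
            pairs[(a, b)] = s
    return pairs


def groups(pairs):
    """Group words by connected components over the kept pairs (union-find)."""
    parent = {w: w for w in DOCS}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for a, b in pairs:
        parent[find(a)] = find(b)
    components = {}
    for w in DOCS:
        components.setdefault(find(w), set()).add(w)
    # Singletons carry no correlation information, so drop them.
    return [g for g in components.values() if len(g) > 1]


if __name__ == "__main__":
    kept = correlated_pairs()
    for pair, s in kept.items():
        print(pair, round(s, 3))
    print(groups(kept))
```

In this sketch the score only gives a rough ordering of pairs, in the same spirit as the rough measure mentioned in the abstract; a real system would need Chinese word segmentation and tuned weights.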