Abstract
Compared with plain-text resources, those on “semi-semantic” web sites such as Wikipedia contain high-level semantic information that can benefit automatic annotation tasks on the sites themselves. In this paper, we propose a “collaborative annotating” approach that automatically recommends categories for a Wikipedia article by reusing the category annotations of its most similar articles and ranking these annotations by confidence. In this approach, four typical semantic features of Wikipedia, namely incoming links, outgoing links, section headings and template items, are investigated and used to represent articles for the similarity calculation. The experimental results show that these semantic features improve category annotation performance compared with the plain-text feature, and demonstrate the strength of our approach in discovering missing annotations and annotations at the proper level for Wikipedia articles.
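The recommendation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine similarity, the confidence score (the sum of the similarities of the neighbours that carry a category), the prefixed feature strings such as `out:` and `tpl:`, the function names and the toy articles are all assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(a[f] * b[f] for f in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend_categories(target: Counter, corpus: dict, k: int = 2) -> list:
    """Rank candidate categories for `target` from its k most similar articles.

    corpus maps an article title to (feature Counter, set of categories).
    The confidence of a category is assumed here to be the sum of the
    similarities of the neighbours annotated with it.
    """
    neighbours = sorted(
        ((cosine(target, feats), cats) for feats, cats in corpus.values()),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    confidence = Counter()
    for sim, cats in neighbours:
        for cat in cats:
            confidence[cat] += sim
    # Highest confidence first; ties broken alphabetically for determinism.
    return sorted(confidence.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data (hypothetical): features are prefixed with their type, i.e.
# in: incoming link, out: outgoing link, sec: section heading, tpl: template item.
corpus = {
    "Albert Einstein": (
        Counter(["out:Physics", "out:Nobel Prize", "sec:Biography", "tpl:Infobox scientist"]),
        {"Physicists", "Nobel laureates in Physics"},
    ),
    "Niels Bohr": (
        Counter(["out:Physics", "out:Nobel Prize", "sec:Biography",
                 "tpl:Infobox scientist", "in:Quantum mechanics"]),
        {"Physicists", "Danish scientists"},
    ),
    "Paris": (
        Counter(["out:France", "sec:Geography", "tpl:Infobox city"]),
        {"Capitals in Europe"},
    ),
}
target = Counter(["out:Physics", "sec:Biography", "tpl:Infobox scientist"])

ranked = recommend_categories(target, corpus, k=2)
for category, score in ranked:
    print(f"{category}: {score:.3f}")
```

In this toy setting, a category shared by both nearest neighbours accumulates the highest confidence, while categories of dissimilar articles are never proposed. Other similarity measures or confidence definitions could be substituted without changing the overall structure.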
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, Y., Wang, H., Zhu, H., Yu, Y. (2007). Exploit Semantic Information for Category Annotation Recommendation in Wikipedia. In: Kedad, Z., Lammari, N., Métais, E., Meziane, F., Rezgui, Y. (eds) Natural Language Processing and Information Systems. NLDB 2007. Lecture Notes in Computer Science, vol 4592. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73351-5_5
DOI: https://doi.org/10.1007/978-3-540-73351-5_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73350-8
Online ISBN: 978-3-540-73351-5
eBook Packages: Computer Science, Computer Science (R0)