Exploit Semantic Information for Category Annotation Recommendation in Wikipedia

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4592)

Abstract

Compared with plain-text resources, resources in “semi-semantic” web sites such as Wikipedia contain high-level semantic information that can benefit various automatic annotation tasks on those resources. In this paper, we propose a “collaborative annotating” approach that automatically recommends categories for a Wikipedia article by reusing the category annotations of its most similar articles and ranking these annotations by their confidence. In this approach, four typical semantic features in Wikipedia, namely incoming links, outgoing links, section headings and template items, are investigated and used to represent articles for the similarity calculation. The experimental results show that these semantic features improve category annotation performance compared with the plain-text feature, and they also demonstrate the strength of our approach in discovering missing annotations and annotations at the proper level for Wikipedia articles.
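The recommendation step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the confidence formula, so the choices here are assumptions made only for illustration: each article is represented as a set of feature tokens (for example its outgoing links), similarity is Jaccard overlap, and a category's confidence is the similarity-weighted share of neighbours that carry it. The names `jaccard`, `recommend_categories`, `k` and `top_n`, and the toy corpus, are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two sets of feature tokens (assumed measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_categories(target_features, corpus, k=3, top_n=5):
    """Rank categories borrowed from the k articles most similar to the target.

    corpus maps an article title to a (feature_set, category_set) pair.
    Returns a list of (category, confidence) pairs, highest confidence first.
    """
    # Score every article against the target and keep only non-zero matches.
    scored = []
    for title, (features, _categories) in corpus.items():
        sim = jaccard(target_features, features)
        if sim > 0:
            scored.append((sim, title))
    neighbours = sorted(scored, reverse=True)[:k]
    if not neighbours:
        return []

    # Assumed confidence: similarity-weighted vote, normalised to sum to 1
    # over the neighbours.
    total = sum(sim for sim, _title in neighbours)
    confidence = defaultdict(float)
    for sim, title in neighbours:
        for category in corpus[title][1]:
            confidence[category] += sim / total

    # Sort by descending confidence, breaking ties alphabetically.
    ranked = sorted(confidence.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Toy data: features stand in for outgoing links (hypothetical values).
corpus = {
    "Lyon": ({"France", "Rhône", "City"}, {"Cities in France"}),
    "Marseille": (
        {"France", "Mediterranean Sea", "City"},
        {"Cities in France", "Port cities"},
    ),
    "Python (language)": (
        {"Programming", "Guido van Rossum"},
        {"Programming languages"},
    ),
}
target = {"France", "City", "Seine"}
ranked = recommend_categories(target, corpus, k=2)
for category, score in ranked:
    print(f"{category}: {score:.2f}")
```

In this toy run the two French cities are the only neighbours with non-zero similarity, so their categories are the ones recommended. Swapping the feature sets (incoming links, section headings, template items) changes only what goes into each article's set, not the ranking logic.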

Editor information

Zoubida Kedad, Nadira Lammari, Elisabeth Métais, Farid Meziane, Yacine Rezgui

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, Y., Wang, H., Zhu, H., Yu, Y. (2007). Exploit Semantic Information for Category Annotation Recommendation in Wikipedia. In: Kedad, Z., Lammari, N., Métais, E., Meziane, F., Rezgui, Y. (eds) Natural Language Processing and Information Systems. NLDB 2007. Lecture Notes in Computer Science, vol 4592. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73351-5_5

  • DOI: https://doi.org/10.1007/978-3-540-73351-5_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73350-8

  • Online ISBN: 978-3-540-73351-5

  • eBook Packages: Computer Science, Computer Science (R0)
