Acquiring Thesauri from Wikis by Exploiting Domain Models and Lexical Substitution

  • Claudio Giuliano
  • Alfio Massimiliano Gliozzo
  • Aldo Gangemi
  • Kateryna Tymoshenko
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6089)


Acquiring structured data from wikis is a problem of increasing interest in knowledge engineering and the Semantic Web: collaboratively developed resources keep growing, are of high quality, and are constantly updated. One task of particular interest in this area is extracting thesauri from wikis. A thesaurus is a resource that lists words grouped according to similarity of meaning, generally organized into sets of synonyms. Thesauri are useful for a wide variety of applications, including information retrieval and knowledge engineering. Most information in wikis is expressed by means of natural language text and internal links among Web pages, the so-called wikilinks. In this paper, we present an innovative method for inducing thesauri from Wikipedia. It leverages the Wikipedia structure to extract concepts and the terms denoting them, obtaining a thesaurus that can be profitably used in applications. The method noticeably improves precision and recall when applied to re-rank a state-of-the-art baseline approach. Finally, we discuss how to represent the extracted results in RDF/OWL with respect to existing good practices.
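To make the notions of wikilink and synonym set concrete, the following is a minimal illustrative sketch and not the method of the paper. It assumes that the anchor texts of wikilinks pointing to the same page can be treated as candidate terms for the concept that page describes. The function name `group_anchor_texts` and the sample links are hypothetical.

```python
from collections import defaultdict


def group_anchor_texts(wikilinks):
    """Group anchor texts by the page they link to.

    wikilinks: iterable of (anchor_text, target_page) pairs.
    This input format is an assumption made for illustration.
    Returns a dict mapping each target page to a sorted list of
    distinct, lowercased anchor texts (candidate synonyms).
    """
    groups = defaultdict(set)
    for anchor, target in wikilinks:
        # Normalize case and whitespace so "Car" and "car" count once.
        groups[target].add(anchor.strip().lower())
    return {target: sorted(terms) for target, terms in groups.items()}


# Hypothetical sample data, not taken from Wikipedia.
links = [
    ("car", "Automobile"),
    ("automobile", "Automobile"),
    ("Car", "Automobile"),
    ("motorcar", "Automobile"),
    ("jaguar", "Jaguar"),
]
thesaurus = group_anchor_texts(links)
```

Such raw groupings would be noisy in practice, which is why a re-ranking step of the kind mentioned in the abstract is needed on top of any simple baseline.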


Keywords: Semantic Relatedness, Latent Semantic Analysis, Semantic Domain, Lexical Semantic, Anchor Text



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Claudio Giuliano (1)
  • Alfio Massimiliano Gliozzo (2)
  • Aldo Gangemi (2)
  • Kateryna Tymoshenko (1)

  1. FBK, Trento, Italy
  2. STLab-CNR, Rome, Italy
