A Semi-automatic Method for Domain Ontology Extraction from Portuguese Language Wikipedia’s Categories

  • Clarissa Castellã Xavier
  • Vera Lúcia Strube de Lima
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6404)


The growing need for ontologies and the difficulty of building them manually have given rise to methods for automatic and semi-automatic ontology learning. In this work we present a semi-automatic method for extracting domain ontologies from the categories of the Portuguese-language Wikipedia. To validate the method, we conducted a case study in which we implemented a prototype that generates a Tourism ontology. Evaluated against a manually built gold standard, the results reach 79.51% Precision and 91.95% Recall, comparable to those reported in the literature for other languages.
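As a rough illustration of the evaluation mentioned above, the sketch below computes Precision and Recall for a set of extracted concepts against a gold standard. It assumes exact label matching between the two sets, which the abstract does not specify; the function name `precision_recall` and the toy concept labels are hypothetical and are not taken from the paper.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for two collections of concept labels.

    Assumption: a concept counts as correct only if its label matches
    a gold-standard label exactly.
    """
    extracted, gold = set(extracted), set(gold)
    matched = extracted & gold
    # Precision: share of extracted concepts that appear in the gold standard.
    precision = len(matched) / len(extracted) if extracted else 0.0
    # Recall: share of gold-standard concepts that were extracted.
    recall = len(matched) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data, for illustration only.
extracted = {"Hotel", "Museum", "Beach", "Airport", "Recipe"}
gold = {"Hotel", "Museum", "Beach", "Airport", "Restaurant"}

p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

With this toy data, four of the five extracted labels are in the gold standard and four of the five gold labels were extracted, so both measures come out at 0.80.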


Keywords: ontologies, Wikipedia, semi-automatic ontology extraction





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Clarissa Castellã Xavier (1)
  • Vera Lúcia Strube de Lima (1)
  1. Faculdade de Informática, PUCRS, Porto Alegre, Brazil
