Abstract
Ontologies are tools for describing and structuring knowledge, with many applications in searching and analyzing complex knowledge bases. Since building them manually is costly, various approaches bootstrap ontologies automatically by analyzing appropriate documents. Such an analysis needs to find the concepts and the relationships that should form the ontology. However, since relationship extraction methods are imprecise and cannot cover all concepts homogeneously, the initial set of relationships is usually inconsistent and rather imbalanced, a problem which, to the best of our knowledge, has been mostly ignored so far. In this paper, we define the problem of extracting a consistent and properly structured ontology from a set of inconsistent and heterogeneous relationships. Moreover, we propose and compare three graph-based methods for solving the ontology extraction problem. We extract relationships from a large-scale data set of more than 325K documents and evaluate our methods against a gold standard ontology comprising more than 12K relationships. Our study shows that an algorithm based on a modified formulation of the dominating set problem outperforms greedy methods.
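The abstract does not spell out the modified dominating set formulation, so the following is only a minimal sketch of the plain problem it builds on: choose a set of nodes such that every node is either chosen or pointed to by a chosen node. It uses a textbook greedy heuristic on a directed graph and should not be read as any of the paper's three methods. The function name `greedy_dominating_set` and the toy concept graph are assumptions made for illustration.

```python
from typing import Dict, Hashable, Set


def greedy_dominating_set(graph: Dict[Hashable, Set[Hashable]]) -> Set[Hashable]:
    """Textbook greedy heuristic for a directed dominating set.

    `graph` maps a node to the set of nodes it dominates (its out-neighbors).
    A node always dominates itself. This is NOT the paper's modified formulation.
    """
    # Collect every node, including those that only appear as targets.
    nodes: Set[Hashable] = set(graph)
    for targets in graph.values():
        nodes |= targets

    undominated = set(nodes)
    chosen: Set[Hashable] = set()

    while undominated:
        # Sort candidates by name so that ties are broken deterministically,
        # then pick the node that dominates the most still-undominated nodes.
        candidates = sorted(nodes - chosen, key=str)
        best = max(
            candidates,
            key=lambda n: len(({n} | graph.get(n, set())) & undominated),
        )
        chosen.add(best)
        undominated -= {best} | graph.get(best, set())

    return chosen


# Hypothetical toy graph: an edge A -> B stands for an extracted
# relationship "B is a kind of A".
example = {
    "phenotype": {"eye phenotype", "limb phenotype"},
    "eye phenotype": {"small eye"},
    "limb phenotype": {"short limb"},
}

hubs = greedy_dominating_set(example)
print(sorted(hubs))  # ['eye phenotype', 'limb phenotype', 'phenotype']
```

In this toy graph the selected nodes are exactly the concepts that have children, which hints at why a dominating set is a natural starting point for picking the inner nodes of a hierarchy. How edge weights and consistency constraints enter the paper's modified formulation is not stated in the abstract.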
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Böhm, C., Groth, P., Leser, U. (2009). Graph-Based Ontology Construction from Heterogenous Evidences. In: Bernstein, A., et al. The Semantic Web - ISWC 2009. ISWC 2009. Lecture Notes in Computer Science, vol 5823. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04930-9_6
DOI: https://doi.org/10.1007/978-3-642-04930-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04929-3
Online ISBN: 978-3-642-04930-9
eBook Packages: Computer Science (R0)