GARUM: A Semantic Similarity Measure Based on Machine Learning and Entity Characteristics

Traverso-Ribón, Ignacio; Vidal, Maria-Esther

doi:10.1007/978-3-319-98809-2_11

Ignacio Traverso-Ribón¹⁸ &
Maria-Esther Vidal^19,20

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11029))

Included in the following conference series:

International Conference on Database and Expert Systems Applications

1282 Accesses
2 Citations

Abstract

Knowledge graphs encode semantics that describes entities in terms of several characteristics, e.g., attributes, neighbors, class hierarchies, or association degrees. Several data-driven tasks, e.g., ranking, clustering, or link discovery, require for determining the relatedness between knowledge graph entities. However, state-of-the-art similarity measures may not consider all the characteristics of an entity to determine entity relatedness. We address the problem of similarity assessment between knowledge graph entities and devise GARUM, a semantic similarity measure for knowledge graphs. GARUM relies on similarities of entity characteristics and computes similarity values considering simultaneously several entity characteristics. This combination can be manually or automatically defined with the help of a machine learning approach. We empirically evaluate the accuracy of GARUM on knowledge graphs from different domains, e.g., networks of proteins and media news. In the experimental study, GARUM exhibits higher correlation with gold standards than studied existing approaches. Thus, these results suggest that similarity measures should not consider entity characteristics in isolation; contrary, combinations of these characteristics are required to precisely determine relatedness among entities in a knowledge graph. Further, the combination functions found by a machine learning approach outperform the results obtained by the manually defined aggregation functions.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
http://dbpedia.org.
2.
http://yago-knowledge.org.
3.
https://www.w3.org/community/fibo/.
4.
https://github.com/chrispau1/SemRelDocSearch/blob/master/data/Pincombe_annotated_xLisa.json.
5.
http://xldb.di.fc.ul.pt/tools/cessm/index.php.
6.
http://xldb.fc.ul.pt/biotools/cessm2014/index.html.
7.
http://scikit-learn.org/stable/index.html.
8.
https://keras.io/.
9.
Due to the lack of training data GARUM could not be evaluated in CESSM 2014 with ECC and Pfam.
10.
Transversal relations correspond to object properties in the knowledge graph.

References

Benik, J., Chang, C., Raschid, L., Vidal, M.-E., Palma, G., Thor, A.: Finding cross genome patterns in annotation graphs. In: Bodenreider, O., Rance, B. (eds.) DILS 2012. LNCS, vol. 7348, pp. 21–36. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31040-9_3
Chapter Google Scholar
Gene Ontology Consortium, et al.: Gene ontology consortium: going forward. Nucleic Acids Res. 43(D1), D1049–D1056 (2015)
Google Scholar
Couto, F.M., Silva, M.J., Coutinho, P.M.: Measuring semantic similarity between Gene Ontology terms. Data Knowl. Eng. 61(1), 137–152 (2007)
Article Google Scholar
Damljanovic, D., Stankovic, M., Laublet, P.: Linked data-based concept recommendation: comparison of different methods in open innovation scenario. In: Simperl, E., Cimiano, P., Polleres, A., Corcho, O., Presutti, V. (eds.) ESWC 2012. LNCS, vol. 7295, pp. 24–38. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-30284-8_9
Chapter Google Scholar
Devos, D., Valencia, A.: Practical limits of function prediction. Prot.: Struct. Funct. Bioinform. 41(1), 98–107 (2000)
Google Scholar
Gabrilovich, E., Markovitch, S.: Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In: IJCAI, vol. 7, pp. 1606–1611 (2007)
Google Scholar
Hassan, S., Mihalcea, R.: Semantic relatedness using salient semantic analysis. In: AAAI (2011)
Google Scholar
Jiang, J.J., Conrath, D.W.: Semantic similarity based on corpus statistics and lexical taxonomy. arXiv preprint arXiv:cmp-lg/9709008 (1997)
Kazakov, Y.: SRIQ and SROIQ are harder than SHOIQ. In: Description Logics. CEUR Workshop Proceedings, vol. 353. CEUR-WS.org (2008)
Google Scholar
Köhler, S., et al.: The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 42(D1), D966–D974 (2014)
Article Google Scholar
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Res. Log. Q. 2(1–2), 83–97 (1955)
Article MathSciNet Google Scholar
Landauer, T.K., Laham, D., Rehder, B., Schreiner, M.E.: How well can passage meaning be derived without using word order? A comparison of Latent Semantic Analysis and humans. In: Proceedings of the 19th annual meeting of the Cognitive Science Society, pp. 412–417 (1997)
Google Scholar
Lee, M., Pincombe, B., Welsh, M.: An empirical evaluation of models of text document similarity. In: Cognitive Science (2005)
Google Scholar
Lin, D.: An information-theoretic definition of similarity. In: ICML, vol. 98, pp. 296–304 (1998)
Google Scholar
Paul, C., Rettinger, A., Mogadala, A., Knoblock, C.A., Szekely, P.: Efficient graph-based document similarity. In: Sack, H., Blomqvist, E., d’Aquin, M., Ghidini, C., Ponzetto, S.P., Lange, C. (eds.) ESWC 2016. LNCS, vol. 9678, pp. 334–349. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-34129-3_21
Chapter Google Scholar
Pekar, V., Staab, S.: Taxonomy learning: factoring the structure of a taxonomy into a semantic classification decision. In: Proceedings of the 19th International Conference on Computational Linguistics, vol. 1, pp. 1–7. Association for Computational Linguistics (2002)
Google Scholar
Pesquita, C., Faria, D., Bastos, H., Falcão, A., Couto, F.: Evaluating go-based semantic similarity measures. In: Proceedings of 10th Annual Bio-Ontologies Meeting, vol. 37, p. 38 (2007)
Google Scholar
Pesquita, C., Pessoa, D., Faria, D., Couto, F.: CESSM: collaborative evaluation of semantic similarity measures. JB2009: Chall. Bioinform. 157, 190 (2009)
Google Scholar
Resnik, P., et al.: Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language. J. Artif. Intell. Res. (JAIR) 11, 95–130 (1999)
Article Google Scholar
Schuhmacher, M., Ponzetto, S.P.: Knowledge-based graph document modeling. In: Proceedings of the 7th ACM International Conference on Web Search and Data Mining, pp. 543–552. ACM (2014)
Google Scholar
Sevilla, J.L., et al.: Correlation between gene expression and GO semantic similarity. IEEE/ACM Trans. Comput. Biol. Bioinform. 2(4), 330–338 (2005)
Article Google Scholar
Shi, C., Kong, X., Huang, Y., Yu, P.S., Wu, B.: HeteSim: a general framework for relevance measure in heterogeneous networks. IEEE Trans. Knowl. Data Eng. 26(10), 2479–2492 (2014)
Article Google Scholar
Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147(1), 195–197 (1981)
Article Google Scholar
Sun, Y., Han, J., Yan, X., Yu, P.S., Wu, T.: PathSim: meta path-based top-k similarity search in heterogeneous information networks. In: VLDB 2011 (2011)
Google Scholar
Traverso-Ribón, I., Vidal, M.: Exploiting information content and semantics to accurately compute similarity of GO-based annotated entities. In: IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB, pp. 1–8 (2015)
Google Scholar
Traverso-Ribón, I., Vidal, M.-E., Palma, G.: OnSim: a similarity measure for determining relatedness between ontology terms. In: Ashish, N., Ambite, J.-L. (eds.) DILS 2015. LNCS, vol. 9162, pp. 70–86. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21843-4_6
Chapter Google Scholar

Download references

Acknowledgements

This work has been partially funded by the EU H2020 Programme for the Project No. 727658 (IASIS).

Author information

Authors and Affiliations

University of Cadiz, Cádiz, Spain
Ignacio Traverso-Ribón
L3S Research Center, Hanover, Germany
Maria-Esther Vidal
TIB Leibniz Information Center for Science and Technology, Hanover, Germany
Maria-Esther Vidal

Authors

Ignacio Traverso-Ribón
View author publications
You can also search for this author in PubMed Google Scholar
Maria-Esther Vidal
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ignacio Traverso-Ribón .

Editor information

Editors and Affiliations

Clausthal University of Technology, Clausthal-Zellerfeld, Germany
Sven Hartmann
Victoria University of Wellington, Wellington, New Zealand
Hui Ma
Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
University of Regensburg, Regensburg, Germany
Günther Pernul
Johannes Kepler University, Linz, Austria
Roland R. Wagner

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Traverso-Ribón, I., Vidal, ME. (2018). GARUM: A Semantic Similarity Measure Based on Machine Learning and Entity Characteristics. In: Hartmann, S., Ma, H., Hameurlain, A., Pernul, G., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2018. Lecture Notes in Computer Science(), vol 11029. Springer, Cham. https://doi.org/10.1007/978-3-319-98809-2_11

Download citation

DOI: https://doi.org/10.1007/978-3-319-98809-2_11
Published: 09 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98808-5
Online ISBN: 978-3-319-98809-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics