GARUM: A Semantic Similarity Measure Based on Machine Learning and Entity Characteristics

  • Conference paper
Database and Expert Systems Applications (DEXA 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11029)

Abstract

Knowledge graphs encode semantics that describes entities in terms of several characteristics, e.g., attributes, neighbors, class hierarchies, or association degrees. Several data-driven tasks, e.g., ranking, clustering, or link discovery, require determining the relatedness between knowledge graph entities. However, state-of-the-art similarity measures may not consider all the characteristics of an entity when determining relatedness. We address the problem of similarity assessment between knowledge graph entities and devise GARUM, a semantic similarity measure for knowledge graphs. GARUM relies on similarities of entity characteristics and computes similarity values by considering several entity characteristics simultaneously. This combination can be defined manually or automatically with the help of a machine learning approach. We empirically evaluate the accuracy of GARUM on knowledge graphs from different domains, e.g., networks of proteins and media news. In the experimental study, GARUM exhibits higher correlation with gold standards than the existing approaches studied. These results suggest that similarity measures should not consider entity characteristics in isolation; on the contrary, combinations of these characteristics are required to precisely determine relatedness among entities in a knowledge graph. Further, the combination functions found by a machine learning approach outperform the manually defined aggregation functions.
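The idea of combining per-characteristic similarities can be illustrated with a minimal Python sketch. It is not the GARUM implementation: the abstract does not specify the individual measures or the aggregation function, so the `Entity` fields, the use of Jaccard overlap for every characteristic, and the weighted mean in `combined_similarity` are all assumptions made for illustration. A learned combination would replace the fixed weights with a regressor trained on gold-standard similarity values.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """Hypothetical entity record; the field names are illustrative only."""
    classes: frozenset = field(default_factory=frozenset)     # classes from the hierarchy
    neighbors: frozenset = field(default_factory=frozenset)   # adjacent entities
    attributes: frozenset = field(default_factory=frozenset)  # attribute values


def jaccard(a, b):
    """Jaccard overlap of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def characteristic_similarities(x, y):
    """One similarity value per entity characteristic (assumed: Jaccard for all)."""
    return {
        "hierarchy": jaccard(x.classes, y.classes),
        "neighbors": jaccard(x.neighbors, y.neighbors),
        "attributes": jaccard(x.attributes, y.attributes),
    }


def combined_similarity(x, y, weights):
    """Manually defined aggregation (assumed): weighted mean of the components."""
    sims = characteristic_similarities(x, y)
    total = sum(weights.values())
    return sum(weights[name] * sims[name] for name in weights) / total


if __name__ == "__main__":
    a = Entity(frozenset({"Protein", "Enzyme"}), frozenset({"p1", "p2"}), frozenset({"human"}))
    b = Entity(frozenset({"Protein", "Enzyme"}), frozenset({"p2", "p3"}), frozenset({"mouse"}))
    equal_weights = {"hierarchy": 1.0, "neighbors": 1.0, "attributes": 1.0}
    print(characteristic_similarities(a, b))
    print(combined_similarity(a, b, equal_weights))
```

In this toy example the two entities agree fully on the class hierarchy, partially on neighbors, and not at all on attributes, so any single characteristic taken in isolation gives a misleading picture of their relatedness.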

Notes

  1. http://dbpedia.org

  2. http://yago-knowledge.org

  3. https://www.w3.org/community/fibo/

  4. https://github.com/chrispau1/SemRelDocSearch/blob/master/data/Pincombe_annotated_xLisa.json

  5. http://xldb.di.fc.ul.pt/tools/cessm/index.php

  6. http://xldb.fc.ul.pt/biotools/cessm2014/index.html

  7. http://scikit-learn.org/stable/index.html

  8. https://keras.io/

  9. Due to the lack of training data GARUM could not be evaluated in CESSM 2014 with ECC and Pfam.

  10. Transversal relations correspond to object properties in the knowledge graph.

Acknowledgements

This work has been partially funded by the EU H2020 Programme for the Project No. 727658 (IASIS).

Author information

Corresponding author

Correspondence to Ignacio Traverso-Ribón.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Traverso-Ribón, I., Vidal, ME. (2018). GARUM: A Semantic Similarity Measure Based on Machine Learning and Entity Characteristics. In: Hartmann, S., Ma, H., Hameurlain, A., Pernul, G., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2018. Lecture Notes in Computer Science, vol 11029. Springer, Cham. https://doi.org/10.1007/978-3-319-98809-2_11

  • DOI: https://doi.org/10.1007/978-3-319-98809-2_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98808-5

  • Online ISBN: 978-3-319-98809-2

  • eBook Packages: Computer Science, Computer Science (R0)
