MateTee: A Semantic Similarity Metric Based on Translation Embeddings for Knowledge Graphs

  • Camilo Morales
  • Diego Collarana
  • Maria-Esther Vidal
  • Sören Auer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10360)


Large Knowledge Graphs (KGs), e.g., DBpedia or Wikidata, are created with the goal of providing structure to unstructured or semi-structured data. Because these datasets evolve constantly, the challenge is to use them in a meaningful, accurate, and efficient way. Exploiting the semantics encoded in KGs, e.g., class and property hierarchies, provides the basis for addressing this challenge and for producing a more accurate analysis of KG data. We therefore focus on the problem of determining relatedness among entities in KGs, a fundamental building block for any semantic data integration task. We devise MateTee, a semantic similarity measure that combines the gradient descent optimization method with semantics encoded in ontologies to compute precise similarity values between entities in KGs. We empirically study the accuracy of MateTee with respect to state-of-the-art methods. The observed results show that MateTee is competitive with existing methods in terms of accuracy, with the advantage that background domain knowledge is not required.
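To make the idea of a similarity measure built on translation embeddings and gradient descent more concrete, the following is a minimal sketch, not the MateTee implementation. It assumes a generic translation-style objective in which a triple (head, relation, tail) should satisfy head + relation ≈ tail, trains it with stochastic gradient descent on a margin loss, and then scores two entities by the distance between their vectors. The function names, hyperparameters, and the toy graph are all hypothetical and chosen only for illustration.

```python
import numpy as np


def train_translation_embeddings(triples, n_entities, n_relations, dim=16,
                                 epochs=200, lr=0.01, margin=1.0, seed=0):
    """Learn entity and relation vectors so that head + relation ~ tail.

    Illustrative only: a plain margin-based objective with corrupted tails,
    optimized by stochastic gradient descent.
    """
    rng = np.random.default_rng(seed)
    E = rng.normal(scale=0.1, size=(n_entities, dim))   # entity vectors
    R = rng.normal(scale=0.1, size=(n_relations, dim))  # relation vectors

    for _ in range(epochs):
        for h, r, t in triples:
            # Negative example: replace the tail with a random entity.
            t_neg = int(rng.integers(n_entities))
            if t_neg == t:
                continue
            pos = E[h] + R[r] - E[t]       # residual of the true triple
            neg = E[h] + R[r] - E[t_neg]   # residual of the corrupted triple
            loss = margin + pos @ pos - neg @ neg
            if loss <= 0:
                continue  # margin already satisfied, no update needed
            # Gradients of the squared-L2 margin loss.
            grad_shared = 2.0 * (pos - neg)
            E[h] -= lr * grad_shared
            R[r] -= lr * grad_shared
            E[t] -= lr * (-2.0 * pos)
            E[t_neg] -= lr * (2.0 * neg)
        # Keep entity vectors inside the unit ball so norms cannot grow freely.
        norms = np.linalg.norm(E, axis=1, keepdims=True)
        E /= np.maximum(norms, 1.0)
    return E, R


def similarity(E, a, b):
    """Map the Euclidean distance between two entity vectors into (0, 1]."""
    return 1.0 / (1.0 + np.linalg.norm(E[a] - E[b]))


if __name__ == "__main__":
    # Hypothetical toy graph: entities 0 and 1 both point to entity 2.
    toy_triples = [(0, 0, 2), (1, 0, 2), (3, 0, 0)]
    E, R = train_translation_embeddings(toy_triples, n_entities=4, n_relations=1)
    print("sim(0, 1) =", similarity(E, 0, 1))
    print("sim(0, 3) =", similarity(E, 0, 3))
```

The distance-to-similarity mapping 1 / (1 + d) is only one possible choice; it is used here because it is symmetric and returns 1 for identical entities.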


Keywords: Gene Ontology, Similarity Measure, Connectivity Pattern, Link Prediction, Stochastic Gradient Descent



This work is supported in part by the European Union under the Horizon 2020 Framework Program for the project BigDataEurope (GA 644564) as well as by the German Ministry of Education and Research with grant no. 13N13627 for the project LiDaKrA. We thank Mikhail Galkin for creating the DBpedia collection used in our experiments, and Ignacio Traverso Ribón for his support on the experimental comparison with GADES.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Camilo Morales (1, 2)
  • Diego Collarana (1, 2), corresponding author
  • Maria-Esther Vidal (2, 3)
  • Sören Auer (1, 2)

  1. Enterprise Information Systems (EIS), University of Bonn, Bonn, Germany
  2. Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Sankt Augustin, Germany
  3. Universidad Simón Bolívar, Caracas, Venezuela
