Knowledge Graph Embeddings over Hundreds of Linked Datasets

  • Michalis Mountantonakis
  • Yannis Tzitzikas
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1057)


There is an increasing trend of using Linked Datasets for creating embeddings from URI sequences, since such embeddings can be exploited for several tasks, e.g., machine learning problems and tasks related to content-based similarity. Existing techniques exploit either a single dataset or a few datasets (or RDF graphs) for creating URI sequences for one or more entities. However, no available approach combines data from multiple datasets to enrich the URI sequences for a given entity. For this reason, we introduce a prototype, called LODVec, that exploits the LODsyndesis knowledge graph, which is the largest knowledge graph including all inferred equivalence relationships. LODVec exploits this graph for creating URI sequences for millions of entities by combining data from 400 datasets, and it offers several configurable options, based on metadata (e.g., provenance), for creating such URI sequences. Moreover, it uses the produced URI sequences as input for creating URI embeddings through the word2vec model. We evaluate the gain of exploiting several datasets (instead of a single one or a few) and the impact of cross-dataset reasoning for machine-learning-based tasks (i.e., classification and regression), and we compare the effectiveness of several configurations and machine learning models.
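
The idea of enriching an entity's URI sequence by merging equivalent URIs across datasets can be illustrated with a minimal sketch. It is not the LODVec implementation: the quadruples, the `d1:`/`d2:`/`d3:` prefixes, the dataset names, the function names, and the subject-predicate-object layout of each sequence are all assumptions made for illustration. Equivalent URIs are grouped with a simple union-find, and an optional provenance filter restricts which datasets contribute.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, predicate, object, dataset) quadruples.
QUADS = [
    ("d1:Aristotle", "d1:birthPlace", "d1:Stagira", "D1"),
    ("d2:Aristotle", "d2:influencedBy", "d2:Plato", "D2"),
    ("d3:Aristotle", "d3:type", "d3:Philosopher", "D3"),
]
# Hypothetical equivalence pairs (in the spirit of owl:sameAs links).
SAME_AS = [("d1:Aristotle", "d2:Aristotle"), ("d2:Aristotle", "d3:Aristotle")]


def equivalence_classes(pairs):
    """Map every URI in `pairs` to one representative of its class (union-find)."""
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smallest URI as the representative.
            parent[max(ra, rb)] = min(ra, rb)
    return {u: find(u) for u in list(parent)}


def uri_sequences(quads, same_as, allowed_datasets=None):
    """Build one URI sequence per entity, merging equivalent URIs.

    `allowed_datasets` is an assumed provenance option: when given, only
    quadruples coming from those datasets contribute to the sequences.
    """
    canon = equivalence_classes(same_as)
    sequences = defaultdict(list)
    for s, p, o, dataset in quads:
        if allowed_datasets is not None and dataset not in allowed_datasets:
            continue  # provenance-based filtering
        subject = canon.get(s, s)
        sequences[subject].extend([subject, p, canon.get(o, o)])
    return dict(sequences)


if __name__ == "__main__":
    for entity, seq in uri_sequences(QUADS, SAME_AS).items():
        print(entity, seq)
```

With the equivalence pairs applied, the three dataset-specific URIs collapse into a single entity whose sequence draws on all three datasets; without them, each dataset yields its own shorter sequence. Sequences of this kind could then be passed, as token lists, to any word2vec implementation to obtain URI embeddings.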


Keywords: URI embeddings, Multiple datasets, Machine learning



The research work was supported by the Hellenic Foundation for Research and Innovation (HFRI) and the General Secretariat for Research and Technology (GSRT), under the HFRI PhD Fellowship grant (GA. No. 166).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Institute of Computer Science, FORTH-ICS, Heraklion, Greece
  2. Computer Science Department, University of Crete, Heraklion, Greece
