Knowledge graph construction from multiple online encyclopedias

Abstract

In recent years, many knowledge graphs built from Wikipedia, the largest multilingual online encyclopedia, have been published on the Web to support various applications. However, because non-English data in Wikipedia are sparse, several projects construct knowledge graphs from multiple non-English online encyclopedias instead. These projects omit many technical details, which makes their frameworks and techniques hard to reuse. In this paper, we propose a new framework for knowledge graph construction from multiple online encyclopedias. Its core modules are knowledge extraction and knowledge linking. Knowledge extraction consists of regular extraction, which periodically extracts targeted article contents from the entire online encyclopedias, and live extraction, which extracts only the article contents of new and updated entities. Knowledge linking uses lightweight heuristic entity matching strategies and a semi-supervised learning method to find duplicated entities and properties across different online encyclopedias. Experimental results show that our approaches to knowledge extraction and linking outperform state-of-the-art baselines on different evaluation metrics, and that our framework can generate a large-scale knowledge graph from multiple input online encyclopedias.

Notes

  1. http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
  2. https://baike.baidu.com/
  3. http://www.baike.com/
  4. http://www.doopedia.co.kr/
  5. https://www.ecured.cu/
  6. https://zh.wikipedia.org/
  7. https://stream.wikimedia.org/v2/stream/recentchange/
  8. http://www.geonames.org/
  9. http://www.omegawiki.org/
  10. https://www.wiktionary.org/
  11. http://dumps.wikimedia.org/
  12. http://www.mediawiki.org/wiki/Extension:OAIRepository
  13. We directly use Google Translate: https://translate.google.com/.
  14. owl:sameAs denotes the equality relation (between individual entities) defined by the W3C Web Ontology Language.
  15. In our previous work [24], Dempster's rule performed best at combining entity matching rules when compared with other combination methods.
  16. http://oaei.ontologymatching.org/
  17. http://zhishi.me/
  18. https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/linking/
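
Note 15 mentions Dempster's rule as the method for combining entity matching rules. The following is a minimal sketch of the standard rule of combination from evidence theory [26], not the authors' implementation. The function name dempster_combine, the two example rules, and all mass values are illustrative assumptions that do not come from the paper.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset of hypotheses to its mass.
    The masses within each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        intersection = b & c
        if intersection:
            # Agreeing evidence: the product supports the common hypotheses.
            combined[intersection] = combined.get(intersection, 0.0) + mass_b * mass_c
        else:
            # Contradictory evidence: accumulate it as conflict.
            conflict += mass_b * mass_c
    if 1.0 - conflict <= 1e-12:
        raise ValueError("total conflict: the two sources cannot be combined")
    # Normalize by the non-conflicting mass so the result sums to 1.
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}


# Frame of discernment for one candidate entity pair (hypothetical labels).
MATCH = frozenset({"match"})
NON_MATCH = frozenset({"non_match"})
UNKNOWN = MATCH | NON_MATCH  # mass that a rule leaves uncommitted

# Two hypothetical matching rules with made-up masses.
rule_a = {MATCH: 0.6, UNKNOWN: 0.4}
rule_b = {MATCH: 0.5, NON_MATCH: 0.2, UNKNOWN: 0.3}

fused = dempster_combine(rule_a, rule_b)
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 4))
```

With these illustrative masses, 0.12 of the joint mass is conflicting and is normalized away, and the combined mass on a match is about 0.77, higher than either rule assigns on its own.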

References

  1. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. Proc. of VLDB 1215, 487–499 (1994)
  2. Bizer, C., Heath, T., Berners-Lee, T.: Linked data - the story so far. Int. J. Semantic Web Inf. Syst. 5(3), 1–22 (2009)
  3. Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hellmann, S.: DBpedia - a crystallization point for the Web of data. J. Web Semantics 7(3), 154–165 (2009)
  4. Brown, L.D., Cai, T.T., DasGupta, A.: Interval estimation for a binomial proportion. Stat. Sci., 101–117 (2001)
  5. Chen, M., Tian, Y., Yang, M., Zaniolo, C.: Multilingual knowledge graph embeddings for cross-lingual knowledge alignment. In: Proc. of IJCAI, pp 1511–1517 (2017)
  6. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Series B (Methodological), 1–38 (1977)
  7. Euzenat, J., Shvaiko, P.: Ontology Matching. Springer (2007)
  8. Fellbaum, C.: WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
  9. Fleiss, J.L.: Measuring nominal scale agreement among many raters. Psychol. Bull. 76(5), 378 (1971)
  10. Hellmann, S., Stadler, C., Lehmann, J., Auer, S.: DBpedia live extraction (2009)
  11. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
  12. Hu, W., Chen, J., Qu, Y.: A self-training approach for resolving object coreference on the semantic Web. In: Proc. of WWW, pp 87–96 (2011)
  13. Hu, W., Jia, C.: A bootstrapping approach to entity linkage on the semantic Web. J. Web Semantics 34, 1–12 (2015)
  14. Jin, H., Li, C., Zhang, J., Hou, L., Li, J., Zhang, P.: XLORE2: Large-scale cross-lingual knowledge graph construction and application. Data Intell. 1(1), 77–98 (2019)
  15. Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., Van Kleef, P., Auer, S., et al.: DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 6(2), 167–195 (2015)
  16. Liang, J., Zhang, S., Xiao, Y.: How to keep a knowledge base synchronized with its encyclopedia source. In: Proc. of IJCAI, pp 3749–3755 (2017)
  17. Mahdisoltani, F., Biega, J., Suchanek, F.M.: YAGO3: A knowledge base from multilingual wikipedias. In: Proc. of CIDR (2013)
  18. Mikolov, T., Zweig, G.: Context dependent recurrent neural network language model. In: Proc. of SLT, pp 234–239 (2013)
  19. Navigli, R., Ponzetto, S.P.: BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artif. Intell. 193, 217–250 (2012)
  20. Nentwig, M., Hartung, M., Ngonga Ngomo, A.C., Rahm, E.: A survey of current link discovery frameworks. Semantic Web 8(3), 419–436 (2017)
  21. Ngomo, A.C.N., Auer, S.: LIMES - a time-efficient approach for large-scale link discovery on the Web of data. In: Proc. of IJCAI, pp 2312–2317 (2011)
  22. Nikolov, A., Uren, V., Motta, E.: KnoFuss: A comprehensive architecture for knowledge fusion. In: Proc. of K-CAP, pp 185–186 (2007)
  23. Niu, X., Sun, X., Wang, H., Rong, S., Qi, G., Yu, Y.: Zhishi.me - weaving Chinese linking open data. In: Proc. of ISWC, Part II, pp 205–220 (2011)
  24. Niu, X., Rong, S., Wang, H., Yu, Y.: An effective rule miner for instance matching in a Web of data. In: Proc. of CIKM, pp 1085–1094 (2012)
  25. Rico, M., Mihindukulasooriya, N., Gómez-Pérez, A.: Data-driven RDF property semantic-equivalence detection using NLP techniques. In: European Knowledge Acquisition Workshop, pp 797–804 (2016)
  26. Shafer, G.: A Mathematical Theory of Evidence. Princeton University Press, Princeton (1976)
  27. Sherif, M.A., Ngomo, A.C.N., Lehmann, J.: WOMBAT - a generalization approach for automatic link discovery. In: European Semantic Web Conference, pp 103–119 (2017)
  28. Sun, Z., Hu, W., Li, C.: Cross-lingual entity alignment via joint attribute-preserving embedding. In: Proc. of ISWC, Part I, pp 628–644 (2017)
  29. Sun, Z., Hu, W., Zhang, Q., Qu, Y.: Bootstrapping entity alignment with knowledge graph embedding. In: Proc. of IJCAI, pp 4396–4402 (2018)
  30. Völker, J., Niepert, M.: Statistical schema induction. In: Proc. of ESWC, pp 124–138 (2011)
  31. Volz, J., Bizer, C., Gaedke, M., Kobilarov, G.: Discovering and maintaining links on the Web of data. In: Proc. of ISWC, pp 650–665 (2009)
  32. Wang, Z., Li, J., Wang, Z., Li, S., Li, M., Zhang, D., Shi, Y., Liu, Y., Zhang, P., Tang, J.: XLore: A large-scale english-chinese bilingual knowledge graph. In: Proc. of ISWC (Posters & Demos), vol. 1035, pp 121–124 (2013)
  33. Werbos, P.J.: Backpropagation through time: What it does and how to do it? Proc. IEEE 78(10), 1550–1560 (1990)
  34. Wu, F., Weld, D.S.: Autonomously semantifying wikipedia. In: Proc. of CIKM, pp 41–50 (2007)
  35. Wu, T., Qi, G., Li, C., Wang, M.: A survey of techniques for constructing Chinese knowledge graphs and their applications. Sustainability 10(9), 3245 (2018)
  36. Wu, T., Qi, G., Luo, B., Zhang, L., Wang, H.: Language-independent type inference of the instances from multilingual wikipedia. Int. J. Semantic Web Inf. Syst. 15(2), 22–46 (2019)
  37. Xu, B., Xu, Y., Liang, J., Xie, C., Liang, B., Cui, W., Xiao, Y.: CN-DBpedia: A never-ending chinese knowledge extraction system. In: Proc. of IEA/AIE, pp 428–438 (2017)
  38. Zhang, Z., Gentile, A.L., Blomqvist, E., Augenstein, I., Ciravegna, F.: Statistical knowledge patterns: Identifying synonymous relations in large linked datasets. In: Proc. of ISWC, Part I, pp 703–719 (2013)
  39. Zhu, H., Xie, R., Liu, Z., Sun, M.: Iterative entity alignment via joint knowledge embeddings. In: Proc. of IJCAI, pp 4258–4264 (2017)

Acknowledgements

This work was supported in part by National Key R&D Program of China (2017YFB1002801, 2018YFC0830200), National Natural Science Foundation of China Key Project (U1736204), and the Judicial Big Data Research Centre, School of Law at Southeast University.

Author information

Corresponding author

Correspondence to Guilin Qi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Application-Driven Knowledge Acquisition

Guest Editors: Xue Li, Sen Wang, and Bohan Li

About this article

Cite this article

Wu, T., Wang, H., Li, C. et al. Knowledge graph construction from multiple online encyclopedias. World Wide Web 23, 2671–2698 (2020). https://doi.org/10.1007/s11280-019-00719-4

Keywords

  • Knowledge graph
  • Knowledge extraction
  • Knowledge linking
  • Semantic Web