Exploiting Source-Object Networks to Resolve Object Conflicts in Linked Data

  • Wenqiang Liu
  • Jun Liu
  • Haimeng Duan
  • Wei Hu
  • Bifan Wei
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10249)

Abstract

Considerable effort has been exerted to increase the scale of Linked Data. However, an inevitable problem arises when integrating data from multiple sources: various sources often provide conflicting objects for a certain predicate of the same real-world entity, thereby causing the so-called object conflict problem. At present, the object conflict problem has not received sufficient attention in the Linked Data community. In this paper, we first formalize object conflict resolution as computing the joint distribution of variables on a heterogeneous information network called the Source-Object Network, which captures three correlations from objects and Linked Data sources. We then introduce a novel approach based on network effects, called ObResolution (object resolution), to identify a true object from multiple conflicting objects. ObResolution adopts a pairwise Markov Random Field (pMRF) to model all evidence under a unified framework. Extensive experimental results on six real-world datasets show that our method achieves higher accuracy than existing approaches and that it is robust and consistent across various domains.

Keywords

Linked Data quality, Object conflicts, Truth discovery

Notes

Acknowledgments

This work is funded by the National Key Research and Development Program of China (Grant No. 2016YFB1000903), the MOE Research Program for Online Education (Grant No. 2016YB166) and the National Science Foundation of China (Grant Nos. 61370019, 61672419, 61672418, 61532004, 61532015).

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Wenqiang Liu (1, email author)
  • Jun Liu (1)
  • Haimeng Duan (1)
  • Wei Hu (2)
  • Bifan Wei (1)

  1. MOEKLINNS Lab, Xi’an Jiaotong University, Xi’an, China
  2. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
