World Wide Web, Volume 22, Issue 4, pp 1855–1885

SmartVote: a full-fledged graph-based model for multi-valued truth discovery

  • Xiu Susie Fang
  • Quan Z. Sheng
  • Xianzhi Wang
  • Dianhui Chu
  • Anne H. H. Ngu
Article

Abstract

In the era of Big Data, truth discovery has emerged as a fundamental research topic: it estimates data veracity by determining the reliability of multiple, often conflicting data sources. Although considerable research effort has been devoted to this topic, most current approaches assume only one true value for each object. In reality, objects with multiple true values are common, and the existing approaches that cope with multi-valued objects still lack accuracy. In this paper, we propose a full-fledged graph-based model, SmartVote, which models two types of source relations with additional quantification to precisely estimate source reliability for effective multi-valued truth discovery. Two graphs are constructed and then used to derive different aspects of source reliability (i.e., positive precision and negative precision) via random walk computations. Our model incorporates four important implications, namely two types of source relations, object popularity, loose mutual exclusion, and the long-tail phenomenon on source coverage, to pursue better accuracy in truth discovery. Empirical studies on two large real-world datasets demonstrate the effectiveness of our approach.
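
To make the idea of deriving source scores "via random walk computations" concrete, the following is a minimal sketch and not the SmartVote algorithm itself. The abstract does not specify how the two graphs are built or how the walks are defined, so this example assumes a generic PageRank-style damped random walk over a small weighted source graph. The function name `random_walk_scores`, the `weights` matrix, and the `damping` value are all hypothetical choices made for illustration.

```python
def random_walk_scores(weights, damping=0.85, tol=1e-10, max_iter=1000):
    """Stationary scores of a damped random walk on a weighted graph.

    weights[i][j] is a non-negative, hypothetical endorsement weight
    from source i to source j (an assumption, not the paper's definition).
    """
    n = len(weights)

    # Row-normalise the weights into transition probabilities.
    # A source with no outgoing weight jumps uniformly to any source.
    transition = []
    for row in weights:
        total = sum(row)
        if total > 0:
            transition.append([w / total for w in row])
        else:
            transition.append([1.0 / n] * n)

    # Power iteration, starting from the uniform distribution.
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new_scores = [
            (1.0 - damping) / n
            + damping * sum(scores[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


# Toy graph: sources 0 and 1 endorse each other strongly, source 2 weakly.
toy_weights = [
    [0, 3, 1],
    [3, 0, 1],
    [1, 1, 0],
]
toy_scores = random_walk_scores(toy_weights)
```

In this toy graph, the two mutually endorsing sources end up with equal scores that exceed the score of the weakly connected third source. A model along the lines described in the abstract would presumably run such a computation on each of its two graphs to obtain separate reliability measures, but that mapping is an assumption here.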

Keywords

Truth discovery; Multi-valued objects; Graph-based model; Source relations; Object popularity; Long-tail phenomenon

Notes

Acknowledgements

Quan Z. Sheng’s research has been partially supported by the Australian Research Council (ARC) Future Fellowship FT140101247 and Discovery Project Grant DP180102378. Dianhui Chu’s research has been partially supported by the National Science Foundation of China (NSFC, No. 61772159). The authors would like to thank the anonymous reviewers for their valuable feedback on this work.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computing, Macquarie University, Sydney, Australia
  2. Living Analytics Research Center, School of Information Systems, Singapore Management University, Singapore, Singapore
  3. School of Computer Science and Technology, Harbin Institute of Technology (Weihai), Weihai, China
  4. Department of Computer Science, Texas State University, San Marcos, USA
