SmartVote: a full-fledged graph-based model for multi-valued truth discovery

Fang, Xiu Susie; Sheng, Quan Z.; Wang, Xianzhi; Chu, Dianhui; Ngu, Anne H. H.

doi:10.1007/s11280-018-0629-3

Published: 22 August 2018

World Wide Web, Volume 22, pages 1855–1885 (2019)

Xiu Susie Fang¹ (ORCID: orcid.org/0000-0003-1527-9270), Quan Z. Sheng¹, Xianzhi Wang², Dianhui Chu³ & Anne H. H. Ngu⁴

Abstract

In the era of Big Data, truth discovery has emerged as a fundamental research topic. It estimates data veracity by determining the reliability of multiple, often conflicting data sources. Although considerable research effort has been devoted to this topic, most current approaches assume only one true value for each object. In reality, objects with multiple true values are common, and the existing approaches that cope with multi-valued objects still lack accuracy. In this paper, we propose a full-fledged graph-based model, SmartVote, which models two types of source relations with additional quantification to precisely estimate source reliability for effective multi-valued truth discovery. Two graphs are constructed and then used to derive different aspects of source reliability (i.e., positive precision and negative precision) via random walk computations. Our model incorporates four important implications, namely two types of source relations, object popularity, loose mutual exclusion, and the long-tail phenomenon on source coverage, to pursue better accuracy in truth discovery. Empirical studies on two large real-world datasets demonstrate the effectiveness of our approach.
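To make the idea of deriving source reliability from a random walk more concrete, the following is a minimal Python sketch. It is not the SmartVote algorithm: the abstract does not give the model's formulas, so this sketch builds only a single agreement graph (the paper constructs two), weights each link by the number of values two sources share, and runs a standard PageRank-style walk over it. The function names `agreement_graph` and `random_walk_scores`, the sample `claims` data, and the damping factor of 0.85 are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def agreement_graph(claims):
    """Build a weighted, undirected source graph.

    claims maps source -> {object -> set of claimed values}.
    Two sources are linked only if they share at least one value;
    the link weight is the number of shared (object, value) pairs.
    This weighting is an illustrative assumption, not the paper's.
    """
    graph = defaultdict(dict)
    for a, b in combinations(sorted(claims), 2):
        shared = sum(
            len(claims[a][obj] & claims[b][obj])
            for obj in claims[a].keys() & claims[b].keys()
        )
        if shared:
            graph[a][b] = shared
            graph[b][a] = shared
    return graph


def random_walk_scores(graph, nodes, damping=0.85, iters=100):
    """PageRank-style random walk; returns a score per source (sums to 1)."""
    n = len(nodes)
    scores = {s: 1.0 / n for s in nodes}
    for _ in range(iters):
        nxt = {s: (1.0 - damping) / n for s in nodes}
        for s in nodes:
            out = graph.get(s, {})
            total = sum(out.values())
            if total == 0:
                # A source with no links spreads its score uniformly.
                for t in nodes:
                    nxt[t] += damping * scores[s] / n
            else:
                for t, w in out.items():
                    nxt[t] += damping * scores[s] * w / total
        scores = nxt
    return scores


# Hypothetical multi-valued claims (e.g., the authors of each book).
claims = {
    "s1": {"book1": {"A", "B"}, "book2": {"C"}},
    "s2": {"book1": {"A", "B"}, "book2": {"C"}},
    "s3": {"book1": {"A"}},
    "s4": {"book1": {"Z"}},
}
graph = agreement_graph(claims)
scores = random_walk_scores(graph, sorted(claims))
for source in sorted(scores, key=scores.get, reverse=True):
    print(source, round(scores[source], 3))
```

In this toy setting, sources that agree with many others accumulate a higher score, while a source that shares no value with any other source receives only the uniform share. A faithful implementation would also need the second graph and the quantified source relations that the abstract mentions.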

Notes

In this paper we focus on the parent-children relation in the dataset because it corresponds to multi-valued objects.
Note that this probability is based on the prior knowledge that s₁ and s₂ each provide a true value, which is different from the probability that two sources s₁ and s₂ independently provide the same true value.
Here we neglect the smoothing links: if there is no common value between two sources, there is no link between them in the graphs.
We neglect the confidence scores of each source and omit the dependence score normalization step in this example.
https://hama.apache.org/
Such values are then normalized to represent probabilities.
For Voting^∗, we predict the number of true values as the number with the highest vote counts.
Note that there are overlaps among those categories. For example, Investment belongs to both Web-link based methods and iterative methods.


Acknowledgements

Quan Z. Sheng’s research has been partially supported by the Australian Research Council (ARC) Future Fellowship FT140101247 and Discovery Project Grant DP180102378. Dianhui Chu’s research has been partially supported by the National Science Foundation of China (NSFC, No. 61772159). The authors would like to thank the anonymous reviewers for their valuable feedback on this work.

Author information

Authors and Affiliations

Department of Computing, Macquarie University, Sydney, NSW, 2109, Australia
Xiu Susie Fang & Quan Z. Sheng
Living Analytics Research Center, School of Information Systems, Singapore Management University, 178902, Singapore, Singapore
Xianzhi Wang
School of Computer Science and Technology, Harbin Institute of Technology (Weihai), Weihai, China
Dianhui Chu
Department of Computer Science, Texas State University, San Marcos, TX, 78666, USA
Anne H. H. Ngu

Corresponding author

Correspondence to Xiu Susie Fang.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Fang, X.S., Sheng, Q.Z., Wang, X. et al. SmartVote: a full-fledged graph-based model for multi-valued truth discovery. World Wide Web 22, 1855–1885 (2019). https://doi.org/10.1007/s11280-018-0629-3

Received: 21 July 2017
Revised: 20 May 2018
Accepted: 30 July 2018
Published: 22 August 2018
Issue Date: 15 July 2019
DOI: https://doi.org/10.1007/s11280-018-0629-3
