An Ensemble Approach for Better Truth Discovery

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10086)

Abstract

Truth discovery is an active research topic in the Big Data era. Its goal is to identify the true values among the conflicting data that multiple sources provide on the same data items. Many methods have been proposed to tackle this problem. However, none of them is a clear winner that consistently outperforms the others, because the methods differ in their characteristics. Moreover, in some cases an improved method may not even beat its original version, owing to the bias introduced by limited ground truths or to the different features of the datasets it is applied to. To achieve better and more robust overall performance, we propose to fully leverage the advantages of existing methods by extracting truth from the prediction results of these existing truth discovery methods. In particular, we first distinguish between the single-truth and multi-truth discovery problems and formally define the ensemble truth discovery problem. We then analyze the feasibility of the ensemble approach and derive two models, a serial model and a parallel model, to implement it and to tackle these two types of truth discovery problems. Extensive experiments over three large real-world datasets and various synthetic datasets demonstrate the effectiveness of our approach.
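As a rough illustration of what extracting truth from the prediction results of existing methods can look like, the following Python sketch applies plain majority voting across the outputs of several base methods in the single-truth case. It is not the serial or parallel model of the paper, whose details are not given in this abstract. The function name `ensemble_single_truth`, the method names, and the sample data are all hypothetical.

```python
from collections import Counter


def ensemble_single_truth(predictions):
    """Fuse single-truth predictions by majority vote (illustrative only).

    predictions maps a method name to a dict of {object: predicted value}.
    Returns a dict of {object: value predicted by the most methods}.
    """
    # Collect every object that at least one base method made a prediction for.
    objects = set()
    for per_object in predictions.values():
        objects.update(per_object)

    fused = {}
    for obj in sorted(objects):
        # Count how many base methods predicted each candidate value.
        votes = Counter(
            per_object[obj]
            for per_object in predictions.values()
            if obj in per_object
        )
        best_count = max(votes.values())
        # Assumption: ties are broken by the smallest string form,
        # only to keep the result deterministic.
        fused[obj] = min(
            (value for value, count in votes.items() if count == best_count),
            key=str,
        )
    return fused


# Hypothetical outputs of three base truth discovery methods.
predictions = {
    "method_a": {"book1.author": "Smith", "book2.author": "Lee"},
    "method_b": {"book1.author": "Smith", "book2.author": "Li"},
    "method_c": {"book1.author": "Smyth", "book2.author": "Li"},
}
fused = ensemble_single_truth(predictions)
```

Unweighted voting treats every base method as equally reliable. An ensemble that aims for robust performance would presumably weight or order the methods, which this sketch deliberately leaves out.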

Notes

  1. If a source claims value(s) for a certain object, it implicitly votes against the other candidate values of this object.

  2. Hereafter we call the revised methods the modified single-truth discovery methods.

  3. Such values are then normalized to represent probabilities.

  4. We chose this order because it is the order of increasing precision of these four methods on the three real-world datasets in [15].

  5. Random ground truth distribution per source means that the number of true positive claims per source is random.

  6. 80-pessimistic ground truth distribution per source means that 80% of the sources provide 20% true positive claims, while 20% of the sources provide 80% true positive claims.
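Notes 3 and 6 can be illustrated with a short Python sketch. It assumes that normalization means dividing non-negative scores by their sum, and it reads the 80-pessimistic distribution as giving 80% of the sources a true-positive rate of 0.2 and the remaining 20% a rate of 0.8. Both readings are assumptions, as are the names `normalize_scores` and `pessimistic_true_positive_rates`.

```python
import random


def normalize_scores(scores):
    """Scale non-negative scores so that they sum to 1 (assumed normalization)."""
    total = sum(scores.values())
    if total == 0:
        # With no evidence at all, fall back to a uniform distribution.
        return {value: 1.0 / len(scores) for value in scores}
    return {value: score / total for value, score in scores.items()}


def pessimistic_true_positive_rates(num_sources, seed=0):
    """Assign a per-source true-positive rate under the assumed 80-pessimistic reading.

    80% of the sources get a rate of 0.2 and the other 20% get a rate of 0.8.
    """
    rng = random.Random(seed)
    num_low = round(0.8 * num_sources)
    rates = [0.2] * num_low + [0.8] * (num_sources - num_low)
    # Shuffle so that the reliable sources are not always the last ones.
    rng.shuffle(rates)
    return rates


probabilities = normalize_scores({"Smith": 3.0, "Smyth": 1.0})
rates = pessimistic_true_positive_rates(10)
```

The random distribution of note 5 could be sketched in the same way by drawing each rate from `rng.random()` instead of using the two fixed values.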

References

  1. Berti-Equille, L.: Data veracity estimation with ensembling truth discovery methods. In: IEEE Big Data Workshop on Data Quality Issues in Big Data (2015)

  2. Dietterich, T.G.: Ensemble methods in machine learning. In: Proceedings of the First International Workshop on Multiple Classifier Systems (MCS 2000), Cagliari, Italy (2000)

  3. Domingos, P., Pazzani, M.: On the optimality of the simple Bayesian classifier under zero-one loss. Mach. Learn. 29(2), 103–130 (1997)

  4. Dong, X.L., et al.: From data fusion to knowledge fusion. In: Proceedings of the 40th International Conference on Very Large Data Bases (VLDB 2014), Hangzhou, China (2014)

  5. Dong, X.L., et al.: Integrating conflicting data: the role of source dependence. VLDB Endowment (PVLDB) 2(1), 550–561 (2009)

  6. Galland, A., Abiteboul, S., Marian, A., Senellart, P.: Corroborating information from disagreeing views. In: Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM 2010), New York, NY, USA (2010)

  7. Goasdoué, F., et al.: Fact checking and analyzing the web. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD 2013), New York, NY, USA (2013)

  8. Li, Q., et al.: Resolving conflicts in heterogeneous data by truth discovery and source reliability estimation. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD 2014), Snowbird, Utah, USA (2014)

  9. Li, Q., et al.: A confidence-aware approach for truth discovery on long-tail data. VLDB Endowment (PVLDB) 8(4), 425–436 (2015)

  10. Li, X., et al.: Truth finding on the deep web: is the problem solved? VLDB Endowment (PVLDB) 6(2), 97–108 (2013)

  11. Li, Y., Gao, J., Meng, C., Li, Q., Su, L., Zhao, B., Fan, W., Han, J.: A survey on truth discovery. ACM SIGKDD Explor. Newsl. (2016)

  12. Pasternack, J., Roth, D.: Knowing what to believe (when you already know something). In: Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), Stroudsburg, PA, USA (2010)

  13. Pasternack, J., Roth, D.: Latent credibility analysis. In: Proceedings of the 22nd International World Wide Web Conference (WWW 2013), Rio de Janeiro, Brazil (2013)

  14. Waguih, D.A., Berti-Equille, L.: Truth discovery algorithms: an experimental evaluation. CoRR abs/1409.6428 (2014)

  15. Wang, X., et al.: An integrated Bayesian approach for effective multi-truth discovery. In: Proceedings of the 24th ACM International Conference on Information and Knowledge Management (CIKM 2015), Melbourne, Australia (2015)

  16. Yin, X., Tan, W.: Semi-supervised truth discovery. In: Proceedings of the 20th International World Wide Web Conference (WWW 2011), Hyderabad, India (2011)

  17. Yin, X., et al.: Truth discovery with multiple conflicting information providers on the web. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2007), San Jose, California, USA (2007)

  18. Yu, D., et al.: The wisdom of minority: unsupervised slot filling validation based on multi-dimensional truth-finding. In: Proceedings of the International Conference on Computational Linguistics (COLING 2014), Dublin, Ireland (2014)

  19. Zhao, B., et al.: A Bayesian approach to discovering truth from conflicting sources for data integration. VLDB Endowment (PVLDB) 5(6), 550–561 (2012)

  20. Zhao, B., Han, J.: A probabilistic model for estimating real-valued truth from conflicting sources. In: Proceedings of the 10th International Workshop on Quality in Databases (QDB 2012), Istanbul, Turkey (2012)

Author information

Corresponding author

Correspondence to Xiu Susie Fang.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Fang, X.S., Sheng, Q.Z., Wang, X. (2016). An Ensemble Approach for Better Truth Discovery. In: Li, J., Li, X., Wang, S., Li, J., Sheng, Q. (eds) Advanced Data Mining and Applications. ADMA 2016. Lecture Notes in Computer Science, vol 10086. Springer, Cham. https://doi.org/10.1007/978-3-319-49586-6_20

  • DOI: https://doi.org/10.1007/978-3-319-49586-6_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49585-9

  • Online ISBN: 978-3-319-49586-6

  • eBook Packages: Computer Science, Computer Science (R0)
