IncompFuse: a logical framework for historical information fusion with inaccurate data sources

  • Jiawei Xu
  • Vladimir Zadorozhny
  • John Grant


We propose a novel framework, called IncompFuse, that significantly improves the accuracy of existing methods for reconstructing aggregated historical data from inaccurate historical reports. IncompFuse supports efficient data reliability assessment using the incompatibility probability of historical reports. We provide a systematic approach to define this probability based on properties of the data and relationships between the reports. Our experimental study demonstrates the high utility of the proposed framework. In particular, we were able to detect noisy historical reports with very high accuracy.
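The abstract does not spell out how the incompatibility probability is defined, so the following is only an illustrative sketch of the general idea of scoring reports by how often they conflict with related reports. It is not the IncompFuse definition. It assumes that each report is a nonnegative count over a half-open time interval [start, end), and that the only relationship considered is interval containment: a report covering a sub-interval cannot exceed a report covering an enclosing interval. The names `Report`, `incompatible` and `incompatibility_rates`, and the sample data, are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Report:
    # Hypothetical report format (assumption): a nonnegative aggregate,
    # e.g. an event count, claimed for the half-open interval [start, end).
    name: str
    start: int
    end: int
    value: float


def contains(outer: Report, inner: Report) -> bool:
    """True if inner's interval lies entirely within outer's interval."""
    return outer.start <= inner.start and inner.end <= outer.end


def incompatible(a: Report, b: Report) -> Optional[bool]:
    """Pairwise check under the nonnegative-count assumption.

    Identical intervals must report equal values; otherwise an enclosed
    report must not exceed the enclosing one. Returns None when neither
    interval contains the other, because this simple rule says nothing
    about such pairs.
    """
    if a.start == b.start and a.end == b.end:
        return a.value != b.value
    if contains(a, b):
        return b.value > a.value
    if contains(b, a):
        return a.value > b.value
    return None


def incompatibility_rates(reports: List[Report]) -> Dict[str, float]:
    """For each report, the fraction of its comparable pairs that conflict."""
    comparable = {r.name: 0 for r in reports}
    conflicts = {r.name: 0 for r in reports}
    for a, b in combinations(reports, 2):
        verdict = incompatible(a, b)
        if verdict is None:
            continue  # pair is not comparable under the containment rule
        for r in (a, b):
            comparable[r.name] += 1
            conflicts[r.name] += int(verdict)
    return {
        name: conflicts[name] / comparable[name] if comparable[name] else 0.0
        for name in comparable
    }


# Made-up example: a yearly total, two half-year totals, and a quarterly
# figure (D) that exceeds both totals whose intervals enclose it.
reports = [
    Report("A", 0, 12, 100),
    Report("B", 0, 6, 40),
    Report("C", 6, 12, 55),
    Report("D", 0, 3, 150),
]
rates = incompatibility_rates(reports)
print(rates)  # D has the highest rate, so it is the most suspicious report
```

A and B also receive nonzero rates, because a single pairwise conflict cannot show which of its two reports is wrong. Only by aggregating over many related reports does the noisy one stand out. A probabilistic treatment, as the abstract describes, would replace these raw counts with a model of how likely each conflict is.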


Keywords: Inaccurate data sources · Incompatibility probability · Error detection



We wish to thank the reviewers for helpful comments and suggestions.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Informatics and Networked Systems, School of Computing and Information, University of Pittsburgh, Pittsburgh, USA
  2. Department of Computer Science and UMIACS, University of Maryland, College Park, USA
