Conflict-Aware Historical Data Fusion

  • Conference paper
Scalable Uncertainty Management (SUM 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6929)

Abstract

Historical data sources report numerous events over overlapping time intervals, locations, and names. As a result, an integrated historical database may contain severe data conflicts caused by redundancy, and these conflicts prevent researchers from obtaining correct answers to their queries. In this paper, we propose a novel conflict-aware data fusion strategy for historical data sources. We evaluate our approach on a large-scale data warehouse that integrates approximately 50,000 reports of US epidemiological data spanning more than 100 years. We demonstrate that our approach significantly reduces data aggregation error in the integrated historical database.
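To make the problem concrete, the following is a minimal sketch of how redundant reports over overlapping intervals can inflate an aggregate query. It is not the paper's fusion strategy, which the abstract does not specify; the `Report` type, the source names, and the case counts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Report:
    """Hypothetical record: one source's case count for a span of weeks."""
    source: str
    start_week: int  # inclusive
    end_week: int    # inclusive
    cases: int


def overlaps(a: Report, b: Report) -> bool:
    """True when the two closed week intervals share at least one week."""
    return a.start_week <= b.end_week and b.start_week <= a.end_week


def conflicting_pairs(reports):
    """Every pair of reports whose intervals overlap (potential redundancy)."""
    return [(a, b) for a, b in combinations(reports, 2) if overlaps(a, b)]


def naive_total(reports):
    """Aggregate that ignores redundancy: simply sums all reported counts."""
    return sum(r.cases for r in reports)


# Invented data: a biweekly report that restates two weekly reports.
reports = [
    Report("weekly-A", 1, 1, 10),
    Report("weekly-A", 2, 2, 12),
    Report("biweekly-B", 1, 2, 22),
    Report("weekly-A", 3, 3, 8),
]

print(naive_total(reports))             # counts weeks 1-2 twice
print(len(conflicting_pairs(reports)))  # overlapping pairs to be resolved
```

In this invented data set the weekly reports alone sum to 30 cases, while the naive aggregate returns 52 because the biweekly report covers the same weeks. A conflict-aware strategy would have to detect such overlaps and decide how to reconcile them before aggregating.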

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zadorozhny, V., Hsu, YF. (2011). Conflict-Aware Historical Data Fusion. In: Benferhat, S., Grant, J. (eds) Scalable Uncertainty Management. SUM 2011. Lecture Notes in Computer Science, vol 6929. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23963-2_26

  • DOI: https://doi.org/10.1007/978-3-642-23963-2_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23962-5

  • Online ISBN: 978-3-642-23963-2

  • eBook Packages: Computer Science, Computer Science (R0)
