Reconciling Inconsistent Data in Probabilistic XML Data Integration

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5071)

Abstract

Dealing with inconsistent data while integrating XML data from different sources is an important task, necessary to improve the quality of data integration. Typically, data cleaning (or repairing) procedures are applied to remove inconsistencies, i.e., conflicts between data. In this paper, we present a probabilistic XML data integration setting. A probability is assigned to each data source and models the reliability level of that source. In this way, each answer (a tuple of values of XML trees) has a probability assigned to it. The problem is how to compute this probability, especially when the same answer is produced by many sources. We consider three semantics for computing such probabilistic answers: by-peer, by-sequence, and by-subtree semantics. The probabilistic answers can be used to resolve a class of inconsistencies that violate XML functional dependencies defined over the target schema. Given a probability distribution over a set of conflicting answers, we can choose the answer with the highest probability of being correct.
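
The selection step described in the abstract can be illustrated with a small sketch. It is not the paper's by-peer, by-sequence, or by-subtree semantics, which the abstract does not define. As a stand-in, it assumes that sources err independently, so the support for an answer is the probability that at least one of its supporting sources is reliable, and it then normalizes the support over the conflicting answers. The function names `answer_distribution` and `most_probable` and the sample reliabilities are hypothetical.

```python
from collections import defaultdict
from math import prod


def answer_distribution(reports, reliability):
    """Turn source reports into a probability distribution over answers.

    reports:     list of (source, answer) pairs for one functional-dependency key
    reliability: dict mapping each source to its reliability in [0, 1]

    ASSUMPTION (not from the paper): sources are independent, so an answer's
    support is 1 - product of (1 - reliability) over the sources that report it.
    """
    sources_by_answer = defaultdict(set)
    for source, answer in reports:
        sources_by_answer[answer].add(source)

    support = {
        answer: 1.0 - prod(1.0 - reliability[s] for s in sources)
        for answer, sources in sources_by_answer.items()
    }

    # Normalize so the conflicting answers form a probability distribution.
    total = sum(support.values())
    if total == 0:
        return {answer: 0.0 for answer in support}
    return {answer: value / total for answer, value in support.items()}


def most_probable(distribution):
    """Pick the answer whose probability of being correct is the highest."""
    return max(distribution, key=distribution.get)


# Hypothetical example: three sources disagree on the value for one key.
reliability = {"s1": 0.6, "s2": 0.5, "s3": 0.5}
reports = [("s1", "A"), ("s2", "B"), ("s3", "B")]
distribution = answer_distribution(reports, reliability)
chosen = most_probable(distribution)
```

Under this assumption, two moderately reliable sources that agree can outweigh a single more reliable source. That effect is one reason the choice of semantics for combining sources matters.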

Editor information

Alex Gray, Keith Jeffery, Jianhua Shao

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pankowski, T. (2008). Reconciling Inconsistent Data in Probabilistic XML Data Integration. In: Gray, A., Jeffery, K., Shao, J. (eds) Sharing Data, Information and Knowledge. BNCOD 2008. Lecture Notes in Computer Science, vol 5071. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70504-8_8

  • DOI: https://doi.org/10.1007/978-3-540-70504-8_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70503-1

  • Online ISBN: 978-3-540-70504-8

  • eBook Packages: Computer Science, Computer Science (R0)
