Integration of Web Sources Under Uncertainty and Dependencies Using Probabilistic XML

  • M. Lamine Ba
  • Sebastien Montenez
  • Ruiming Tang
  • Talel Abdessalem
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8505)


In this vision paper, we study the problem of integrating several web data sources under uncertainty and dependencies. We present a concrete application involving web sources that describe objects in the maritime domain, where uncertainties and dependencies are omnipresent. Uncertainties are mainly caused by imprecise information trackers and imperfect human knowledge. Dependencies arise from the recurrent copying relationships among the sources. We address data integration in this setting by reformulating it as the merge of several uncertain versions of the same global XML document. As an initial result, we put forward a probabilistic XML data integration model that draws on the versioning model for uncertain data we proposed in [5]. We explain how this model can be used to materialize the integration outcome.
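To make the reformulation more concrete, the following is a minimal illustrative sketch, not the model of the paper. It assumes that each source contributes an unordered tree given as a set of root-to-node paths, that each merged node is annotated with the set of sources asserting it, and that each source is correct independently with a known probability. Copying dependencies among sources are deliberately left out. The names `merge_sources`, `node_probability`, and `reliability`, as well as the sample data, are hypothetical.

```python
from itertools import product

def merge_sources(sources):
    """Map each root-to-node path to the set of sources asserting it."""
    merged = {}
    for name, paths in sources.items():
        for path in paths:
            # A node implies all of its ancestors, so annotate every prefix.
            for i in range(1, len(path) + 1):
                merged.setdefault(path[:i], set()).add(name)
    return merged

def node_probability(merged, path, reliability):
    """Probability that `path` exists, i.e. that at least one asserting
    source is correct (assumption: sources are independent events)."""
    names = sorted(reliability)
    total = 0.0
    # Enumerate every possible world over the source events.
    for truth in product([True, False], repeat=len(names)):
        world = dict(zip(names, truth))
        weight = 1.0
        for n in names:
            weight *= reliability[n] if world[n] else 1.0 - reliability[n]
        if any(world[s] for s in merged[path]):
            total += weight
    return total

# Hypothetical maritime data: two sources describing the same ship.
sources = {
    "s1": {("ship", "flag:Panama")},
    "s2": {("ship", "flag:Panama"), ("ship", "port:Marseille")},
}
reliability = {"s1": 0.5, "s2": 0.5}  # assumed, not taken from the paper

merged = merge_sources(sources)
print(node_probability(merged, ("ship", "flag:Panama"), reliability))
print(node_probability(merged, ("ship", "port:Marseille"), reliability))
```

Enumerating possible worlds is exponential in the number of sources; it is used here only because it keeps the semantics explicit. Modeling a copying relationship would require annotations that share events across sources, which this sketch does not attempt.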


Data Integration; Shared Object; Unordered Tree; Edit Script; Maritime Domain
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



We are grateful to Pierre Senellart and Stephane Bressan for their valuable remarks and suggestions. This work was partially funded by the NORMATIS project and by the French government under the STIC-Asia program, CCIPX project.


  1. Abiteboul, S., Kimelfeld, B., Sagiv, Y., Senellart, P.: On the expressiveness of probabilistic XML models. VLDB J. 18, 1041–1064 (2009)
  2. Agrawal, P., Sarma, A.D., Ullman, J., Widom, J.: Foundations of uncertain-data integration. Proc. VLDB Endow. 3, 1080–1090 (2010)
  3. Ayat, N., Afsarmanesh, H., Akbarinia, R., Valduriez, P.: An uncertain data integration system. In: Meersman, R., et al. (eds.) OTM 2012, Part II. LNCS, vol. 7566, pp. 825–842. Springer, Heidelberg (2012)
  4. Ba, M.L., Abdessalem, T., Senellart, P.: Merging uncertain multi-version XML documents. In: Proceedings of DChanges, Florence, Italy (2013)
  5. Ba, M.L., Abdessalem, T., Senellart, P.: Uncertain version control in open collaborative editing of tree-structured documents. In: Proceedings of Document Engineering (2013)
  6. Cobena, G., Abdessalem, T., Hinnach, Y.: A comparative study for XML change detection. In: BDA (2002)
  7. Das Sarma, A., Dong, X., Halevy, A.: Bootstrapping pay-as-you-go data integration systems. In: Proceedings of SIGMOD (2008)
  8. Dong, X., Halevy, A.Y., Yu, C.: Data integration with uncertainty. In: Proceedings of VLDB (2007)
  9. Dong, X.L., Berti-Equille, L., Hu, Y., Srivastava, D.: Global detection of complex copying relationships between sources. Proc. VLDB Endow. 3, 1358–1369 (2010)
  10. Dong, X.L., Berti-Equille, L., Srivastava, D.: Integrating conflicting data: the role of source dependence. Proc. VLDB Endow. 2, 550–561 (2009)
  11. Kharlamov, E., Nutt, W., Senellart, P.: Updating probabilistic XML. In: Proceedings of EDBT/ICDT Workshops (2010)
  12. Kimelfeld, B., Senellart, P.: Probabilistic XML: models and complexity. In: Ma, Z., Yan, L. (eds.) Advances in Probabilistic Databases for Uncertain Information Management. Springer, Heidelberg (2013)
  13. Li, X., Dong, X.L., Lyons, K., Meng, W., Srivastava, D.: Truth finding on the deep Web: is the problem solved? In: Proceedings of VLDB (Sept 2013)
  14. Lindholm, T., Kangasharju, J., Tarkoma, S.: Fast and simple XML tree differencing by sequence alignment. In: Proceedings on Document Engineering (2006)
  15. Peters, L.: Change detection in XML trees: a survey. In: TSIT Conference (2005)
  16. van Keulen, M., de Keijzer, A.: Qualitative effects of knowledge rules and user feedback in probabilistic data integration. VLDB J. 18, 1191–1217 (2009)
  17. van Keulen, M., de Keijzer, A., Alink, W.: A probabilistic XML approach to data integration. In: Proceedings of ICDE (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • M. Lamine Ba (1)
  • Sebastien Montenez (1)
  • Ruiming Tang (2)
  • Talel Abdessalem (1)
  1. Institut Mines-Télécom, Télécom-ParisTech, LTCI, Paris, France
  2. National University of Singapore, Singapore, Singapore
