An Uncertain Data Integration System

  • Naser Ayat
  • Hamideh Afsarmanesh
  • Reza Akbarinia
  • Patrick Valduriez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7566)


Data integration systems offer uniform access to a set of autonomous and heterogeneous data sources. An important task in setting up a data integration system is to match the attributes of the source schemas. In this paper, we propose a data integration system that uses the knowledge implied by functional dependencies to match the source schemas. We build our system on a probabilistic data model to capture the uncertainty arising during the matching process. Our performance validation confirms the importance of both functional dependencies and a probabilistic data model in improving the quality of schema matching. Our experimental results show a significant performance gain over the baseline approaches, and they also show that our system scales well.
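For readers unfamiliar with the term, a functional dependency X → Y holds in a relation when any two tuples that agree on the attributes X also agree on the attributes Y. The sketch below only illustrates that definition on a small in-memory table. It is not the matching algorithm of the paper, and the function name `fd_holds` and the sample attributes (`zip`, `city`, `name`) are hypothetical choices made for this illustration.

```python
from typing import Dict, Hashable, Iterable, Sequence, Tuple


def fd_holds(
    rows: Iterable[Dict[str, Hashable]],
    lhs: Sequence[str],
    rhs: Sequence[str],
) -> bool:
    """Return True if the functional dependency lhs -> rhs holds in rows.

    Illustrative helper (an assumption of this sketch, not from the paper):
    it remembers the first rhs value seen for each lhs value and reports a
    violation as soon as the same lhs value appears with a different rhs value.
    """
    seen: Dict[Tuple[Hashable, ...], Tuple[Hashable, ...]] = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = tuple(row[attr] for attr in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data, used only to exercise the definition.
sample_rows = [
    {"zip": "1012", "city": "Amsterdam", "name": "A"},
    {"zip": "1012", "city": "Amsterdam", "name": "B"},
    {"zip": "34000", "city": "Montpellier", "name": "C"},
]

zip_determines_city = fd_holds(sample_rows, ["zip"], ["city"])
city_determines_name = fd_holds(sample_rows, ["city"], ["name"])
```

In this sample, `zip` determines `city`, while `city` does not determine `name`, because two rows share the city "Amsterdam" but carry different names.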


data integration, schema matching, uncertain data integration, functional dependency





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Naser Ayat (1)
  • Hamideh Afsarmanesh (1)
  • Reza Akbarinia (2)
  • Patrick Valduriez (2)
  1. Informatics Institute, University of Amsterdam, Amsterdam, Netherlands
  2. INRIA and LIRMM, Montpellier, France
