Declarative Data Fusion – Syntax, Semantics, and Implementation

  • Jens Bleiholder
  • Felix Naumann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3631)

Abstract

In today’s integrating information systems, data fusion, i.e., the merging of multiple tuples about the same real-world object into a single tuple, is left to ETL tools and other specialized software. While much attention has been paid to architecture, query languages, and query execution, the final step of actually fusing data from multiple sources into a consistent and homogeneous set is often ignored.

This paper states the formal problem of data fusion in relational databases and discusses which parts of the problem can already be solved with standard SQL. To bridge the final gap, we propose the SQL Fuse By statement and define its syntax and semantics. A first implementation of the statement in a prototypical database system shows the usefulness and feasibility of the new operator.
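
As a rough illustration of the part of the problem that standard SQL can already express, the sketch below fuses tuples from two sources using an outer union (a UNION ALL padded with NULLs) followed by grouping and an aggregation function. It is not the Fuse By statement proposed in the paper. The tables `src_a` and `src_b`, their columns, the sample rows, and the choice of `MAX` as the conflict-resolving function are all assumptions made for this example.

```python
import sqlite3

# Hypothetical sources describing the same real-world persons.
# The schemas overlap only partially (city vs. phone).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_a (name TEXT, age INTEGER, city TEXT);
    CREATE TABLE src_b (name TEXT, age INTEGER, phone TEXT);
    INSERT INTO src_a VALUES ('Alice', 30, 'Berlin'), ('Bob', NULL, 'Hamburg');
    INSERT INTO src_b VALUES ('Alice', 31, '555-0100'), ('Carol', 25, NULL);
""")

# Outer union: pad the attribute each source lacks with NULL, then
# group by the (assumed) object identifier and collapse each group
# into a single tuple. MAX ignores NULLs, so it fills in missing
# values and picks the larger value when the sources contradict.
FUSION_QUERY = """
    SELECT name,
           MAX(age)   AS age,
           MAX(city)  AS city,
           MAX(phone) AS phone
    FROM (
        SELECT name, age, city, NULL AS phone FROM src_a
        UNION ALL
        SELECT name, age, NULL AS city, phone FROM src_b
    )
    GROUP BY name
    ORDER BY name
"""

fused = conn.execute(FUSION_QUERY).fetchall()
for row in fused:
    print(row)
```

In this sketch the conflicting ages for Alice are resolved simply by taking the maximum. Built-in aggregates offer only a small set of such strategies, which hints at why richer resolution functions might call for a dedicated statement.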

Keywords

Data Fusion, Resolution Function, Aggregation Function, Outer Union, Single Tuple
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Jens Bleiholder (1)
  • Felix Naumann (1)
  1. Humboldt-Universität zu Berlin, Berlin, Germany
