Logic-Based Techniques in Data Integration

  • Alon Y. Levy
Part of The Springer International Series in Engineering and Computer Science book series (SECS, volume 597)

Abstract

The data integration problem is to provide uniform access to multiple heterogeneous information sources available online (e.g., databases on the WWW). This problem has recently received considerable attention from researchers in the fields of Artificial Intelligence and Database Systems. The data integration problem is complicated by the facts that (1) sources contain closely related and overlapping data, (2) data is stored in multiple data models and schemas, and (3) data sources have differing query processing capabilities.

A key element in a data integration system is the language used to describe the contents and capabilities of the data sources. While such a language needs to be as expressive as possible, it should also make it possible to address efficiently the main inference problem that arises in this context: translating a user query formulated over a mediated schema into a query on the local schemas. This paper describes several languages for describing the contents of data sources, the tradeoffs between them, and the associated reformulation algorithms.
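
As a rough illustration of the reformulation problem described above, the following minimal sketch assumes the simplest kind of source description, in which each relation of the mediated schema is listed as the union of the sources that supply it. The relation `movie`, the sources `src_a` and `src_b`, their contents, and the helper names are all hypothetical and not taken from the paper; the description languages the paper surveys are considerably more expressive than this.

```python
# Hypothetical local sources, each storing (title, year) tuples.
SOURCES = {
    "src_a": [("Casablanca", 1942), ("Vertigo", 1958)],
    "src_b": [("Vertigo", 1958), ("Alien", 1979)],
}

# Hypothetical source descriptions: each mediated relation is assumed
# to be the union of the listed sources.
DESCRIPTIONS = {"movie": ["src_a", "src_b"]}


def reformulate(relation):
    """Translate a query on a mediated relation into a plan over sources.

    Under the union assumption, the plan is just the list of sources to scan.
    """
    return list(DESCRIPTIONS[relation])


def execute(plan, predicate):
    """Run the plan, applying the selection and removing overlapping tuples."""
    seen = set()
    answers = []
    for source in plan:
        for row in SOURCES[source]:
            if predicate(row) and row not in seen:
                seen.add(row)
                answers.append(row)
    return answers


# User query over the mediated schema: movies released after 1950.
plan = reformulate("movie")
result = execute(plan, lambda row: row[1] > 1950)
print(plan)
print(result)
```

The user only mentions the mediated relation `movie`; the sketch decides which sources to contact and merges their overlapping answers, which is the kind of work a reformulation algorithm has to do in a more general setting.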

Keywords

Data integration, description logics, views



Copyright information

© Springer Science+Business Media New York 2000

Authors and Affiliations

  • Alon Y. Levy (1)
  1. Department of Computer Science and Engineering, University of Washington, Seattle, USA
