Approximate Query Answering Using Data Warehouse Striping

  • Jorge Bernardino
  • Pedro Furtado
  • Henrique Madeira
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2114)

Abstract

This paper presents an approach to implementing large data warehouses on an arbitrary number of computers, achieving very high query execution performance and scalability. Using our technique, called data warehouse striping (DWS), the data is distributed across and processed by a potentially large number of autonomous computers. The major problem of the DWS technique is that it would require a very expensive fault-tolerant cluster of computers to prevent a fault in a single computer from stopping the whole system. In this paper, we propose a radically different approach to the unavailability of one or more computers in the cluster, which allows DWS to be used with a very large number of inexpensive computers. The proposed approach is based on approximate query answering techniques that make it possible to deliver an approximate answer to the user even when one or more computers in the cluster are unavailable. The evaluation presented in the paper shows, both analytically and experimentally, that the approximate results obtained this way have a very small error, negligible in most cases.
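The abstract describes striping and approximate answering only at a high level. The Python sketch below illustrates one plausible reading under stated assumptions, not the paper's actual algorithm: fact rows are assigned to the N computers round-robin, every computer runs the same SUM/COUNT grouped query on its own stripe, and the partial results are merged. When only n of the N computers respond, SUM and COUNT are extrapolated by N/n, and an error half-width is computed with the textbook estimator of a population total from a simple random sample of n out of N units (with finite population correction). Treating the surviving stripes as a random sample is itself an assumption. The function names `stripe`, `partial_aggregate`, and `merge`, the `(group key, value)` row shape, and the `z` parameter are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical fact row: (group key, measure value).
Row = Tuple[str, float]
# Per-node partial result: group key -> (SUM, COUNT).
Partial = Dict[str, Tuple[float, int]]


def stripe(rows: Sequence[Row], n_nodes: int) -> List[List[Row]]:
    """Round-robin striping (assumed): row i goes to node i mod N,
    so each node holds roughly 1/N of the fact rows."""
    nodes: List[List[Row]] = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        nodes[i % n_nodes].append(row)
    return nodes


def partial_aggregate(node_rows: Sequence[Row]) -> Partial:
    """The query each node evaluates locally: SUM and COUNT grouped by key."""
    out: Partial = {}
    for key, value in node_rows:
        s, c = out.get(key, (0.0, 0))
        out[key] = (s + value, c + 1)
    return out


def merge(partials: Sequence[Partial], n_nodes: int, z: float = 1.96) -> Dict[str, dict]:
    """Combine the partial results returned by the available nodes.

    With all N nodes present the answer is exact and the error is zero.
    With n < N nodes, SUM and COUNT are scaled by N/n, and sum_error is a
    normal-approximation half-width that treats the n node totals as a
    simple random sample of the N node totals (hypothetical error model).
    """
    n = len(partials)
    if n == 0:
        raise ValueError("no nodes available")
    keys = set().union(*(p.keys() for p in partials))
    result: Dict[str, dict] = {}
    for key in keys:
        # A node with no rows for this group contributes a zero total.
        sums = [p.get(key, (0.0, 0))[0] for p in partials]
        counts = [p.get(key, (0.0, 0))[1] for p in partials]
        est_sum = n_nodes * sum(sums) / n
        est_count = n_nodes * sum(counts) / n
        if n > 1:
            mean = sum(sums) / n
            var = sum((x - mean) ** 2 for x in sums) / (n - 1)
            # Finite population correction (1 - n/N) makes the error 0 when n == N.
            half_width = z * n_nodes * math.sqrt((1 - n / n_nodes) * var / n)
        else:
            half_width = float("nan")  # variance cannot be estimated from one node
        result[key] = {
            "sum": est_sum,
            "count": est_count,
            "avg": sum(sums) / sum(counts),  # the N/n factor cancels in AVG
            "sum_error": half_width,
        }
    return result
```

For instance, striping eight rows over four nodes and merging all four partial results reproduces the exact SUM, COUNT, and AVG with a zero error half-width, while merging only three partial results returns SUM and COUNT scaled by 4/3 together with a positive half-width. AVG needs no scaling, because the same factor appears in its numerator and denominator.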

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Jorge Bernardino (1)
  • Pedro Furtado (2)
  • Henrique Madeira (2)

  1. Institute Polytechnic of Coimbra, Coimbra, Portugal
  2. University of Coimbra, Coimbra, Portugal

Personalised recommendations