“On-the-fly” VS Materialized Sampling and Heuristics

Furtado, Pedro

doi:10.1007/978-3-540-45228-7_41

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2737)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

Abstract

Aggregation queries can take hours to return answers in large data warehouses (DW). A user who explores data in several iterative steps with decision support or data mining tools may be frustrated by such long response times, so the ability to return fast approximate answers that are also accurate is important to these applications. Samples for query answering can be obtained "on-the-fly" (OS) or from a materialized summary of samples (MS). While MS is typically faster than OS, it has the limitation that sampling rates are fixed when the summary is constructed. This paper analyzes the use of OS versus MS for approximate answering of aggregation queries and proposes a sampling heuristic that chooses the sampling rate so that answers are returned as fast as possible while still meeting accuracy targets. The experimental section compares OS to MS in terms of response time and accuracy on the TPC-H benchmark and shows the heuristic strategy in action.
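
The idea of picking the lowest sampling rate that still meets an accuracy target can be illustrated with a minimal sketch. This is not the heuristic from the paper, whose details are not given in the abstract; it assumes a simple Bernoulli sample, an AVG query, and a normal-approximation confidence interval. The function names (`bernoulli_sample`, `estimate_avg`, `choose_rate`) and the list of candidate rates are hypothetical.

```python
import math
import random
import statistics


def bernoulli_sample(values, rate, rng):
    """Keep each value independently with probability `rate`."""
    return [v for v in values if rng.random() < rate]


def estimate_avg(sample, z=1.96):
    """Return (estimate, relative error bound) for AVG over a sample.

    The bound is the half-width of a normal-approximation confidence
    interval divided by the estimate (an assumption of this sketch).
    """
    n = len(sample)
    if n < 2:
        return (sample[0] if n == 1 else math.nan), math.inf
    mean = statistics.fmean(sample)
    if mean == 0:
        return mean, math.inf
    half_width = z * statistics.stdev(sample) / math.sqrt(n)
    return mean, half_width / abs(mean)


def choose_rate(values, rates, target_rel_error, rng):
    """Try candidate rates from lowest to highest and stop at the first
    one whose estimated relative error meets the target. Falls back to
    a full scan (rate 1.0) if no candidate is accurate enough."""
    for rate in sorted(rates):
        sample = bernoulli_sample(values, rate, rng)
        estimate, rel_error = estimate_avg(sample)
        if rel_error <= target_rel_error:
            return rate, estimate, rel_error
    return 1.0, statistics.fmean(values), 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic column standing in for a fact-table measure.
    values = [rng.gauss(100, 20) for _ in range(100_000)]
    rate, estimate, rel_error = choose_rate(values, [0.001, 0.01, 0.1], 0.01, rng)
    print(rate, round(estimate, 2), round(rel_error, 4))
```

In this sketch a fixed list of candidate rates plays the role of a materialized summary with predefined rates, whereas an on-the-fly sampler could in principle draw at any rate. The loop re-samples at each step for simplicity; a real system would more likely reuse or extend the sample already drawn.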

Author information

Authors and Affiliations

Centro de Informática e Sistemas (DEI-CISUC), Universidade de Coimbra
Pedro Furtado

Editor information

Editors and Affiliations

Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo, 606-8501, Kyoto, Japan
Yahiko Kambayashi
I.B.M. India Research Lab, India
Mukesh Mohania
Institute for Application Oriented Knowledge Processing (FAW), Johannes Kepler University Linz, Austria
Wolfram Wöß

About this paper

Cite this paper

Furtado, P. (2003). “On-the-fly” VS Materialized Sampling and Heuristics. In: Kambayashi, Y., Mohania, M., Wöß, W. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2003. Lecture Notes in Computer Science, vol 2737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45228-7_41

DOI: https://doi.org/10.1007/978-3-540-45228-7_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40807-9
Online ISBN: 978-3-540-45228-7
eBook Packages: Springer Book Archive
