Hierarchical Group-Based Sampling

Gemulla, Rainer; Berthold, Henrike; Lehner, Wolfgang

doi:10.1007/11511854_10

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3567)

Included in the following conference series:

British National Conference on Databases

Abstract

Approximate query processing is an adequate technique to reduce response times and system load in cases where approximate results suffice. In the database literature, sampling has been proposed to evaluate queries approximately by using only a subset of the original data. Unfortunately, most of these methods consider either only certain problems arising from the use of samples in databases (e.g., data skew) or only join operations involving multiple relations. We describe how well-known sampling techniques dealing with group-by operations can be combined with foreign-key joins such that the join is computed after the generation of the sample. In detail, we show how senate sampling and small group sampling can be combined efficiently with the idea of join synopses. Additionally, we introduce different algorithms that maintain the sample when the underlying data changes. Finally, we demonstrate the superiority of our method over the naive approach in an extensive set of experiments.
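To make the terminology concrete, the following is a minimal sketch of plain senate sampling (an equal share of the sample budget for every group) followed by a foreign-key lookup performed only on the sampled rows. It is not the algorithm of this paper, which is not reproduced on this page; in particular, it groups on an attribute of the sampled table itself rather than on attributes reached through the join. The function names `senate_sample` and `join_after_sampling`, the `scale` field, and the `orders`/`customers` tables are hypothetical and chosen purely for illustration.

```python
import random
from collections import defaultdict


def senate_sample(rows, group_key, budget, rng):
    """Draw an equal-sized uniform sample from every group (senate allocation)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    if not groups:
        return []

    per_group = max(1, budget // len(groups))  # equal share of the budget per group
    sample = []
    for members in groups.values():
        k = min(per_group, len(members))  # groups smaller than the share are kept in full
        for row in rng.sample(members, k):
            # scale = number of base rows that each sampled row stands for
            sample.append({**row, "scale": len(members) / k})
    return sample


def join_after_sampling(sample, dimension, fk, pk):
    """Resolve the foreign key for the sampled rows only (join after sampling)."""
    lookup = {d[pk]: d for d in dimension}
    return [{**s, **lookup[s[fk]]} for s in sample]


# Illustrative data (assumed): a skewed fact table and a small dimension table.
orders = [
    {"order_id": i, "cust_id": i % 3,
     "region": "north" if i < 90 else "south", "amount": 10.0}
    for i in range(100)
]
customers = [{"cust_id": c, "segment": f"segment-{c}"} for c in range(3)]

sample = senate_sample(orders, "region", budget=20, rng=random.Random(0))
joined = join_after_sampling(sample, customers, fk="cust_id", pk="cust_id")

# Scaled estimate of SUM(amount) per region, computed from the sample alone.
estimates = defaultdict(float)
for row in joined:
    estimates[row["region"]] += row["amount"] * row["scale"]
```

Because every group receives the same number of sample slots, the small `south` group is represented as well as the large `north` group, which is the motivation for senate allocation under data skew. The per-row scale factor is what allows group-wise aggregates to be extrapolated from the sample.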

Author information

Authors and Affiliations

Database Technology Group, Dresden University of Technology
Rainer Gemulla, Henrike Berthold & Wolfgang Lehner

Editor information

Editors and Affiliations

Center for Geospatial Science, University of Nottingham, UK
Mike Jackson
Motorola, Schaumburg, Illinois, USA
David Nelson
School of Computing and Technology, The Sir Tom Cowie Campus at St. Peter’s, University of Sunderland, SR6 0DD, Sunderland, UK
Sue Stirk

About this paper

Cite this paper

Gemulla, R., Berthold, H., Lehner, W. (2005). Hierarchical Group-Based Sampling. In: Jackson, M., Nelson, D., Stirk, S. (eds) Database: Enterprise, Skills and Innovation. BNCOD 2005. Lecture Notes in Computer Science, vol 3567. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11511854_10

DOI: https://doi.org/10.1007/11511854_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26973-1
Online ISBN: 978-3-540-31677-0
eBook Packages: Computer Science, Computer Science (R0)
