
HASE: A Hybrid Approach to Selectivity Estimation for Conjunctive Predicates

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3896)

Abstract

Current methods for selectivity estimation fall into two broad categories: synopsis-based and sampling-based. Synopsis-based methods, such as histograms, incur minimal overhead at query optimization time and are therefore widely used in commercial database systems. Sampling-based methods are better suited to ad hoc queries, but often incur high I/O cost because of random access to the underlying data. Although both kinds of methods serve the same purpose, their interaction when estimating the selectivity of conjuncts of predicates on multiple attributes is largely unexplored. Our work aims to take the best of both worlds by making consistent use of synopses and sample information when both are present. To this end, we propose HASE, a novel estimation scheme based on a powerful mechanism called generalized raking. We formalize selectivity estimation in the presence of single-attribute synopses and sample information as a constrained optimization problem. Solving this problem yields a new set of weights for the sampled tuples; these weights reproduce the known selectivities when applied to the individual predicates. We discuss different variants of the optimization problem and provide algorithms for solving it. We also provide asymptotic error bounds on the estimate. Extensive experiments on both synthetic and real data show that HASE significantly outperforms both synopsis-based and sampling-based methods.
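The idea of reweighting sampled tuples so that they reproduce known single-predicate selectivities can be illustrated with classical raking (iterative proportional fitting). The sketch below is not taken from the paper and is not claimed to be the HASE algorithm; it is a minimal illustration, assuming each predicate has been evaluated on every sampled tuple and that its true selectivity is known from a synopsis such as a histogram. The function names `rake_weights` and `estimate_conjunction` and the toy data are hypothetical.

```python
from typing import List, Sequence


def rake_weights(
    indicators: Sequence[Sequence[bool]],
    targets: Sequence[float],
    n_iter: int = 1000,
    tol: float = 1e-9,
) -> List[float]:
    """Classical raking over sample weights (illustrative, not the paper's method).

    indicators[j][i] is True if sampled tuple i satisfies predicate j.
    targets[j] is the known selectivity of predicate j (assumed to come
    from a single-attribute synopsis).
    Returns weights that sum to 1 and whose weighted selectivity for each
    predicate approaches the corresponding target.
    """
    n = len(indicators[0])
    w = [1.0 / n] * n  # start from the uniform sampling weights
    for _ in range(n_iter):
        max_err = 0.0
        for ind, s in zip(indicators, targets):
            cur = sum(wi for wi, hit in zip(w, ind) if hit)
            max_err = max(max_err, abs(cur - s))
            if cur <= 0.0 or cur >= 1.0:
                continue  # sample has no tuples on one side; cannot rescale
            up = s / cur                  # factor for tuples satisfying the predicate
            down = (1.0 - s) / (1.0 - cur)  # factor for the remaining tuples
            w = [wi * (up if hit else down) for wi, hit in zip(w, ind)]
        if max_err < tol:
            break
    return w


def estimate_conjunction(
    indicators: Sequence[Sequence[bool]], weights: Sequence[float]
) -> float:
    """Weighted fraction of sampled tuples that satisfy every predicate."""
    return sum(
        w for i, w in enumerate(weights) if all(ind[i] for ind in indicators)
    )


# Toy sample of 8 tuples (made-up data): predicate 1 holds on tuples 0-3,
# predicate 2 holds on tuples 1-4, so the two predicates are correlated.
p1 = [i <= 3 for i in range(8)]
p2 = [1 <= i <= 4 for i in range(8)]
known = [0.4, 0.6]  # assumed synopsis selectivities for p1 and p2

weights = rake_weights([p1, p2], known)
estimate = estimate_conjunction([p1, p2], weights)
```

In this toy setting the reweighted sample matches both known selectivities while keeping the correlation observed in the sample, so the conjunction estimate differs from the product 0.4 × 0.6 that an independence assumption would give. Generalized raking, as named in the abstract, covers a broader family of distance functions and constraints than this multiplicative update.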




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yu, X., Koudas, N., Zuzarte, C. (2006). HASE: A Hybrid Approach to Selectivity Estimation for Conjunctive Predicates. In: Ioannidis, Y., et al. Advances in Database Technology - EDBT 2006. EDBT 2006. Lecture Notes in Computer Science, vol 3896. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11687238_29


  • DOI: https://doi.org/10.1007/11687238_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32960-2

  • Online ISBN: 978-3-540-32961-9

  • eBook Packages: Computer Science, Computer Science (R0)
