A Framework for the Physical Design Problem for Data Synopses

  • Conference paper
Advances in Database Technology — EDBT 2002 (EDBT 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2287)

Abstract

Maintaining statistics on multidimensional data distributions is crucial for predicting the run-time and result size of queries and data analysis tasks with acceptable accuracy. To this end, a plethora of techniques have been proposed for maintaining a compact data “synopsis” on a single table, ranging from variants of histograms to methods based on wavelets and other transforms. However, the fundamental question of how to reconcile the synopses for large information sources with many tables has remained largely unexplored. This paper develops a general framework for reconciling the synopses on many tables, which may come from different information sources. It shows how to compute the optimal combination of synopses for a given workload and a limited amount of available memory. The practicality of the approach and the accuracy of the proposed heuristics are demonstrated by experiments.
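
The problem statement in the abstract, choosing how much of a fixed memory budget each synopsis receives so that the overall estimation error for a workload is minimized, can be illustrated with a toy sketch. This is not the paper's algorithm. It assumes, purely for illustration, that each synopsis comes with a hypothetical table of workload-weighted errors indexed by the number of memory units it is given, and that errors of different synopses simply add up. The function name `allocate_memory`, the table names, and all numbers are invented for the example.

```python
from typing import Dict, List, Tuple


def allocate_memory(
    error_tables: Dict[str, List[float]], budget: int
) -> Tuple[float, Dict[str, int]]:
    """Split `budget` memory units across synopses to minimize total error.

    error_tables[name][m] is the (hypothetical) workload-weighted error of
    synopsis `name` when it is given m memory units. Errors are assumed to
    be additive across synopses, which is a simplification.
    """
    # best[b] = (minimum total error, allocation) using at most b units
    # over the synopses processed so far.
    best: List[Tuple[float, Dict[str, int]]] = [
        (0.0, {}) for _ in range(budget + 1)
    ]
    for name, errors in error_tables.items():
        new_best = []
        for b in range(budget + 1):
            # Try every feasible size m for this synopsis and keep the best.
            total, m = min(
                (best[b - m][0] + errors[m], m)
                for m in range(min(b, len(errors) - 1) + 1)
            )
            alloc = dict(best[b - m][1])
            alloc[name] = m
            new_best.append((total, alloc))
        best = new_best
    return best[budget]


# Made-up error tables for two tables' synopses, sizes 0..3 units.
tables = {
    "orders": [10.0, 4.0, 2.0, 1.5],
    "customers": [6.0, 5.0, 1.0, 0.9],
}
total_error, allocation = allocate_memory(tables, budget=3)
print(total_error, allocation)
```

The dynamic program is exact only under the additivity assumption stated above. A realistic setting would also have to decide which attribute combinations get a synopsis at all, which this sketch does not attempt.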

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

König, A.C., Weikum, G. (2002). A Framework for the Physical Design Problem for Data Synopses. In: Jensen, C.S., et al. Advances in Database Technology — EDBT 2002. EDBT 2002. Lecture Notes in Computer Science, vol 2287. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45876-X_39

  • DOI: https://doi.org/10.1007/3-540-45876-X_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43324-8

  • Online ISBN: 978-3-540-45876-0

  • eBook Packages: Springer Book Archive
