Abstract
Maintaining statistics on multidimensional data distributions is crucial for predicting the run-time and result size of queries and data analysis tasks with acceptable accuracy. To this end, a plethora of techniques have been proposed for maintaining a compact data “synopsis” on a single table, ranging from variants of histograms to methods based on wavelets and other transforms. However, the fundamental question of how to reconcile the synopses for large information sources with many tables has remained largely unexplored. This paper develops a general framework for reconciling the synopses on many tables, which may come from different information sources. It shows how to compute the optimal combination of synopses for a given workload and a limited amount of available memory. The practicality of the approach and the accuracy of the proposed heuristics are demonstrated by experiments.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
König, A.C., Weikum, G. (2002). A Framework for the Physical Design Problem for Data Synopses. In: Jensen, C.S., et al. Advances in Database Technology — EDBT 2002. EDBT 2002. Lecture Notes in Computer Science, vol 2287. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45876-X_39
DOI: https://doi.org/10.1007/3-540-45876-X_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43324-8
Online ISBN: 978-3-540-45876-0
eBook Packages: Springer Book Archive