Advertisement

Improving Range Query Result Size Estimation Based on a New Optimal Histogram

  • Wissem Labbadi
  • Jalel Akaichi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8132)

Abstract

Many commercial relational Data Base Management Systems (DBMSs) maintain histograms to approximate the distribution of values in the relation attributes and based on them estimate query result sizes. A histogram approximates the distribution by grouping data into buckets. The estimation-errors resulting from the loss of information during the grouping process affect the accuracy of the decision, made by query optimizers, about choosing the most economical evaluation plan for a query. In front of this challenging problem, many histogram-based estimation techniques including the equi-depth, the v-optimal, the max-diff and the compressed histograms have well contributed to approximate the cost of a query evaluation plan. But, most of the times the obtained estimates have much error. Motivated by the fact that inaccurate estimations can lead to wrong decisions, we propose in this paper an efficient algorithm, called Compressed-V2, for accurate histogram constructions. Both theoretical and effective experiments are done using benchmark data set showing the promising results obtained using the proposed algorithm. We think that this algorithm will significantly contribute for helping to solve the problem of Multi-Query Optimization (MQO) resulting from queries interactions especially in Relational Data Warehouses (RDW) which represent the ideal environment in which complex OLAP queries interact with each other.

Keywords

Optimal histograms Query result size estimation Intermediate query result distribution DBMS Estimation error Multi-query optimization Query interaction 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Ioannidis, Y., Poosala, V.: Balancing histogram optimality and practicality for query result size estimation. In: Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, pp. 233–244 (1995)Google Scholar
  2. 2.
    Jagadish, H.V., Koudas, N., Muthukrishnan, S., Poosala, V., Sevcik, K., Suel, T.: Optimal histograms with quality guarantees. In: Proceedings of the 24th International Conference on Very Large Data Bases (VLDB), New York, USA, pp. 275–286 (1998)Google Scholar
  3. 3.
    Poosala, V., Ioannidis, Y.E., Haas, P.J., Shekita, E.J.: Improved histograms for selectivity estimation of range predicates. In: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pp. 294–305 (1996)Google Scholar
  4. 4.
    Jagadish, H.V., Jin, H., Ooi, B.C., Tan, K.-L.: Global optimization of histograms. In: Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, pp. 223–234 (2001)Google Scholar
  5. 5.
    Yu, C., Philip, G., Meng, W.: Distributed top-N query processing with possibly uncooperative local systems. In: Proc. 29th VLDB Conf., Berlin, Germany, pp. 117–128 (2003)Google Scholar
  6. 6.
    Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: Proceedings of the ACM SIGMOD International Symposium on Management of Data, Boston, Mass., pp. 23–34 (June 1979)Google Scholar
  7. 7.
    John Oommen, B., Rueda, L.G.: An empirical comparison of histogram-like techniques for query optimization. In: Proceedings of the 2nd International Conference on Entreprise Information Systems, Stafford, UK, July 4-7, pp. 71–78 (2000)Google Scholar
  8. 8.
    Ioannidis, Y., Christodoulakis, S.: On the propagation of errors in the size of join results. In: Proceedings of the 1991 ACM SIGMOD Conference, Denver, CO, pp. 268–277 (May 1991)Google Scholar
  9. 9.
    Ioannidis, Y., Christodoulakis, S.: Optimal histograms for limiting worst-case error propagation in the estimates of query optimizers. To appear in ACM-TODS (1992)Google Scholar
  10. 10.
    Kooi, R.P.: The optimization of queries in relational databases. PhD thesis, Case Western Reserver University (September 1980)Google Scholar
  11. 11.
    Shapiro, G.P., Connell, C.: Accurate Estimation of the Number of Tuples Satisfying a Condition. In: Proceedings of ACM-SIGMOD Conference, pp. 256–276 (1984)Google Scholar
  12. 12.
    Ioannidis, Y.: Universality of serial histograms. In: Proceedings of the 19th Int. Conf. on Very Large Databases, pp. 256–267 (December 1993)Google Scholar
  13. 13.
    Poosala, V., Ioannidis, Y.: Estimation of query-result distribution and its application in parallel-join load balancing. In: Proceedings of the 22nd Int. Conf. on Very Large Databases, pp. 448–459 (1996)Google Scholar
  14. 14.
    Gupta, A., Sudarshan, S., Viswanathan, S.: Query scheduling in multi query optimization. In: IDEAS, pp. 11–19 (2001)Google Scholar
  15. 15.
    Thomas, D., Diwan, A.A., Sudarshan, S.: Scheduling and caching in multi query optimization. In: COMAD, pp. 150–153 (2006)Google Scholar
  16. 16.
    Kerkad, A., Bellatreche, L., Geniet, D.: Queen-Bee: Query interaction- aware for buffer allocation and scheduling problem. In: Cuzzocrea, A., Dayal, U. (eds.) DaWaK 2012. LNCS, vol. 7448, pp. 156–167. Springer, Heidelberg (2012)CrossRefGoogle Scholar
  17. 17.
    Ioannidis, Y.: Query optimization. In: ACM Computing Surveys, Symposium Issue on the 50th Anniversary of ACM, vol. 28, pp. 121–123 (1996)Google Scholar
  18. 18.
    Christodoulakis, S.: Implications of certain assumptions in database performance evaluation. ACM TODS 9(2), 163–186 (1984)MathSciNetzbMATHCrossRefGoogle Scholar
  19. 19.
    Zipf, G.K.: Human Behavior and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley, Cambridge (1949)Google Scholar
  20. 20.
    Liu, Y.: Data preprocessing. Department of Biomedical, Industrial and Human Factors Engineering Wright State University (2010)Google Scholar
  21. 21.
    Ioannidis, Y., Poosala, V.: Histogram-based solutions to diverse database estimation problems. IEEE Data Engineering Bulletin 18(3), 10–18 (1995)Google Scholar
  22. 22.
    Muralikrishna, M., Dewitt, D.J.: Equi-depth histograms for estimating selectivity factors for multi-dimensional queries. In: Proceedings of ACM SIGMOD Conference, pp. 28–36 (1988)Google Scholar
  23. 23.
    Mousavi, H., Zaniolo, C.: Fast and Accurate Computation of Equi-Depth Histograms over Data Streams. In: Proceedings of EDBT, Uppsala, Sweden, March 22-24 (2011)Google Scholar
  24. 24.
    Gomes, J.S.: Adaptive Histogram Algorithms for Approximating Frequency Queries in Dynamic Data Streams. In: 12th International Conference on Internet Computing, ICOMP 2011, Las Vegas, NV, July 18-21 (2011)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Wissem Labbadi
    • 1
  • Jalel Akaichi
    • 1
  1. 1.Computer Science DepartmentISG-University of TunisLe BardoTunisia

Personalised recommendations