On Estimating COUNT, SUM, and AVERAGE Relational Algebra Queries
CASE-DB is a relational database management system that allows users to specify time constraints in queries. For an aggregate query AGG(E), where AGG is one of COUNT, SUM, and AVERAGE and E is a relational algebra expression, CASE-DB uses statistical estimators to approximate the query result. This paper extends our earlier work on the statistical estimators of CASE-DB with the following features: (a) new statistical estimators for COUNT queries with projection, (b) an extension of the methodology to SUM and AVERAGE aggregate queries, and (c) new sampling plans based on systematic sampling and stratified sampling. We also present experiments that evaluate the performance of the estimators with these extensions on artificial database instances.
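The abstract does not state the estimator formulas. As a hedged illustration only, the sketch below shows standard textbook estimators of the kind it refers to, not the CASE-DB estimators themselves. It assumes simple random sampling without replacement of n out of N tuples, so each tuple has inclusion probability n/N: COUNT and SUM over a selection are estimated by scaling sample totals by N/n, AVERAGE by the ratio of those two estimates, and a COUNT with projection (the number of distinct values) by Chao's nonparametric estimator, swapped in here as a known example. Relations are assumed to be in-memory lists of dicts, and the names `srs_estimates` and `chao1_distinct` are hypothetical.

```python
import random
from collections import Counter


def srs_estimates(relation, predicate, attr, n, seed=0):
    """Estimate COUNT, SUM and AVERAGE of attr over sigma_predicate(relation).

    Hypothetical helper: draws a simple random sample without replacement of
    n tuples and scales sample totals by N/n (inclusion probability n/N).
    """
    N = len(relation)
    sample = random.Random(seed).sample(relation, n)
    hits = [t for t in sample if predicate(t)]
    scale = N / n
    count_hat = scale * len(hits)
    sum_hat = scale * sum(t[attr] for t in hits)
    # AVERAGE as a ratio estimator: estimated SUM / estimated COUNT
    avg_hat = sum_hat / count_hat if count_hat else float("nan")
    return count_hat, sum_hat, avg_hat


def chao1_distinct(values):
    """Estimate the number of distinct values (COUNT with projection) from a sample.

    Hypothetical helper using Chao's estimator d + f1^2 / (2 f2), where d is the
    number of distinct values seen, f1 the number seen exactly once and f2 the
    number seen exactly twice; falls back to the bias-corrected form
    d + f1 (f1 - 1) / 2 when no value was seen exactly twice.
    """
    freq_of_freq = Counter(Counter(values).values())
    d = len(set(values))
    f1, f2 = freq_of_freq.get(1, 0), freq_of_freq.get(2, 0)
    if f2 == 0:
        return d + f1 * (f1 - 1) / 2
    return d + f1 * f1 / (2 * f2)
```

When n equals N the sample is the whole relation and `srs_estimates` returns the exact COUNT, SUM, and AVERAGE, which is a convenient sanity check; smaller n trades accuracy for evaluation time, which is the trade-off a time-constrained system has to manage.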
Keywords: Systematic Sampling, Simple Random Sampling, Stratified Random Sampling, Relational Algebra, Inclusion Probability
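The inclusion probability of each sampled tuple determines how its value is weighted in an estimate. A minimal sketch, assuming the Horvitz-Thompson estimator (the sum of y_i / pi_i over sampled tuples), a standard choice that is not necessarily the one used in CASE-DB: a 1-in-k systematic sample with a uniformly random start gives every position pi = 1/k, and stratified sampling that draws n_h of the N_h tuples in stratum h gives pi = n_h / N_h. Setting y_i to 1 for tuples that satisfy a selection predicate and 0 otherwise turns the estimated total into a COUNT estimate. All function names are hypothetical.

```python
import random


def horvitz_thompson_total(values, inclusion_probs):
    """Unbiased estimate of a population total: sum of y_i / pi_i over the sample."""
    return sum(y / p for y, p in zip(values, inclusion_probs))


def systematic_total(y, k, seed=0):
    """1-in-k systematic sample with a random start; every position has pi = 1/k."""
    start = random.Random(seed).randrange(k)
    picked = y[start::k]
    return horvitz_thompson_total(picked, [1 / k] * len(picked))


def stratified_total(strata, sample_sizes, seed=0):
    """Simple random sample of n_h tuples inside each stratum h (pi = n_h / N_h)."""
    rng = random.Random(seed)
    total = 0.0
    for name, ys in strata.items():
        if not ys:
            continue  # an empty stratum contributes nothing to the total
        n_h = min(sample_sizes[name], len(ys))
        picked = rng.sample(ys, n_h)
        total += horvitz_thompson_total(picked, [n_h / len(ys)] * n_h)
    return total
```

With k = 1, or with every n_h equal to N_h, the sample covers the whole relation and the estimate equals the exact total. Stratifying on an attribute correlated with the aggregated value can reduce variance relative to a simple random sample of the same size.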