Precision-time tradeoffs: A paradigm for processing statistical queries on databases

  • Jaideep Srivastava
  • Doron Rotem
Contributed Papers
Part of the Lecture Notes in Computer Science book series (LNCS, volume 339)


Conventional query processing techniques are aimed at queries that access small amounts of data and require each of those data items to compute the answer. When the database is used for statistical analysis as well as operational purposes, some types of queries may require a large part of the database to compute the answer. This can lead to a data access bottleneck, caused by the excessive number of disk accesses needed to bring the data into primary memory. One example is the computation of statistical parameters such as count, average, median, and standard deviation, which are useful for statistical analysis of the database. Another example that faces this bottleneck is verifying the truth of a set of predicates (goals) against the current database state for the purposes of intelligent decision making. A solution to this problem is to maintain a set of precomputed information about the database in a view or a snapshot, so that statistical queries can be processed using the view rather than the real database. A crucial issue is that the precision of the precomputed information in the view deteriorates over time, because the underlying database is dynamic. The answer provided is therefore approximate, which is acceptable under many circumstances, especially when the error is bounded. The tradeoff is that queries are processed faster at the expense of precision in the answer. The concept of precision in the context of database queries is formalized, and a data model to incorporate it is developed. Algorithms are designed to maintain materialized views of data to specified degrees of precision.
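The abstract does not spell out the authors' precision model or their maintenance algorithms. What follows is only a minimal sketch, under stated assumptions, of the general precision-time tradeoff it describes.

The sketch answers a COUNT query from a materialized snapshot whose error bound widens as the base relation is updated. The view is recomputed only when that bound exceeds a user-specified tolerance. The class `BoundedCountView`, its methods and fields, and the assumption that each single-row update shifts the true count by at most one are all hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]  # hypothetical row representation: column name -> value


@dataclass
class BoundedCountView:
    """Hypothetical materialized view for COUNT(*) WHERE predicate.

    Assumption: every single-row update to the base relation (insert,
    delete, or modify) changes the true count by at most 1, and the view
    only counts updates instead of applying them.
    """

    base: List[Row]
    predicate: Callable[[Row], bool]
    tolerance: int            # max allowed |snapshot - true count| at query time
    snapshot: int = 0         # count computed at the last refresh
    pending_updates: int = 0  # updates seen since the last refresh
    refreshes: int = 0        # number of full scans performed so far

    def __post_init__(self) -> None:
        self.refresh()

    def refresh(self) -> None:
        # Expensive path: scan the whole base relation (the disk-access cost).
        self.snapshot = sum(1 for row in self.base if self.predicate(row))
        self.pending_updates = 0
        self.refreshes += 1

    def note_update(self) -> None:
        # Cheap path: record that precision has deteriorated by one unit.
        self.pending_updates += 1

    def query(self) -> Tuple[int, int]:
        # Return an interval guaranteed to contain the true count, refreshing
        # first if the interval would be wider than the requested precision.
        if self.pending_updates > self.tolerance:
            self.refresh()
        low = max(0, self.snapshot - self.pending_updates)
        high = self.snapshot + self.pending_updates
        return low, high


# Usage: count employees earning more than 50 (illustrative data).
table: List[Row] = [{"salary": s} for s in (40, 55, 70, 90)]
view = BoundedCountView(table, lambda r: r["salary"] > 50, tolerance=2)
print(view.query())  # (3, 3)
table.append({"salary": 80})
view.note_update()
print(view.query())  # (2, 4); the true count is 4
table.append({"salary": 30})
view.note_update()
table[0]["salary"] = 60
view.note_update()
print(view.query())  # 3 pending updates > 2, so it refreshes: (5, 5)
```

Returning an interval rather than a single number makes the bounded error explicit to the caller. The `tolerance` parameter is the knob for the tradeoff: a larger tolerance means fewer full scans but wider (less precise) answers.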






Copyright information

© Springer-Verlag Berlin Heidelberg 1989

Authors and Affiliations

  • Jaideep Srivastava (1)
  • Doron Rotem (1)
  1. Computer Science Research, Lawrence Berkeley Laboratory, University of California, Berkeley
