
Precision-time tradeoffs: A paradigm for processing statistical queries on databases

  • Contributed Papers
  • Chapter
Statistical and Scientific Database Management (SSDBM 1988)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 339))


Abstract

Conventional query processing techniques are aimed at queries that access small amounts of data and require every retrieved item to compute the answer. When the database is used for statistical analysis as well as for operational purposes, some types of queries may need a large part of the database to compute their answer. This can lead to a data-access bottleneck, caused by the excessive number of disk accesses needed to bring the data into primary memory. One example is the computation of statistical parameters, such as count, average, median, and standard deviation, which are useful for statistical analysis of the database. Another example that faces this bottleneck is verifying the truth of a set of predicates (goals) against the current database state for the purpose of intelligent decision making. A solution to this problem is to maintain a set of precomputed information about the database in a view or snapshot; statistical queries can then be processed against the view rather than the real database. A crucial issue is that the precision of the precomputed information in the view deteriorates with time because of the dynamic nature of the underlying database. The answer provided is therefore approximate, which is acceptable in many circumstances, especially when the error is bounded. The tradeoff is that query processing is made faster at the expense of precision in the answer. The concept of precision in the context of database queries is formalized, a data model incorporating it is developed, and algorithms are designed to maintain materialized views of data to specified degrees of precision.

This work was done while the first author was on leave from the C.S. Division, U.C. Berkeley.
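
To make the precision-time tradeoff concrete, the following is a minimal sketch, not the algorithms developed in the paper: a materialized view caches COUNT and SUM for a column and answers AVERAGE queries from the cache, buffering base-table inserts and refreshing only when a conservative error bound exceeds a caller-specified precision. The class name, the assumption that all values lie in a known range [lo, hi], and the particular drift bound are illustrative choices introduced here, not taken from the source.

```python
# Illustrative sketch of a precision-bounded materialized aggregate view.
# Assumptions (not from the paper): values lie in a known range [lo, hi],
# only inserts occur, and the base relation is non-empty.

class BoundedStalenessAverageView:
    def __init__(self, values, precision, lo, hi):
        self.lo, self.hi = lo, hi
        self.precision = precision      # maximum tolerated absolute error
        self.base = list(values)        # stand-in for the base relation
        self.pending = []               # inserts not yet applied to the view
        self._refresh()

    def _refresh(self):
        """Fold pending inserts into the base data and recompute aggregates."""
        self.base.extend(self.pending)
        self.pending.clear()
        self.count = len(self.base)
        self.total = sum(self.base)     # the "expensive" full recomputation

    def _drift_bound(self):
        # With n cached values and k pending inserts, all in [lo, hi],
        # the true average differs from the cached one by at most
        # k * (hi - lo) / (n + k).
        k = len(self.pending)
        return k * (self.hi - self.lo) / (self.count + k) if k else 0.0

    def insert(self, value):
        """Buffer an insert; refresh only when the error bound grows too large."""
        assert self.lo <= value <= self.hi, "value outside the assumed range"
        self.pending.append(value)
        if self._drift_bound() > self.precision:
            self._refresh()             # pay the refresh cost only when needed

    def average(self):
        """Answer from the (possibly stale) cache, with its current error bound."""
        return self.total / self.count, self._drift_bound()


# Usage: answers are approximate, but their error stays within `precision`.
view = BoundedStalenessAverageView([10, 12, 14, 16], precision=5.0, lo=0, hi=20)
view.insert(18)                         # bound 20/5 = 4.0 <= 5.0, stays buffered
print(view.average())                   # (13.0, 4.0): stale but within bound
view.insert(6)                          # bound ~6.67 > 5.0, triggers a refresh
print(view.average())                   # (12.67, 0.0): exact right after refresh
```

The design point this is meant to illustrate is the one argued in the abstract: the expensive scan of the base data is paid only when the accumulated imprecision would exceed what the querier declared acceptable, so cheaper, bounded-error answers are returned in between.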




Author information

Authors: J. Srivastava and D. Rotem

Editors: Maurizio Rafanelli, John C. Klensin, Per Svensson


Copyright information

© 1989 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Srivastava, J., Rotem, D. (1989). Precision-time tradeoffs: A paradigm for processing statistical queries on databases. In: Rafanelli, M., Klensin, J.C., Svensson, P. (eds) Statistical and Scientific Database Management. SSDBM 1988. Lecture Notes in Computer Science, vol 339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0027516


  • DOI: https://doi.org/10.1007/BFb0027516


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-50575-4

  • Online ISBN: 978-3-540-46045-9

