Abstract
The summary (statistics) data model described herein extends the relational model. The concept of a category (a type or class) and the additivity property of some statistical functions form the basis of the model. In this approach, a category shields the details of a database instance from users and plays an important role in deriving new statistics data. Statistics data is meta-knowledge: the detailed information typically stored in a conventional database, summarized by statistical functions. Each item of statistics data is a triple 〈statistical function, category, summary〉. The additivity property allows new statistics data to be generated without accessing the original database. Unfortunately, deciding whether a category is derivable from a set of categories is, in general, NP-hard. The proposed generating category set can resolve this intractability of category derivation. The derivation of new statistics data within a relation or across multiple relations is investigated, and the efficiency and correctness of the stored statistics data are guaranteed when the original database is updated or when new statistics data is obtained. Finally, potential applications of the model and the security concerns that apply to it are discussed.
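To make the additivity idea concrete, the following is a minimal Python sketch, not the authors' formal model. It assumes that a category can be encoded as a set of labels (here hypothetical department names), that two stored statistics may be combined only when their categories are disjoint, and that count and sum are the additive functions of interest. The names `Stat`, `ADDITIVE`, and `combine`, and all the figures, are illustrative assumptions.

```python
from typing import NamedTuple


class Stat(NamedTuple):
    """One item of statistics data: (statistical function, category, summary)."""
    function: str        # e.g. "count" or "sum"
    category: frozenset  # hypothetical encoding: the set of labels the category covers
    summary: float       # the stored summary value


# Assumption: count and sum are additive over disjoint categories.
ADDITIVE = {"count", "sum"}


def combine(a: Stat, b: Stat) -> Stat:
    """Derive the statistic for the union of two disjoint categories
    from stored summaries alone, without reading any base tuples."""
    if a.function != b.function or a.function not in ADDITIVE:
        raise ValueError("only matching additive functions can be combined")
    if a.category & b.category:
        raise ValueError("categories must be disjoint")
    return Stat(a.function, a.category | b.category, a.summary + b.summary)


# Stored statistics data (illustrative values only).
count_sales = Stat("count", frozenset({"sales"}), 40)
count_support = Stat("count", frozenset({"support"}), 10)
sum_sales = Stat("sum", frozenset({"sales"}), 2000.0)
sum_support = Stat("sum", frozenset({"support"}), 400.0)

# New statistics data for the category {sales, support}.
count_all = combine(count_sales, count_support)
sum_all = combine(sum_sales, sum_support)

# The mean is not additive itself, but it follows from two additive summaries.
mean_all = sum_all.summary / count_all.summary
```

In this sketch the mean over the combined category is obtained from the derived sum and count, which illustrates why storing additive summaries can be enough to answer some queries that would otherwise need the original data.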
Copyright information
© 1989 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Chen, M.C., McNamee, L., Melkanoff, M. (1989). A model of summary data and its applications in statistical databases. In: Rafanelli, M., Klensin, J.C., Svensson, P. (eds) Statistical and Scientific Database Management. SSDBM 1988. Lecture Notes in Computer Science, vol 339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0027524
DOI: https://doi.org/10.1007/BFb0027524
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-50575-4
Online ISBN: 978-3-540-46045-9
eBook Packages: Springer Book Archive