A model of summary data and its applications in statistical databases

Chen, Meng Chang; McNamee, Lawrence; Melkanoff, Michel

doi:10.1007/BFb0027524

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 339)

Included in the following conference series:

International Conference on Scientific and Statistical Database Management

Abstract

The summary (statistics) data model described herein is an extension of the relational model. It rests on two foundations: the concept of category (type or class) and the additivity property of some statistical functions. In this approach, a category shields the details of a database instance from users and plays an important role in deriving new statistics data. A statistics datum is a triple 〈statistical function, category, summary〉. Statistics data is meta-knowledge: a summary, computed by statistical functions, of the detailed information typically stored in a conventional database. The additivity property allows new statistics data to be generated without accessing the original database. Unfortunately, deciding whether a category is derivable from a set of categories is, in general, NP-hard. The proposed generating category set can resolve the intractability of category derivation. The derivation of new statistics data within a single relation or across multiple relations is investigated, and the efficiency and correctness of the stored statistics data are guaranteed when the original database is updated or when new statistics data is obtained. Finally, potential applications of the model and the security concerns that apply to it are discussed.
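The additivity idea in the abstract can be illustrated with a minimal sketch. It is not the chapter's formalism. It assumes that a category can be modelled as a set of values of one categorical attribute, that count and sum are additive over disjoint categories, and that a mean can be derived from a stored sum and count. The names `Statistic`, `combine`, and `derive_mean`, and all figures, are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Assumption: these functions are additive over disjoint categories.
ADDITIVE = {"count", "sum"}


@dataclass(frozen=True)
class Statistic:
    """A <statistical function, category, summary> triple (hypothetical encoding)."""
    function: str
    category: FrozenSet[str]  # assumed: a set of attribute values
    summary: float


def combine(a: Statistic, b: Statistic) -> Statistic:
    """Derive the statistic for the union of two disjoint categories
    from stored summaries only, without touching the base data."""
    if a.function != b.function:
        raise ValueError("statistical functions differ")
    if a.function not in ADDITIVE:
        raise ValueError(f"{a.function} is not treated as additive here")
    if a.category & b.category:
        raise ValueError("categories overlap, so summaries cannot simply be added")
    return Statistic(a.function, a.category | b.category, a.summary + b.summary)


def derive_mean(total: Statistic, count: Statistic) -> Statistic:
    """Derive a mean from a sum and a count over the same category."""
    if total.function != "sum" or count.function != "count":
        raise ValueError("expected a sum statistic and a count statistic")
    if total.category != count.category:
        raise ValueError("sum and count must describe the same category")
    if count.summary == 0:
        raise ValueError("mean is undefined for an empty category")
    return Statistic("mean", total.category, total.summary / count.summary)


# Hypothetical stored statistics for salaries by department.
sum_sales = Statistic("sum", frozenset({"sales"}), 300000.0)
sum_eng = Statistic("sum", frozenset({"engineering"}), 500000.0)
cnt_sales = Statistic("count", frozenset({"sales"}), 6)
cnt_eng = Statistic("count", frozenset({"engineering"}), 4)

total = combine(sum_sales, sum_eng)
count = combine(cnt_sales, cnt_eng)
mean = derive_mean(total, count)
print(mean)
```

The overlap check matters in this sketch: if two categories share members, adding their summaries would double-count, which hints at why deciding derivability from an arbitrary set of categories is harder than it first appears.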

Author information

Authors and Affiliations

Computer Science Department, UCLA, USA
Meng Chang Chen, Lawrence McNamee & Michel Melkanoff

Editor information

Maurizio Rafanelli, John C. Klensin, Per Svensson

About this chapter

Cite this chapter

Chen, M.C., McNamee, L., Melkanoff, M. (1989). A model of summary data and its applications in statistical databases. In: Rafanelli, M., Klensin, J.C., Svensson, P. (eds) Statistical and Scientific Database Management. SSDBM 1988. Lecture Notes in Computer Science, vol 339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0027524

DOI: https://doi.org/10.1007/BFb0027524
Published: 09 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-50575-4
Online ISBN: 978-3-540-46045-9
eBook Packages: Springer Book Archive
