A model of summary data and its applications in statistical databases

  • Meng Chang Chen
  • Lawrence McNamee
  • Michel Melkanoff
Part of the Lecture Notes in Computer Science book series (LNCS, volume 339)

Abstract

The summary (statistics) data model described herein is an extension of the relational model. The concept of category (type or class) and the additivity property of some statistical functions form the basis of this model. In this approach, categories shield the details of a database instance from users and play an important role in deriving new statistics data. A statistics datum is a triple 〈statistical function, category, summary〉; statistics data is thus meta-knowledge, obtained by applying statistical functions to the detailed information typically stored in a conventional database. The additivity property allows new statistics data to be generated without accessing the original database. Unfortunately, deciding whether a category is derivable from a set of categories is NP-hard in general. The proposed generating category set is introduced to resolve this intractability of category derivation. The derivation of new statistics data within a single relation and across multiple relations is investigated, and the efficiency and correctness of the stored statistics data are guaranteed both when the original database is updated and when new statistics data is obtained. Finally, potential applications of the model and the security concerns that apply to it are discussed.
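To make the additivity argument concrete, here is a minimal Python sketch. It assumes (an assumption for illustration, not the paper's formal definition) that a category is a conjunction of attribute = value conditions, and that the caller supplies one attribute with a known, exhaustive domain along which to roll up. Because general derivability is NP-hard, the sketch sidesteps that problem and does not implement the authors' generating category set. The names `StatDatum`, `cat`, `roll_up`, `average`, and `apply_insert`, as well as the example data, are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, List, Tuple

# Hypothetical encoding: a category is a conjunction of attribute = value
# conditions, stored as a frozenset of (attribute, value) pairs.
Category = FrozenSet[Tuple[str, str]]

# Functions whose values over disjoint categories simply add up.
ADDITIVE = {"count", "sum"}


@dataclass(frozen=True)
class StatDatum:
    """One <statistical function, category, summary> triple."""
    function: str
    category: Category
    summary: float


def cat(**conds: str) -> Category:
    """Build a category such as cat(dept="CS", sex="F")."""
    return frozenset(conds.items())


def roll_up(stored: List[StatDatum], function: str, target: Category,
            attr: str, domain: List[str]) -> StatDatum:
    """Derive function(target) from stored data for target AND attr = v, v in domain.

    Assumes every tuple takes exactly one value of attr from domain, so these
    categories are disjoint and cover target; additivity then yields the
    answer without reading the base relation.
    """
    if function not in ADDITIVE:
        raise ValueError(f"{function!r} is not additive")
    index: Dict[Tuple[str, Category], float] = {
        (d.function, d.category): d.summary for d in stored
    }
    total = 0.0
    for v in domain:
        key = (function, target | {(attr, v)})
        if key not in index:
            raise KeyError(f"no stored {function} for {sorted(key[1])}")
        total += index[key]
    return StatDatum(function, target, total)


def average(stored: List[StatDatum], target: Category,
            attr: str, domain: List[str]) -> float:
    """AVG is not additive, but it follows from the derived SUM and COUNT."""
    s = roll_up(stored, "sum", target, attr, domain).summary
    n = roll_up(stored, "count", target, attr, domain).summary
    return s / n


def apply_insert(stored: List[StatDatum], row: Dict[str, Any],
                 measure: str) -> List[StatDatum]:
    """Incrementally maintain stored COUNT/SUM data when one row is inserted."""
    updated = []
    for d in stored:
        if all(row.get(a) == v for a, v in d.category):
            if d.function == "count":
                d = StatDatum(d.function, d.category, d.summary + 1)
            elif d.function == "sum":
                d = StatDatum(d.function, d.category, d.summary + row[measure])
            else:
                raise ValueError(f"cannot maintain {d.function!r} incrementally")
        updated.append(d)
    return updated


# Made-up example data, for illustration only.
stored = [
    StatDatum("count", cat(dept="CS", sex="F"), 2),
    StatDatum("count", cat(dept="CS", sex="M"), 3),
    StatDatum("sum", cat(dept="CS", sex="F"), 100.0),
    StatDatum("sum", cat(dept="CS", sex="M"), 200.0),
]
cs = cat(dept="CS")
print(roll_up(stored, "count", cs, "sex", ["F", "M"]).summary)  # 5.0
print(average(stored, cs, "sex", ["F", "M"]))                   # 60.0
```

The final lines derive COUNT and AVG for dept = CS from the stored per-sex statistics alone. `apply_insert` shows the simplest way to keep COUNT and SUM data consistent when a base tuple is inserted; a deletion would subtract rather than add.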

Keywords

Statistical Function, Relation Scheme, Relational Algebra, Database Management System, Statistical Database

Copyright information

© Springer-Verlag Berlin Heidelberg 1989

Authors and Affiliations

  • Meng Chang Chen, Computer Science Department, UCLA, USA
  • Lawrence McNamee, Computer Science Department, UCLA, USA
  • Michel Melkanoff, Computer Science Department, UCLA, USA