A model of summary data and its applications in statistical databases

Statistical and Scientific Database Management (SSDBM 1988)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 339)

Abstract

The summary (statistics) data model described here extends the relational model. It rests on two ideas: the concept of a category (a type or class) and the additivity property of some statistical functions. A category shields the details of a database instance from users and plays an important role in deriving new statistics data. Each item of statistics data is a triple 〈statistical function, category, summary〉; it is meta-knowledge, produced by applying statistical functions to the detailed information typically stored in a conventional database. The additivity property allows new statistics data to be generated without accessing the original database. Unfortunately, deciding whether a category is derivable from a given set of categories is, in general, NP-hard. The proposed generating category set can resolve this intractability of category derivation. The derivation of new statistics data within a single relation or across multiple relations is investigated, and the efficiency and correctness of the stored statistics data are guaranteed when the original database is updated or when new statistics data is obtained. Finally, potential applications of the model and the security concerns it raises are discussed.
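
The additivity idea in the abstract can be illustrated with a small sketch. It is not the chapter's formalism: the `Stat` class, the `derive_union` and `derive_mean` helpers, and the modelling of a category as a set of attribute values are all assumptions made here for illustration. The sketch shows how a summary for the union of two disjoint categories might be obtained from stored summaries alone, without touching the underlying records.

```python
from dataclasses import dataclass

# Statistical functions treated as additive in this sketch (an assumption;
# the chapter defines its own set of additive functions).
ADDITIVE = {"count", "sum"}


@dataclass(frozen=True)
class Stat:
    """Hypothetical stand-in for <statistical function, category, summary>."""
    function: str         # e.g. "count" or "sum"
    category: frozenset   # category modelled as a set of attribute values
    summary: float        # the stored summary value


def derive_union(a: Stat, b: Stat) -> Stat:
    """Combine two stored summaries over disjoint categories."""
    if a.function != b.function:
        raise ValueError("summaries use different statistical functions")
    if a.function not in ADDITIVE:
        raise ValueError(f"{a.function!r} is not additive in this sketch")
    if a.category & b.category:
        raise ValueError("categories overlap; a plain sum would double count")
    return Stat(a.function, a.category | b.category, a.summary + b.summary)


def derive_mean(total: Stat, count: Stat) -> Stat:
    """Derive a mean from a stored sum and count over the same category."""
    if total.function != "sum" or count.function != "count":
        raise ValueError("expected a 'sum' summary and a 'count' summary")
    if total.category != count.category:
        raise ValueError("sum and count must describe the same category")
    return Stat("mean", total.category, total.summary / count.summary)


# Usage with made-up figures: salary totals and head counts per department.
sales = frozenset({"sales"})
research = frozenset({"research"})
total = derive_union(Stat("sum", sales, 500.0), Stat("sum", research, 300.0))
heads = derive_union(Stat("count", sales, 10), Stat("count", research, 6))
mean = derive_mean(total, heads)  # mean over both departments combined
```

The disjointness check matters: if two categories share members, adding their summaries would count those members twice, so the sketch refuses rather than returning a wrong value. The mean is not additive itself, but it is derivable from two additive summaries over the same category.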

Editor information

Maurizio Rafanelli, John C. Klensin, Per Svensson

Copyright information

© 1989 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Chen, M.C., McNamee, L., Melkanoff, M. (1989). A model of summary data and its applications in statistical databases. In: Rafanelli, M., Klensin, J.C., Svensson, P. (eds) Statistical and Scientific Database Management. SSDBM 1988. Lecture Notes in Computer Science, vol 339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0027524

  • DOI: https://doi.org/10.1007/BFb0027524

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-50575-4

  • Online ISBN: 978-3-540-46045-9

  • eBook Packages: Springer Book Archive
