Building Small Scale Models of Multi-Entity Databases By Clustering

  • Conference paper
Classification, Clustering, and Data Mining Applications

Abstract

A framework is proposed for building small-scale models of very large databases that describe several entities and their relationships. The first part shows that sampling is not a good solution when several entities are stored in a database. The second part proposes a model based on clustering all entities of the database and storing aggregates on the clusters and on the relationships between the clusters. The last part discusses the problems raised by this approach and proposes some solutions; in particular, the link with symbolic data analysis is established.
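As a rough illustration of the second part of the abstract, the sketch below builds a toy summary of a two-entity database: each entity is partitioned into clusters, and aggregates are stored per cluster and per pair of related clusters. It is not the authors' method. The data, the entity names (`customers`, `products`, `purchases`), the helper functions, and the single-threshold split used in place of a real clustering algorithm are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy database: two entities and one relationship between them.
customers = {1: 23, 2: 31, 3: 58, 4: 64}       # customer id -> age
products = {"a": 5.0, "b": 7.0, "c": 120.0}    # product id -> price
purchases = [(1, "a"), (1, "b"), (2, "a"), (3, "c"), (4, "c"), (4, "b")]


def assign_clusters(values, threshold):
    """Stand-in for a real clustering step: split on a single threshold."""
    return {key: int(value >= threshold) for key, value in values.items()}


def cluster_aggregates(values, clusters):
    """Store a count and a mean for each cluster of one entity."""
    groups = defaultdict(list)
    for key, value in values.items():
        groups[clusters[key]].append(value)
    return {c: {"count": len(vs), "mean": mean(vs)} for c, vs in groups.items()}


def relationship_aggregates(links, left_clusters, right_clusters):
    """Count the links observed between each pair of clusters."""
    counts = defaultdict(int)
    for left, right in links:
        counts[(left_clusters[left], right_clusters[right])] += 1
    return dict(counts)


customer_clusters = assign_clusters(customers, threshold=40)
product_clusters = assign_clusters(products, threshold=50.0)

# The small scale model keeps only aggregates, not the individual rows.
model = {
    "customers": cluster_aggregates(customers, customer_clusters),
    "products": cluster_aggregates(products, product_clusters),
    "purchases": relationship_aggregates(
        purchases, customer_clusters, product_clusters
    ),
}

for name, summary in model.items():
    print(name, summary)
```

The resulting `model` is much smaller than the original tables but still records how clusters of one entity relate to clusters of the other, which is the information an independent sample of each entity would tend to lose.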

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hébrail, G., Lechevallier, Y. (2004). Building Small Scale Models of Multi-Entity Databases By Clustering. In: Banks, D., McMorris, F.R., Arabie, P., Gaul, W. (eds) Classification, Clustering, and Data Mining Applications. Studies in Classification, Data Analysis, and Knowledge Organisation. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17103-1_37

  • DOI: https://doi.org/10.1007/978-3-642-17103-1_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22014-5

  • Online ISBN: 978-3-642-17103-1

  • eBook Packages: Springer Book Archive
