Building Small Scale Models of Multi-Entity Databases By Clustering

Hébrail, Georges; Lechevallier, Yves

doi:10.1007/978-3-642-17103-1_37

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organisation (STUDIES CLASS)

Abstract

A framework is proposed to build small scale models of very large databases describing several entities and their relationships. In the first part, it is shown that the use of sampling is not a good solution when several entities are stored in a database. In the second part, a model is proposed which is based on clustering all entities of the database and storing aggregates on the clusters and on the relationships between the clusters. The last part of the paper discusses the different problems which are raised by this approach. Some solutions are proposed: in particular, the link with symbolic data analysis is established.
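
The idea of clustering every entity and keeping only aggregates on the clusters and on the relationships between clusters can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: the two entities (customers described by age, products described by price), the purchase relationship, the one-dimensional k-means routine, and all function names are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from statistics import mean


def kmeans_1d(values, k, iterations=20):
    """Tiny 1-D k-means (Lloyd's algorithm); returns one cluster label per value."""
    ordered = sorted(values)
    # Deterministic start: spread the k initial centers across the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = mean(members)
    return labels


def summarize(entity, k):
    """Cluster one entity (id -> numeric attribute) and aggregate per cluster."""
    ids = list(entity)
    cluster_of = dict(zip(ids, kmeans_1d([entity[i] for i in ids], k)))
    groups = defaultdict(list)
    for i in ids:
        groups[cluster_of[i]].append(entity[i])
    aggregates = {c: {"count": len(v), "mean": mean(v)} for c, v in groups.items()}
    return cluster_of, aggregates


def build_small_scale_model(customers, products, purchases, k=2):
    """Keep only cluster aggregates and cluster-to-cluster relationship counts."""
    cust_cluster, cust_agg = summarize(customers, k)
    prod_cluster, prod_agg = summarize(products, k)
    links = Counter((cust_cluster[c], prod_cluster[p]) for c, p in purchases)
    return {"customers": cust_agg, "products": prod_agg, "links": dict(links)}


# Hypothetical toy database: two entities and one relationship between them.
customers = {"c1": 22, "c2": 25, "c3": 61, "c4": 64}          # id -> age
products = {"p1": 5.0, "p2": 7.0, "p3": 90.0, "p4": 110.0}    # id -> price
purchases = [("c1", "p1"), ("c1", "p2"), ("c2", "p1"), ("c3", "p3"),
             ("c4", "p3"), ("c4", "p4"), ("c3", "p2")]

model = build_small_scale_model(customers, products, purchases)
print(model["customers"])
print(model["products"])
print(model["links"])
```

In this sketch the individual rows are discarded once the model is built: an approximate count of purchases linking the younger customer cluster to the cheaper product cluster is read directly from `model["links"]`, while per-cluster counts and means stand in for the attribute values.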


Author information

Authors and Affiliations

ENST Paris, France
Georges Hébrail
INRIA - Rocquencourt, France
Yves Lechevallier

Editor information

Editors and Affiliations

Institute of Statistics and Decision Sciences, Duke University, 27708, Durham, NC, USA
David Banks, Leanna House
Department of Mathematics, Illinois Institute of Technology, 10 West 32nd Street, 60616-3793, Chicago, IL, USA
Frederick R. McMorris
Faculty of Management, Rutgers University, 180 University Avenue, 07102-1895, Newark, NJ, USA
Phipps Arabie
Institute of Decision Theory, University of Karlsruhe, Kaiserstr. 12, 76128, Karlsruhe, Germany
Wolfgang Gaul

Cite this paper

Hébrail, G., Lechevallier, Y. (2004). Building Small Scale Models of Multi-Entity Databases By Clustering. In: Banks, D., McMorris, F.R., Arabie, P., Gaul, W. (eds) Classification, Clustering, and Data Mining Applications. Studies in Classification, Data Analysis, and Knowledge Organisation. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17103-1_37

DOI: https://doi.org/10.1007/978-3-642-17103-1_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22014-5
Online ISBN: 978-3-642-17103-1
eBook Packages: Springer Book Archive