Abstract
Standard clustering methods do not scale to very large data sets and do not account for multilevel data structures. This work outlines an approach to clustering that combines the Kohonen Self-Organizing Map (SOM) with other clustering methods. To account for multilevel structures, a statistical model is proposed in which the mixing coefficients of a mixture of distributions may depend on higher-level variables. In a first step, the SOM provides a substantial data reduction, which makes a variety of agglomerative (ascending) and divisive clustering algorithms applicable. In a second step, statistical modeling provides both a direct means of treating multilevel structures and a framework for model-based clustering. The interplay of these two steps is illustrated with nutritional data from EPIC, a multicenter study on nutrition and cancer.
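The first step can be illustrated with a minimal sketch. It is not the authors' implementation: the one-dimensional map, the linear decay schedules, the choice of Ward's criterion as the agglomerative method, and the synthetic two-group data are all assumptions made for illustration, and every function name is hypothetical. The SOM reduces the data to a few prototypes, and the prototypes, weighted by the number of observations they represent, are then clustered hierarchically.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(data, n_units, n_epochs=20, seed=0):
    """Fit a 1-D SOM (a chain of n_units prototypes) with online updates."""
    rng = random.Random(seed)
    protos = [list(rng.choice(data)) for _ in range(n_units)]
    n_steps = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            frac = step / n_steps
            lr = 0.5 * (1.0 - frac)  # assumed schedule: linear decay
            radius = max(0.5 * n_units * (1.0 - frac), 0.5)  # shrinking neighbourhood
            bmu = min(range(n_units), key=lambda j: sq_dist(x, protos[j]))
            for j in range(n_units):
                # Gaussian neighbourhood on the chain, centred on the best-matching unit
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                protos[j] = [p + lr * h * (xi - p) for p, xi in zip(protos[j], x)]
            step += 1
    return protos


def assign_units(data, protos):
    """Index of the nearest prototype for each observation."""
    return [min(range(len(protos)), key=lambda j: sq_dist(x, protos[j])) for x in data]


def ward_on_prototypes(protos, counts, n_clusters):
    """Agglomerative (Ward) clustering of the non-empty units, weighted by counts.

    Returns a dict mapping unit index to cluster label.
    """
    # Each cluster is (weight, centroid, member unit indices).
    clusters = [(c, list(p), [j]) for j, (p, c) in enumerate(zip(protos, counts)) if c > 0]

    def cost(pair):
        # Increase in within-cluster sum of squares if the two clusters merge.
        (wa, ca, _), (wb, cb, _) = clusters[pair[0]], clusters[pair[1]]
        return wa * wb / (wa + wb) * sq_dist(ca, cb)

    while len(clusters) > n_clusters:
        pairs = [(a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        a, b = min(pairs, key=cost)
        (wa, ca, ma), (wb, cb, mb) = clusters[a], clusters[b]
        w = wa + wb
        merged = (w, [(wa * u + wb * v) / w for u, v in zip(ca, cb)], ma + mb)
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return {j: label for label, (_, _, members) in enumerate(clusters) for j in members}


# Synthetic stand-in data (an assumption): two well-separated groups in the plane.
rng = random.Random(1)
data = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]
data += [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(200)]

protos = train_som(data, n_units=10)            # step 1a: data reduction
units = assign_units(data, protos)
counts = [units.count(j) for j in range(len(protos))]
unit_label = ward_on_prototypes(protos, counts, n_clusters=2)  # step 1b: cluster prototypes
labels = [unit_label[u] for u in units]         # propagate labels back to observations
```

Because the pairwise search runs over the prototypes rather than the observations, its cost depends on the map size and not on the number of records, which is the point of the reduction. The multilevel mixture model of the second step is not covered by this sketch.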
Copyright information
© 2007 Birkhäuser Boston
About this chapter
Cite this chapter
Lechevallier, Y., Ciampi, A. (2007). Multilevel Clustering for Large Databases. In: Auget, JL., Balakrishnan, N., Mesbah, M., Molenberghs, G. (eds) Advances in Statistical Methods for the Health Sciences. Statistics for Industry and Technology. Birkhäuser Boston. https://doi.org/10.1007/978-0-8176-4542-7_17
DOI: https://doi.org/10.1007/978-0-8176-4542-7_17
Publisher Name: Birkhäuser Boston
Print ISBN: 978-0-8176-4368-3
Online ISBN: 978-0-8176-4542-7
eBook Packages: Mathematics and Statistics (R0)