Abstract
Standard clustering methods do not handle truly large data sets and fail to take into account multi-level data structures. This work outlines an approach to clustering that integrates the Kohonen Self-Organizing Map (SOM) with other clustering methods. Moreover, to take multi-level structures into account, a statistical model is proposed in which the mixing coefficients of a mixture of distributions may depend on higher-level variables. Thus, in a first step, the SOM provides a substantial data reduction, which makes a variety of agglomerative (ascending) and divisive clustering algorithms applicable. In a second step, statistical modelling provides both a direct means to treat multi-level structures and a framework for model-based clustering. The interplay of these two steps is illustrated on nutritional data from EPIC, a multi-center study on nutrition and cancer.
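The data-reduction idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a small rectangular SOM trained online, followed by agglomerative clustering of the SOM prototypes using Ward's criterion, with each prototype weighted by the number of observations it represents. The function names, grid size, decay schedules, and synthetic data are all illustrative assumptions, and the multi-level mixture model is not covered.

```python
import math
import random


def sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest(protos, x):
    """Index of the prototype closest to x (the best-matching unit)."""
    return min(range(len(protos)), key=lambda j: sqdist(protos[j], x))


def train_som(data, rows, cols, epochs=20, seed=0):
    """Online training of a small rectangular SOM; returns the prototypes."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialise each prototype as a copy of a randomly chosen observation.
    protos = [list(rng.choice(data)) for _ in grid]
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / n_steps
            lr = 0.5 * (1.0 - frac)  # learning rate decays linearly
            sigma = max(0.5, 0.5 * max(rows, cols) * (1.0 - frac))  # radius shrinks
            br, bc = grid[nearest(protos, x)]
            for j, (r, c) in enumerate(grid):
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                protos[j] = [p + lr * h * (xi - p) for p, xi in zip(protos[j], x)]
            step += 1
    return protos


def ward_merge(protos, weights, n_clusters):
    """Agglomerative clustering of weighted prototypes with Ward's criterion."""
    # Each cluster is (member unit indices, total weight, centroid).
    # Units that represent no observation are skipped.
    clusters = [([j], w, list(p))
                for j, (p, w) in enumerate(zip(protos, weights)) if w > 0]
    while len(clusters) > n_clusters:
        # Pick the pair whose merge increases within-cluster inertia the least.
        _, a, b = min(
            (clusters[i][1] * clusters[k][1] / (clusters[i][1] + clusters[k][1])
             * sqdist(clusters[i][2], clusters[k][2]), i, k)
            for i in range(len(clusters)) for k in range(i + 1, len(clusters)))
        (ma, wa, ca), (mb, wb, cb) = clusters[a], clusters[b]
        centroid = [(wa * u + wb * v) / (wa + wb) for u, v in zip(ca, cb)]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((ma + mb, wa + wb, centroid))
    return [members for members, _, _ in clusters]


def som_then_cluster(data, rows, cols, n_clusters, seed=0):
    """Step 1: reduce the data to SOM prototypes. Step 2: cluster the prototypes."""
    protos = train_som(data, rows, cols, seed=seed)
    bmus = [nearest(protos, x) for x in data]
    weights = [bmus.count(j) for j in range(len(protos))]
    groups = ward_merge(protos, weights, n_clusters)
    unit_to_cluster = {j: k for k, members in enumerate(groups) for j in members}
    return [unit_to_cluster[j] for j in bmus]


# Synthetic demo data (not from the paper): two well-separated 2-D blobs.
rng = random.Random(1)
blob_a = [[rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)] for _ in range(60)]
blob_b = [[rng.gauss(10.0, 0.3), rng.gauss(10.0, 0.3)] for _ in range(60)]
labels = som_then_cluster(blob_a + blob_b, rows=3, cols=3, n_clusters=2)
print(sorted(set(labels)))
```

The pairwise search in `ward_merge` is quadratic in the number of clusters, which would be impractical on a truly large data set but is cheap here because it runs on at most `rows * cols` prototypes rather than on the raw observations.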
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ciampi, A., Lechevallier, Y. (2000). Clustering Large, Multi-level Data Sets: An Approach Based on Kohonen Self Organizing Maps. In: Zighed, D.A., Komorowski, J., Żytkow, J. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2000. Lecture Notes in Computer Science, vol 1910. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45372-5_36
DOI: https://doi.org/10.1007/3-540-45372-5_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41066-9
Online ISBN: 978-3-540-45372-7
eBook Packages: Springer Book Archive