Symbolic Data Analysis Approach to Clustering Large Datasets

Korenjak-Černe, Simona; Batagelj, Vladimir

doi:10.1007/978-3-642-56181-8_35


Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization


Abstract

The paper builds on the representation of units and clusters by a special type of symbolic object that consists of distributions of variables. Two compatible clustering methods are developed: the leaders method, which reduces a large dataset to a smaller set of symbolic objects (clusters), and a hierarchical clustering method, which is then applied to these clusters to reveal their internal structure. The proposed approach is illustrated on the USDA Nutrient Database.
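The abstract only names the two steps, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes units are described by categorical (or already discretized) variables, stores each symbolic object as one relative-frequency vector per variable, and uses the same dissimilarity in both steps: the squared Euclidean distance between distributions, summed over variables (an assumed measure; the paper's own dissimilarities and weights may differ). The leaders step is written as a k-means-style relocation loop, and the hierarchical step is Ward's agglomerative method applied to the resulting (description, size) pairs. Using one error criterion in both steps is one plausible reading of "compatible". The function names `unit_to_symbolic`, `leaders`, and `ward` and the toy data are hypothetical.

```python
import random

def unit_to_symbolic(values, categories):
    """Hypothetical helper: describe one unit as a symbolic object, i.e. one
    relative-frequency vector per variable. A single unit yields 0/1 vectors."""
    return [[1.0 if v == c else 0.0 for c in cats]
            for v, cats in zip(values, categories)]

def dist(a, b):
    """Assumed dissimilarity: squared Euclidean distance between the
    distributions of each variable, summed over all variables."""
    return sum((x - y) ** 2 for da, db in zip(a, b) for x, y in zip(da, db))

def mean_of(objs, weights):
    """Weighted average of symbolic objects, variable by variable;
    the result is again a set of distributions."""
    total = sum(weights)
    return [[sum(w * o[j][i] for o, w in zip(objs, weights)) / total
             for i in range(len(objs[0][j]))]
            for j in range(len(objs[0]))]

def leaders(objs, k, iters=20, seed=0):
    """Sketch of a leaders (relocation) step: assign each unit to its nearest
    leader, replace each leader by the mean description of its group, repeat.
    Returns (description, size) pairs for the non-empty clusters."""
    rng = random.Random(seed)
    lead = [objs[i] for i in rng.sample(range(len(objs)), k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for o in objs:
            nearest = min(range(k), key=lambda c: dist(o, lead[c]))
            groups[nearest].append(o)
        new = [mean_of(g, [1] * len(g)) if g else lead[c]
               for c, g in enumerate(groups)]
        if new == lead:  # assignments (and hence leaders) have stabilized
            break
        lead = new
    return [(l, len(g)) for l, g in zip(lead, groups) if g]

def ward(clusters):
    """Ward agglomeration of (description, size) pairs. Merging a and b costs
    n_a*n_b/(n_a+n_b) * dist(a, b), the increase in within-cluster sum of
    squares. Returns the merge history as (cost, merged size) pairs."""
    active = list(clusters)
    history = []
    while len(active) > 1:
        best = None
        for i in range(len(active)):
            for j in range(i + 1, len(active)):
                (a, na), (b, nb) = active[i], active[j]
                cost = na * nb / (na + nb) * dist(a, b)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        (a, na), (b, nb) = active[i], active[j]
        merged = (mean_of([a, b], [na, nb]), na + nb)
        active = [c for t, c in enumerate(active) if t not in (i, j)] + [merged]
        history.append((cost, na + nb))
    return history

# Usage on hypothetical toy data with two categorical variables.
categories = [["low", "high"], ["plant", "animal"]]
data = [("low", "plant"), ("low", "plant"), ("high", "animal"),
        ("high", "animal"), ("low", "animal"), ("high", "plant")]
objs = [unit_to_symbolic(v, categories) for v in data]
clusters = leaders(objs, k=3)
history = ward(clusters)
```

Because cluster sizes are carried into `ward` and merged descriptions are size-weighted means, each merge cost is exactly the increase in the within-cluster sum of squares, which is the same quantity the relocation loop reduces. The hierarchy built on a small number of leaders therefore uses the same criterion as the reduction step.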


References

BATAGELJ, V. (1988): Generalized Ward and related clustering problems. In: H.-H. Bock (Ed.): Classification and related methods of data analysis. North-Holland, Amsterdam, 67–74.
BOCK, H.-H. (2000): Symbolic Data. In: H.-H. Bock and E. Diday (Eds.): Analysis of Symbolic Data. Exploratory methods for extracting statistical information from complex data. Springer, Heidelberg.
BOCK, H.-H. and DIDAY, E. (2000): Symbolic Objects. In: H.-H. Bock and E. Diday (Eds.): Analysis of Symbolic Data. Exploratory methods for extracting statistical information from complex data. Springer, Heidelberg.
DIDAY, E. (1979): Optimisation en classification automatique, Tomes 1, 2. INRIA, Rocquencourt (in French).
DOUGHERTY, J., KOHAVI, R. and SAHAMI, M. (1995): Supervised and unsupervised discretization of continuous features. In: Proceedings of the Twelfth International Conference on Machine Learning. Morgan Kaufmann, Tahoe City, CA, 194–202. http://citeseer.nj.nec.com/dougherty95supervised.html
HARTIGAN, J.A. (1975): Clustering Algorithms. Wiley, New York.
KORENJAK-ČERNE, S. and BATAGELJ, V. (1998): Clustering large datasets of mixed units. In: Rizzi, A., Vichi, M., Bock, H.-H. (Eds.): Advances in Data Science and Classification. Springer.
VERDE, R., DE CARVALHO, F.A.T. and LECHEVALLIER, Y. (2000): A Dynamic Clustering Algorithm for Multi-nominal Data. In: Kiers, H.A.L., Rasson, J.-P., Groenen, P.J.F., Schader, M. (Eds.): Data Analysis, Classification, and Related Methods. Springer.
USDA (2001): USDA Nutrient Database for Standard Reference, Release 14. U.S. Department of Agriculture, Agricultural Research Service. Nutrient Data Laboratory Home Page, http://www.nal.usda.gov/fnic/foodcomp.


Author information

Authors and Affiliations

Faculty of Economics, University of Ljubljana, Kardeljeva ploščad 17, 1101 Ljubljana, Slovenia
Simona Korenjak-Černe
Department of TCS, IMFM Ljubljana, Jadranska ulica 19, 1000 Ljubljana, Slovenia
Simona Korenjak-Černe & Vladimir Batagelj
Department of Mathematics, University of Ljubljana, FMF, Jadranska ulica 19, 1000 Ljubljana, Slovenia
Vladimir Batagelj


Editor information

Editors and Affiliations

Wroclaw University of Economics, ul. Komandorska 118/120, 53-345, Wroclaw, Poland
Krzysztof Jajuga
Department of Statistics, Cracow University of Economics, ul. Rakowicka 27, 31-510, Cracow, Poland
Andrzej Sokołowski
Institute of Statistics, Technical University of Aachen, Wuellnerstrasse 3, 52056, Aachen, Germany
Hans-Hermann Bock


About this paper

Cite this paper

Korenjak-Černe, S., Batagelj, V. (2002). Symbolic Data Analysis Approach to Clustering Large Datasets. In: Jajuga, K., Sokołowski, A., Bock, HH. (eds) Classification, Clustering, and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-56181-8_35


DOI: https://doi.org/10.1007/978-3-642-56181-8_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43691-1
Online ISBN: 978-3-642-56181-8
eBook Packages: Springer Book Archive
