Symbolic Data Analysis Approach to Clustering Large Datasets
The paper builds on the representation of units and clusters by a special type of symbolic object that consists of the distributions of the variables. Two compatible clustering methods are developed: the leaders method, which reduces a large dataset to a smaller set of symbolic objects (clusters), and a hierarchical clustering method, which is applied to these clusters to reveal their internal structure. The proposed approach is illustrated on the USDA Nutrient Database.
Keywords: Large Dataset, Initial Cluster, Symbolic Data, Optimal Leader, Hierarchical Cluster Method
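The two-stage idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each unit is encoded as a list of distributions (one per discretized variable), the dissimilarity is taken to be the sum of squared Euclidean distances between those distributions, and the hierarchical stage uses a Ward-style merge cost. All function names and the toy data are hypothetical.

```python
from itertools import combinations

def to_distribution(value, n_categories):
    """Encode one categorical value as a degenerate (one-hot) distribution."""
    return [1.0 if i == value else 0.0 for i in range(n_categories)]

def dissimilarity(a, b):
    """Assumed dissimilarity: squared Euclidean distance summed over variables."""
    return sum((p - q) ** 2 for da, db in zip(a, b) for p, q in zip(da, db))

def mean_object(objects, weights):
    """Weighted average of symbolic objects, computed variable by variable."""
    total = sum(weights)
    return [
        [sum(w * obj[v][i] for obj, w in zip(objects, weights)) / total
         for i in range(len(objects[0][v]))]
        for v in range(len(objects[0]))
    ]

def leaders(units, initial_leaders, max_iter=100):
    """Leaders method: alternate assignment and leader update until stable."""
    current = list(initial_leaders)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(current)), key=lambda k: dissimilarity(u, current[k]))
            for u in units
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for k in range(len(current)):
            members = [u for u, a in zip(units, assignment) if a == k]
            if members:  # an empty cluster keeps its previous leader
                current[k] = mean_object(members, [1] * len(members))
    sizes = [assignment.count(k) for k in range(len(current))]
    return current, sizes, assignment

def agglomerate(cluster_leaders, sizes):
    """Ward-style agglomeration of the leaders (assumed merge criterion)."""
    clusters = {i: (cluster_leaders[i], sizes[i])
                for i in range(len(cluster_leaders)) if sizes[i] > 0}
    next_id = len(cluster_leaders)
    merges = []
    while len(clusters) > 1:
        def cost(pair):
            (la, na), (lb, nb) = clusters[pair[0]], clusters[pair[1]]
            return na * nb / (na + nb) * dissimilarity(la, lb)
        a, b = min(combinations(sorted(clusters), 2), key=cost)
        height = cost((a, b))
        (la, na), (lb, nb) = clusters.pop(a), clusters.pop(b)
        clusters[next_id] = (mean_object([la, lb], [na, nb]), na + nb)
        merges.append((a, b, next_id, height))
        next_id += 1
    return merges

# Toy data: five units, three binary variables (values are made up).
raw = [(0, 0, 0), (0, 0, 1), (1, 1, 1), (1, 1, 0), (0, 0, 0)]
units = [[to_distribution(x, 2) for x in row] for row in raw]

found_leaders, sizes, assignment = leaders(units, [units[0], units[2]])
merges = agglomerate(found_leaders, sizes)
```

Each leader is itself a list of distributions, so the clusters produced by the first stage have the same form as the input units. That is what lets the hierarchical stage run on the leaders alone rather than on the full dataset.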
- BATAGELJ, V. (1988): Generalized Ward and related clustering problems. In: Bock, H.H. (Ed.): Classification and Related Methods of Data Analysis. North-Holland, Amsterdam, 67–74.
- DOUGHERTY, J., KOHAVI, R., and SAHAMI, M. (1995): Supervised and unsupervised discretization of continuous features. Proceedings of the Twelfth International Conference on Machine Learning (pp. 194–202). Tahoe City, CA: Morgan Kaufmann. http://citeseer.nj.nec.com/dougherty95supervised.html
- KORENJAK-ČERNE, S. and BATAGELJ, V. (1998): Clustering large datasets of mixed units. In: Rizzi, A., Vichi, M., Bock, H.-H. (Eds.): Advances in Data Science and Classification. Springer.
- VERDE, R., DE CARVALHO, F.A.T. and LECHEVALLIER, Y. (2000): A Dynamic Clustering Algorithm for Multi-nominal Data. In: Kiers, H.A.L., Rasson, J.-P., Groenen, P.J.F., Schader, M. (Eds.): Data Analysis, Classification, and Related Methods. Springer.
- USDA Nutrient Database for Standard Reference, Release 14. U.S. Department of Agriculture, Agricultural Research Service. 2001: Nutrient Data Laboratory Home Page, http://www.nal.usda.gov/fnic/foodcomp.