Symbolic Data Analysis Approach to Clustering Large Datasets

  • Simona Korenjak-Černe
  • Vladimir Batagelj
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

The paper builds on the representation of units and clusters by a special type of symbolic object that consists of distributions of variables. Two compatible clustering methods are developed: the leaders method, which reduces a large dataset to a smaller set of symbolic objects (clusters), and a hierarchical clustering method, which is applied to this smaller set to reveal its internal structure. The proposed approach is illustrated on the USDA Nutrient Database.
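
The two-stage idea in the abstract can be illustrated with a minimal sketch. It assumes each unit is already encoded as a list of per-variable distributions (lists of relative frequencies), and it uses squared Euclidean distance as the dissimilarity, for which the weighted mean is the optimal leader. The function names, the encoding, and the choice of dissimilarity are illustrative assumptions, not the definitions used in the chapter.

```python
def dissimilarity(x, y):
    # Assumed dissimilarity: squared Euclidean distance summed over variables.
    return sum((a - b) ** 2 for dx, dy in zip(x, y) for a, b in zip(dx, dy))


def mean_object(objects, weights):
    # Weighted average of the distributions, variable by variable.
    total = sum(weights)
    return [
        [sum(w * o[j][i] for o, w in zip(objects, weights)) / total
         for i in range(len(objects[0][j]))]
        for j in range(len(objects[0]))
    ]


def leaders(units, initial_leaders, max_iter=100):
    # Stage 1: alternate assignment and leader update until assignments stop changing.
    current = [[list(d) for d in leader] for leader in initial_leaders]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(current)), key=lambda k: dissimilarity(u, current[k]))
            for u in units
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for k in range(len(current)):
            members = [u for u, a in zip(units, assignment) if a == k]
            if members:  # an empty cluster keeps its previous leader
                current[k] = mean_object(members, [1] * len(members))
    sizes = [assignment.count(k) for k in range(len(current))]
    return current, sizes, assignment


def agglomerate(cluster_leaders, sizes):
    # Stage 2: repeatedly merge the two closest clusters and record each merge
    # as (first id, second id, id of the merged cluster).
    clusters = {i: (l, s) for i, (l, s) in enumerate(zip(cluster_leaders, sizes)) if s > 0}
    merges = []
    next_id = len(cluster_leaders)
    while len(clusters) > 1:
        ids = sorted(clusters)
        a, b = min(
            ((p, q) for i, p in enumerate(ids) for q in ids[i + 1:]),
            key=lambda pq: dissimilarity(clusters[pq[0]][0], clusters[pq[1]][0]),
        )
        (la, sa), (lb, sb) = clusters.pop(a), clusters.pop(b)
        clusters[next_id] = (mean_object([la, lb], [sa, sb]), sa + sb)
        merges.append((a, b, next_id))
        next_id += 1
    return merges
```

Calling `leaders(units, initial_leaders)` returns the cluster leaders, the cluster sizes, and the assignment of units to clusters. Passing the leaders and sizes to `agglomerate` then yields the sequence of merges from which a dendrogram could be drawn.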

Keywords

Large Dataset, Initial Cluster, Symbolic Data, Optimal Leader, Hierarchical Cluster Method
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. BATAGELJ, V. (1988): Generalized Ward and related clustering problems. In: H.H. Bock (Ed.): Classification and Related Methods of Data Analysis. North-Holland, Amsterdam, 67–74.
  2. BOCK, H.-H. (2000): Symbolic Data. In: H.-H. Bock and E. Diday (Eds.): Analysis of Symbolic Data. Exploratory methods for extracting statistical information from complex data. Springer, Heidelberg.
  3. BOCK, H.-H. and DIDAY, E. (2000): Symbolic Objects. In: H.-H. Bock and E. Diday (Eds.): Analysis of Symbolic Data. Exploratory methods for extracting statistical information from complex data. Springer, Heidelberg.
  4. DIDAY, E. (1979): Optimisation en classification automatique, Tome 1.,2. INRIA, Rocquencourt (in French).
  5. DOUGHERTY, J., KOHAVI, R., and SAHAMI, M. (1995): Supervised and unsupervised discretization of continuous features. Proceedings of the Twelfth International Conference on Machine Learning (pp. 194–202). Tahoe City, CA: Morgan Kaufmann. http://citeseer.nj.nec.com/dougherty95supervised.html
  6. HARTIGAN, J.A. (1975): Clustering Algorithms. Wiley, New York.
  7. KORENJAK-ČERNE, S. and BATAGELJ, V. (1998): Clustering large datasets of mixed units. In: Rizzi, A., Vichi, M., Bock, H.-H. (Eds.): Advances in Data Science and Classification. Springer.
  8. VERDE, R., DE CARVALHO, F.A.T. and LECHEVALLIER, Y. (2000): A Dynamic Clustering Algorithm for Multi-nominal Data. In: Kiers, H.A.L., Rasson, J.-P., Groenen, P.J.F., Schader, M. (Eds.): Data Analysis, Classification, and Related Methods. Springer.
  9. USDA Nutrient Database for Standard Reference, Release 14. U.S. Department of Agriculture, Agricultural Research Service. 2001: Nutrient Data Laboratory Home Page, http://www.nal.usda.gov/fnic/foodcomp.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Simona Korenjak-Černe (1, 2)
  • Vladimir Batagelj (2, 3)
  1. Faculty of Economics, University of Ljubljana, Ljubljana, Slovenia
  2. Department of TCS, IMFM Ljubljana, Ljubljana, Slovenia
  3. Department of Mathematics, University of Ljubljana, FMF, Ljubljana, Slovenia