Abstract

Starting from model-based clustering, simple techniques based on cores are proposed. A core is a dense region in the high-dimensional space that can be represented, for example, by its most typical observation, by its centroid or, more generally, by assigning weight functions to the observations. Well-known cluster analysis techniques such as the partitional K-Means method or the hierarchical Ward method are useful for discovering partitions or hierarchies in the underlying data. Here these methods are generalised in two ways: first by using weighted observations, and second by allowing clusters of different volumes. Then a more general K-Means approach based on pairwise distances is recommended. Simulation studies are carried out to compare the new clustering techniques with the well-known ones. Moreover, a successful application is presented, in which the task is to discover clusters with quite different numbers of observations in a high-dimensional space.
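The link between weighted observations and a K-Means criterion based on pairwise distances can be illustrated with a standard identity: the weighted sum of squared distances to the weighted centroid of a cluster equals the weighted sum of pairwise squared distances divided by the total weight. The sketch below is not taken from the chapter, whose body is not available here; the function names, the sample points and the weights are all hypothetical and serve only to check the identity numerically.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def wss_centroid(points, weights):
    """Weighted within-cluster sum of squares, computed via the weighted centroid."""
    total = sum(weights)
    dim = len(points[0])
    centroid = [
        sum(w * p[d] for p, w in zip(points, weights)) / total
        for d in range(dim)
    ]
    return sum(w * sq_dist(p, centroid) for p, w in zip(points, weights))


def wss_pairwise(points, weights):
    """The same quantity computed from pairwise squared distances only.

    No centroid is needed, so this form also applies when only a
    distance matrix is available.
    """
    total = sum(weights)
    pair_sum = sum(
        weights[i] * weights[j] * sq_dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    )
    return pair_sum / total


# Hypothetical cluster of three weighted observations (illustration only).
cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
masses = [1.0, 2.0, 1.0]

print(wss_centroid(cluster, masses))  # 7.0
print(wss_pairwise(cluster, masses))  # 7.0
```

Because the pairwise form never constructs a centroid, a criterion of this kind can be evaluated directly from a matrix of distances, which is presumably what makes a pairwise-distance formulation more general than the classical centroid-based one.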

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mucha, HJ., Bartel, HG., Dolata, J. (2003). Core-Based Clustering Techniques. In: Schader, M., Gaul, W., Vichi, M. (eds) Between Data Science and Applied Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18991-3_9

  • DOI: https://doi.org/10.1007/978-3-642-18991-3_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40354-8

  • Online ISBN: 978-3-642-18991-3

  • eBook Packages: Springer Book Archive
