
Finding Consistent Clusters in Data Partitions

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2096)

Abstract

Given an arbitrary data set, to which no particular parametric, statistical or geometrical structure can be assumed, different clustering algorithms will in general produce different data partitions. In fact, several partitions can also be obtained from a single clustering algorithm, owing to its dependence on initialization or on the value chosen for some design parameter. This paper addresses the problem of finding consistent clusters in data partitions, proposing the analysis of the most common associations performed in a majority voting scheme. Clustering results are combined by transforming the data partitions into a co-association sample matrix, which maps coherent associations. This matrix is then used to extract the underlying consistent clusters. The proposed methodology is evaluated in the context of k-means clustering, and a new clustering algorithm, voting-k-means, is presented. Examples using both simulated and real data show how this majority voting combination scheme simultaneously handles the problems of selecting the number of clusters and of dependency on initialization. Furthermore, the resulting clusters are not constrained to be hyper-spherically shaped.
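The combination step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's published implementation. It assumes each partition (for example, one k-means run with a different random initialization) is given as a list of cluster labels over the same n samples. The co-association of two samples is estimated as the fraction of partitions that place them in the same cluster. As one reading of the majority-voting rule, two samples are linked when that fraction exceeds 0.5, and the connected components of the resulting graph are taken as the consistent clusters. The function names `co_association` and `majority_clusters` and the `threshold` parameter are hypothetical.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions in which each pair of samples shares a cluster.

    `partitions` is a list of label sequences, one per clustering run,
    all labelling the same n samples (labels may be any hashable values).
    """
    n = len(partitions[0])
    if any(len(p) != n for p in partitions):
        raise ValueError("all partitions must label the same n samples")
    votes = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                votes[i][j] += 1
                votes[j][i] += 1
    m = len(partitions)
    # The diagonal is 1 by definition: a sample always shares a cluster with itself.
    return [[1.0 if i == j else votes[i][j] / m for j in range(n)]
            for i in range(n)]


def majority_clusters(partitions, threshold=0.5):
    """Link samples whose co-association exceeds `threshold`; return cluster ids.

    Clusters are the connected components of the linkage graph, found with a
    union-find structure and numbered in order of first appearance.
    """
    c = co_association(partitions)
    n = len(c)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if c[i][j] > threshold:
            parent[find(i)] = find(j)

    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


if __name__ == "__main__":
    # Three hypothetical runs over six samples; run 1 splits off sample 5,
    # run 2 splits off sample 2, but the majority agrees on two groups.
    runs = [
        [0, 0, 0, 1, 1, 2],
        [1, 1, 0, 2, 2, 2],
        [0, 0, 0, 1, 1, 1],
    ]
    print(majority_clusters(runs))  # [0, 0, 0, 1, 1, 1]
```

Because clusters are formed by chaining linked pairs rather than by distance to a centroid, the extracted groups need not be hyper-spherical, even when every input partition comes from k-means. The pairwise loops are O(m·n²) in time and O(n²) in memory, which is fine for a sketch but not for large data sets.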

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fred, A. (2001). Finding Consistent Clusters in Data Partitions. In: Kittler, J., Roli, F. (eds) Multiple Classifier Systems. MCS 2001. Lecture Notes in Computer Science, vol 2096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48219-9_31

  • DOI: https://doi.org/10.1007/3-540-48219-9_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42284-6

  • Online ISBN: 978-3-540-48219-2

  • eBook Packages: Springer Book Archive
