Abstract
The idea of evidence accumulation for combining multiple clusterings was recently proposed [7]. Taking K-means as the basic algorithm for decomposing the data into a large number, k, of compact clusters, evidence on pattern association is accumulated, by a voting mechanism, over multiple clusterings obtained from random initializations of the K-means algorithm. This maps the clusterings into a new similarity measure between patterns. The final data partition is obtained by applying the single-link method to this similarity matrix. In this paper we further explore and extend this idea by proposing: (a) the combination of multiple K-means clusterings using variable k; (b) the use of cluster lifetime as the criterion for extracting the final clusters; and (c) the adaptation of this approach to string patterns. This leads to a more robust clustering technique, with fewer design parameters than the previous approach and potential applications in a wider range of problems.
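The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`kmeans`, `co_association`, `single_link_lifetime`), the parameter defaults, and the toy data are hypothetical. It also assumes one plausible reading of "cluster lifetime", namely the gap between consecutive single-link merge levels, with the partition that survives the widest gap being kept. Only numeric patterns under Euclidean distance are covered; the string-pattern adaptation is not attempted.

```python
import random

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's K-means on tuples of floats; returns one label per point."""
    centers = rng.sample(points, k)  # random initialization from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels

def co_association(points, n_runs, k_range, seed=0):
    """Voting step: fraction of runs in which each pair shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    votes = [[0] * n for _ in range(n)]
    for _ in range(n_runs):
        k = rng.randint(*k_range)  # variable k, drawn anew for every run
        labels = kmeans(points, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    votes[i][j] += 1
    return [[v / n_runs for v in row] for row in votes]

def single_link_lifetime(sim):
    """Single-link on 1 - sim, cut at the widest gap between merge levels.

    Assumption: 'lifetime' is read as the gap between consecutive merges;
    only partitions with 2 .. n-1 clusters are considered (needs n >= 3).
    """
    n = len(sim)
    edges = sorted((1.0 - sim[i][j], i, j) for i in range(n) for j in range(i + 1, n))

    def cut(n_merges):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x
        levels = []
        for d, i, j in edges:  # Kruskal-style pass = single-link merge order
            if len(levels) == n_merges:
                break
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                levels.append(d)
        return levels, [find(i) for i in range(n)]

    levels, _ = cut(n - 1)
    # gaps[g] is the lifetime of the partition obtained after g + 1 merges.
    gaps = [levels[t] - levels[t - 1] for t in range(1, n - 1)]
    t_best = 1 + max(range(len(gaps)), key=gaps.__getitem__)
    _, roots = cut(t_best)
    ids = {}
    return [ids.setdefault(r, len(ids)) for r in roots]

# Toy 1-D data (hypothetical): two well-separated groups.
points = [(x,) for x in (0.0, 0.4, 0.9, 1.3, 1.8, 100.0, 100.5, 101.1, 101.6, 102.0)]
sim = co_association(points, n_runs=30, k_range=(2, 4))
print(single_link_lifetime(sim))
```

Note that the number of final clusters is not passed in anywhere: it falls out of the lifetime criterion, which is what removes a design parameter compared with fixing a threshold on the similarity matrix.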
References
T. A. Bailey and R. Dubes. Cluster validity profiles. Pattern Recognition, 15(2):61–83, 1982.
J. Buhmann and M. Held. Unsupervised learning without overfitting: Empirical risk approximation as an induction principle for reliable clustering. In Sameer Singh, editor, International Conference on Advances in Pattern Recognition, pages 167–176. Springer Verlag, 1999.
R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley, second edition, 2001.
Y. El-Sonbaty and M. A. Ismail. On-line hierarchical clustering. Pattern Recognition Letters, pages 1285–1291, 1998.
M. Figueiredo and A. K. Jain. Unsupervised learning of finite mixture models. IEEE Trans. Pattern Analysis and Machine Intelligence, 24(3):381–396, 2002.
B. Fischer, T. Zoller, and J. Buhmann. Path based pairwise data clustering with application to texture segmentation. In M. Figueiredo, J. Zerubia, and A. K. Jain, editors, Energy Minimization Methods in Computer Vision and Pattern Recognition, volume 2134 of LNCS, pages 235–266. Springer Verlag, 2001.
A. L. Fred. Finding consistent clusters in data partitions. In Josef Kittler and Fabio Roli, editors, Multiple Classifier Systems, volume 2096 of LNCS, pages 309–318. Springer, 2001.
A. L. Fred and J. Leitão. Clustering under a hypothesis of smooth dissimilarity increments. In Proc. of the 15th Int’l Conference on Pattern Recognition, volume 2, pages 190–194, Barcelona, 2000.
A. L. Fred, J. S. Marques, and P. M. Jorge. Hidden Markov models vs syntactic modeling in object recognition. In ICIP’97, 1997.
M. Har-Even and V. L. Brailovsky. Probabilistic validation approach for clustering. Pattern Recognition, 16:1189–1196, 1995.
A. Jain. Fundamentals of Digital Image Processing. Prentice-Hall, 1989.
A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice Hall, 1988.
A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: A review. ACM Computing Surveys, 31(3):264–323, September 1999.
J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas. On combining classifiers. IEEE Trans. Pattern Analysis and Machine Intelligence, 20(3):226–239, 1998.
R. Kothari and D. Pitts. On finding the number of clusters. Pattern Recognition Letters, 20:405–416, 1999.
Y. Man and I. Gath. Detection and separation of ring-shaped clusters using fuzzy clusters. IEEE Trans. Pattern Analysis and Machine Intelligence, 16(8):855–861, August 1994.
A. Marzal and E. Vidal. Computation of normalized edit distance and applications. IEEE Trans. Pattern Analysis and Machine Intelligence, 15(9):926–932, 1993.
G. McLachlan and K. Basford. Mixture Models: Inference and Application to Clustering. Marcel Dekker, New York, 1988.
B. Mirkin. Concept learning and feature selection based on square-error clustering. Machine Learning, 35:25–39, 1999.
N. R. Pal and J. C. Bezdek. On cluster validity for the fuzzy c-means model. IEEE Trans. Fuzzy Systems, 3:370–379, 1995.
E. J. Pauwels and G. Frederix. Finding regions of interest for content-extraction. In Proc. of IS&T/SPIE Conference on Storage and Retrieval for Image and Video Databases VII, volume SPIE Vol. 3656, pages 501–510, San Jose, January 1999.
E. S. Ristad and P. N. Yianilos. Learning string-edit distance. IEEE Trans. Pattern Analysis and Machine Intelligence, 20(5):522–531, May 1998.
S. Roberts, D. Husmeier, I. Rezek, and W. Penny. Bayesian approaches to Gaussian mixture modelling. IEEE Trans. Pattern Analysis and Machine Intelligence, 20(11), November 1998.
D. Stanford and A. E. Raftery. Principal curve clustering with noise. Technical report, University of Washington, http://www.stat.washington.edu/raftery, 1997.
H. Tenmoto, M. Kudo, and M. Shimbo. MDL-based selection of the number of components in mixture models for pattern recognition. In Adnan Amin, Dov Dori, Pavel Pudil, and Herbert Freeman, editors, Advances in Pattern Recognition, volume 1451 of Lecture Notes in Computer Science, pages 831–836. Springer Verlag, 1998.
C. Zahn. Graph-theoretical methods for detecting and describing gestalt structures. IEEE Trans. Computers, C-20(1):68–86, 1971.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fred, A., Jain, A.K. (2002). Evidence Accumulation Clustering Based on the K-Means Algorithm. In: Caelli, T., Amin, A., Duin, R.P.W., de Ridder, D., Kamel, M. (eds) Structural, Syntactic, and Statistical Pattern Recognition. SSPR /SPR 2002. Lecture Notes in Computer Science, vol 2396. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-70659-3_46
DOI: https://doi.org/10.1007/3-540-70659-3_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44011-6
Online ISBN: 978-3-540-70659-5
eBook Packages: Springer Book Archive