Finding Consistent Clusters in Data Partitions

Fred, Ana

doi:10.1007/3-540-48219-9_31

Conference paper
First Online: 01 January 2001

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2096)

Abstract

Given an arbitrary data set for which no particular parametric, statistical, or geometrical structure can be assumed, different clustering algorithms will in general produce different data partitions. In fact, several partitions can also be obtained with a single clustering algorithm, owing to its dependence on initialization or on the value chosen for some design parameter. This paper addresses the problem of finding consistent clusters in data partitions, proposing the analysis of the most common associations in a majority voting scheme. Clustering results are combined by transforming the data partitions into a co-association sample matrix, which maps coherent associations. This matrix is then used to extract the underlying consistent clusters. The proposed methodology is evaluated in the context of k-means clustering, and a new clustering algorithm, voting-k-means, is presented. Examples using both simulated and real data show how this majority voting combination scheme simultaneously handles the problems of selecting the number of clusters and of dependence on initialization. Furthermore, the resulting clusters are not constrained to be hyper-spherically shaped.
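
As an illustration of the combination scheme the abstract describes, the following is a minimal sketch rather than the paper's implementation. It assumes the partitions are already available as lists of cluster labels (for example from repeated k-means runs, which are not performed here), that the co-association entry for a pair of samples is the fraction of partitions placing them in the same cluster, and that "majority voting" means linking pairs whose co-association exceeds 0.5. The function names `co_association` and `consistent_clusters`, the `threshold` parameter, and the use of connected components to extract the final clusters are all assumptions made for this example.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions in which each pair of samples shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    m = len(partitions)
    # Every sample is trivially co-associated with itself.
    return [[1.0 if i == j else counts[i][j] / m for j in range(n)]
            for i in range(n)]


def consistent_clusters(partitions, threshold=0.5):
    """Link pairs voted together by a majority; return one label per sample."""
    co = co_association(partitions)
    n = len(co)
    parent = list(range(n))  # union-find forest over the samples

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:  # assumed majority rule
            parent[find(i)] = find(j)

    # Relabel the union-find roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


# Three hypothetical partitions of five samples with different cluster counts.
example_partitions = [
    [0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
print(consistent_clusters(example_partitions))  # -> [0, 0, 1, 1, 1]
```

In this toy input, samples 2, 3, and 4 are never grouped together by a majority as a whole set, yet they end up in one cluster because the pairs (2, 3) and (3, 4) each win two of three votes. Chaining pairwise links in this way is what lets the combined clusters take shapes that a single k-means run would not produce, and the number of output clusters falls out of the votes instead of being fixed in advance.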

Author information

Authors and Affiliations

Instituto Superior Técnico, Instituto de Telecomunicações, Lisbon, Portugal
Ana Fred

Editor information

Editors and Affiliations

Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, Surrey, GU2 7XH, UK
Josef Kittler
Department of Electrical and Electronic Engineering, University of Cagliari, Piazza d’Armi, 09123, Cagliari, Italy
Fabio Roli

Cite this paper

Fred, A. (2001). Finding Consistent Clusters in Data Partitions. In: Kittler, J., Roli, F. (eds) Multiple Classifier Systems. MCS 2001. Lecture Notes in Computer Science, vol 2096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48219-9_31

DOI: https://doi.org/10.1007/3-540-48219-9_31
Published: 22 June 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42284-6
Online ISBN: 978-3-540-48219-2
eBook Packages: Springer Book Archive
