
Minimum-Entropy Data Clustering Using Reversible Jump Markov Chain Monte Carlo

  • Conference paper

In: Artificial Neural Networks — ICANN 2001 (ICANN 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2130)

Abstract

Many problems in data analysis, especially in signal and image processing, require the unsupervised partitioning of data into a set of ‘self-similar’ classes or clusters. An ideal partitioning unambiguously assigns each datum to a single class, and one may think of the data as being generated by a number of data generators, one per class. Many algorithms have been proposed for such analysis and for estimating the optimal number of partitions. The majority of popular and computationally feasible techniques rely on the assumption that classes are hyper-ellipsoidal in shape. In Gaussian mixture modelling [15,6] this assumption is explicit; in dendrogram linkage methods (which typically rely on the L2 norm) it is implicit [9]. For some data sets this leads to over-partitioning. Alternative methods, based for example on valley seeking [6] or maxima tracking in scale space [16,18,13], have the advantage of being free from such assumptions, but they can be sensitive to noise and computationally intensive in high-dimensional spaces. In this paper we reconsider the issue of data partitioning from an information-theoretic viewpoint and show that minimisation of partition entropy may be used to evaluate the most probable set of data generators. Rather than formulating the problem as one of traditional model-order estimation to infer the most probable number of classes, we employ a reversible jump mechanism in a Markov chain Monte Carlo (MCMC) sampler which explores the space of different model sizes.
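
To make the idea concrete, the partition entropy of a soft assignment is H = -(1/N) Σ_n Σ_k p(k|x_n) log p(k|x_n): it is zero when every datum belongs unambiguously to one generator and log K when all assignments are maximally ambiguous. The sketch below is an illustrative toy, not the authors' algorithm: it assumes equal-weight spherical Gaussian generators of fixed width sigma, scores a configuration by negative log-likelihood plus partition entropy plus a complexity penalty (the names partition_score, sample_partitions, k_penalty and temperature are ours), and uses a simplified Metropolis rule in place of the full reversible-jump acceptance ratio of Green [7].

```python
import numpy as np

rng = np.random.default_rng(0)

def partition_score(X, centres, sigma=0.5, k_penalty=0.1):
    """Score to minimise: mean negative log-likelihood under an
    equal-weight mixture of spherical Gaussian generators, plus the
    partition entropy H (penalising overlapping, ambiguous generators),
    plus a penalty on the number of generators. This composite score is
    a crude stand-in for the negative log-posterior, not the paper's
    exact objective."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    log_k = -0.5 * d2 / sigma ** 2 - np.log(len(centres))
    m = log_k.max(axis=1, keepdims=True)                    # stabilise logsumexp
    log_px = (m + np.log(np.exp(log_k - m).sum(axis=1, keepdims=True)))[:, 0]
    resp = np.exp(log_k - log_px[:, None])                  # posteriors p(k | x_n)
    H = -np.mean((resp * np.log(resp + 1e-12)).sum(axis=1))
    return -log_px.mean() + H + k_penalty * len(centres)

def sample_partitions(X, n_iter=5000, temperature=0.05):
    """Toy trans-dimensional sampler: birth/death moves change the model
    size, 'move' perturbs a centre. The Metropolis rule below omits the
    proposal and Jacobian terms of a full reversible-jump acceptance
    ratio, so it illustrates the mechanism only."""
    centres = X[rng.choice(len(X), size=1)]
    current = partition_score(X, centres)
    for _ in range(n_iter):
        kind = rng.choice(["birth", "death", "move"])
        if kind == "birth":
            prop = np.vstack([centres, X[rng.choice(len(X))]])
        elif kind == "death" and len(centres) > 1:
            prop = np.delete(centres, rng.choice(len(centres)), axis=0)
        else:
            prop = centres + rng.normal(scale=0.1, size=centres.shape)
        proposed = partition_score(X, prop)
        # Always accept downhill moves; accept uphill moves stochastically.
        if proposed <= current or rng.random() < np.exp((current - proposed) / temperature):
            centres, current = prop, proposed
    return centres

# Two well-separated blobs: the sampler should retain about two generators.
X = np.vstack([rng.normal(0.0, 0.3, size=(100, 2)),
               rng.normal(3.0, 0.3, size=(100, 2))])
print(len(sample_partitions(X)), "generators retained")
```

Because the birth and death moves change the number of centres, the chain wanders across model sizes rather than fixing one in advance; a faithful implementation would weight the acceptance probability with the proposal densities and the Jacobian of the dimension-matching transform, as in [7,12].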

References

  1. S. Aeberhard, D. Coomans, and O. de Vel. Comparative analysis of statistical pattern recognition methods in high-dimensional settings. Pattern Recognition, 27(8):1065–1077, 1994.

  2. C. Andrieu, N. de Freitas, and A. Doucet. Sequential MCMC for Bayesian model selection. In IEEE Signal Processing Workshop on Higher Order Statistics, Caesarea, Israel, June 14–16, 1999.

  3. J.M. Bernardo and A.F.M. Smith. Bayesian Theory. John Wiley, 1994.

  4. C.M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, Oxford, 1995.

  5. A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society (Series B), 39(1):1–38, 1977.

  6. K. Fukunaga. An Introduction to Statistical Pattern Recognition. Academic Press, 1990.

  7. P. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82:711–732, 1995.

  8. C. Holmes and B.K. Mallick. Bayesian radial basis functions of variable dimension. Neural Computation, 10:1217–1233, 1998.

  9. A.K. Jain and R.C. Dubes. Algorithms for Clustering Data. Prentice Hall, 1988.

  10. D.D. Lee and H.S. Seung. Learning the parts of objects by non-negative matrix factorisation. Nature, 401:788–791, October 1999.

  11. R.M. Neal. Bayesian Learning for Neural Networks. Lecture Notes in Statistics. Springer, Berlin, 1996.

  12. S. Richardson and P.J. Green. On Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society (Series B), 59(4):731–758, 1997.

  13. S.J. Roberts. Parametric and non-parametric unsupervised cluster analysis. Pattern Recognition, 30(2):261–272, 1997.

  14. S.J. Roberts, R. Everson, and I. Rezek. Maximum certainty data partitioning. Pattern Recognition, 33(5):833–839, 2000.

  15. S.J. Roberts, D. Husmeier, I. Rezek, and W. Penny. Bayesian approaches to mixture modelling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1133–1142, 1998.

  16. K. Rose, E. Gurewitz, and G.C. Fox. A deterministic annealing approach to clustering. Pattern Recognition Letters, 11(9):589–594, September 1990.

  17. L. Tierney. Markov chains for exploring posterior distributions. Annals of Statistics, 22:1701–1762, 1994.

  18. R. Wilson and M. Spann. A new approach to clustering. Pattern Recognition, 23(12):1413–1425, 1990.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Roberts, S.J., Holmes, C., Denison, D. (2001). Minimum-Entropy Data Clustering Using Reversible Jump Markov Chain Monte Carlo. In: Dorffner, G., Bischof, H., Hornik, K. (eds) Artificial Neural Networks — ICANN 2001. ICANN 2001. Lecture Notes in Computer Science, vol 2130. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44668-0_15

  • DOI: https://doi.org/10.1007/3-540-44668-0_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42486-4

  • Online ISBN: 978-3-540-44668-2
