Abstract
In previous work, we proposed a novel approach to data clustering based on the explicit optimization of a partitioning with respect to two complementary clustering objectives [6]. Here, we extend this idea by describing an advanced multiobjective clustering algorithm, MOCK, which can identify good solutions from the Pareto front and automatically determine the number of clusters in a data set. The algorithm has been thoroughly compared with alternative clustering techniques, and we briefly summarize these results. We then investigate the mechanisms at the heart of MOCK. First, we discuss a simple example demonstrating the synergistic effects at work in multiobjective clustering that explain its superiority over single-objective clustering techniques. Second, we analyse how MOCK’s Pareto fronts compare with the performance curves obtained by single-objective algorithms run with a range of specified numbers of clusters.
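The abstract does not name the two complementary objectives. In the authors' related work on MOCK they are overall deviation (a compactness measure: summed distance of each point to its cluster centroid) and connectivity (a penalty of 1/r whenever a point's r-th nearest neighbour is placed in a different cluster). The sketch below assumes those definitions and shows how a set of candidate partitions is scored on both objectives and reduced to its non-dominated (Pareto) subset. It is an illustration under stated assumptions, not MOCK itself: MOCK evolves partitions with an evolutionary multiobjective optimizer (PESA-II [3]) and adds its own solution-selection step, whereas here a few hand-written partitions are simply scored and filtered. All function names, the toy data, and the neighbourhood size `n_neighbours=2` are hypothetical choices made for this example.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]

# Hypothetical helper names; objective definitions assumed from the MOCK literature.


def overall_deviation(data: Sequence[Point], labels: Sequence[int]) -> float:
    """Compactness objective (minimize): summed distance of each point to its cluster centroid."""
    members: Dict[int, List[Point]] = {}
    for point, label in zip(data, labels):
        members.setdefault(label, []).append(point)
    centroids = {
        label: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for label, pts in members.items()
    }
    return sum(math.dist(point, centroids[label]) for point, label in zip(data, labels))


def connectivity(data: Sequence[Point], labels: Sequence[int], n_neighbours: int = 3) -> float:
    """Connectedness objective (minimize): add 1/r whenever a point's r-th nearest
    neighbour (r = 1..n_neighbours) is assigned to a different cluster."""
    total = 0.0
    for i, point in enumerate(data):
        # Other points ordered by distance to `point` (ties keep index order).
        neighbours = sorted(
            (j for j in range(len(data)) if j != i),
            key=lambda j: math.dist(point, data[j]),
        )
        for rank, j in enumerate(neighbours[:n_neighbours], start=1):
            if labels[j] != labels[i]:
                total += 1.0 / rank
    return total


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: Sequence[Sequence[float]]) -> List[int]:
    """Indices of the non-dominated objective vectors (both objectives minimized)."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Toy data: two well-separated groups of three points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
candidates = {
    "one cluster": [0, 0, 0, 0, 0, 0],
    "two clusters": [0, 0, 0, 1, 1, 1],
    "singletons": [0, 1, 2, 3, 4, 5],
    "mixed": [0, 1, 0, 1, 0, 1],
}
names = list(candidates)
scores = [
    (overall_deviation(data, labels), connectivity(data, labels, n_neighbours=2))
    for labels in candidates.values()
]
for i in pareto_front(scores):
    print(names[i], scores[i])
```

On this toy data the front contains only the "two clusters" and "singletons" partitions; "one cluster" and "mixed" are dominated. Deviation alone would favour the singletons, and connectivity alone cannot separate one cluster from two (both score 0), so the natural two-cluster structure only stands out when both objectives are considered together. This is a small illustration of the trade-off idea, not a reproduction of the example discussed in the paper.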
References
1. Supporting material for MOCK, http://dbk.ch.umist.ac.uk/handl/mock/
2. Branke, J., Deb, K., Dierolf, H., Osswald, M.: Finding knees in multi-objective optimization. In: Proceedings of the Eighth International Conference on Parallel Problem Solving from Nature, pp. 722–731. Springer, Heidelberg (2004)
3. Corne, D.W., Knowles, J.D., Oates, M.J.: PESA-II: Region-based selection in evolutionary multiobjective optimization. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 283–290. Morgan Kaufmann, San Francisco (2001)
4. Falkenauer, E.: Genetic Algorithms and Grouping Problems. John Wiley & Sons Ltd, Chichester (1998)
5. Fleury, G., Hero, A., Zareparsi, S., Swaroop, A.: Gene discovery using Pareto depth sampling distributions. Special Number on Genomics, Signal Processing and Statistics, Journal of the Franklin Institute 341(1–2), 55–75 (2004)
6. Handl, J., Knowles, J.: Evolutionary multiobjective clustering. In: Proceedings of the Eighth International Conference on Parallel Problem Solving from Nature, pp. 1081–1091. Springer, Heidelberg (2004)
7. Handl, J., Knowles, J.: Multiobjective clustering with automatic determination of the number of clusters. Technical Report COMPYSYBIO-TR-2004-02, Department of Chemistry, UMIST, UK (August 2004)
8. Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: A review. ACM Computing Surveys 31(3), 264–323 (1999)
9. Kim, Y., Street, W.N., Menczer, F.: Evolutionary model selection in unsupervised learning. Intelligent Data Analysis 6, 531–556 (2002)
10. Kleinberg, J.: An impossibility theorem for clustering. In: Proceedings of the 15th Conference on Neural Information Processing Systems (2002), http://www.cs.cornell.edu/home/kleinber/nips15.ps
11. Law, M.H.C.: Multiobjective data clustering. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 424–430. IEEE Press, Los Alamitos (2004)
12. Maulik, U., Bandyopadhyay, S.: Genetic algorithm-based clustering technique. Pattern Recognition 33, 1455–1465 (2000)
13. Pan, H., Zhu, J., Han, D.: Genetic algorithms applied to multi-class clustering for gene expression data. Genomics, Proteomics & Bioinformatics 1(4) (2003)
14. Park, Y.-J., Song, M.-S.: A genetic algorithm for clustering problems. In: Proceedings of the Third Annual Conference on Genetic Programming, pp. 568–575. Morgan Kaufmann, San Francisco (1998)
15. Peña, J.M., Lozano, J.A., Larrañaga, P.: An empirical comparison of four initialization methods for the k-means algorithm. Pattern Recognition Letters 20(10), 1027–1040 (1999)
16. Strehl, A., Ghosh, J.: Cluster ensembles: a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research 3, 583–617 (2002)
17. Tibshirani, R., Walther, G., Hastie, T.: Estimating the number of clusters in a dataset via the Gap statistic. Technical Report 208, Department of Statistics, Stanford University, USA (2000)
18. Topchy, A., Jain, A.K., Punch, W.: Clustering ensembles: Models of consensus and weak partitions. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (2004)
19. van Rijsbergen, C.: Information Retrieval, 2nd edn. Butterworths (1979)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Handl, J., Knowles, J. (2005). Exploiting the Trade-off — The Benefits of Multiple Objectives in Data Clustering. In: Coello Coello, C.A., Hernández Aguirre, A., Zitzler, E. (eds) Evolutionary Multi-Criterion Optimization. EMO 2005. Lecture Notes in Computer Science, vol 3410. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31880-4_38
DOI: https://doi.org/10.1007/978-3-540-31880-4_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24983-2
Online ISBN: 978-3-540-31880-4
eBook Packages: Computer Science, Computer Science (R0)