Exploiting the Trade-off — The Benefits of Multiple Objectives in Data Clustering

Handl, Julia; Knowles, Joshua

doi:10.1007/978-3-540-31880-4_38

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3410)

Included in the following conference series:

International Conference on Evolutionary Multi-Criterion Optimization

Abstract

In previous work, we proposed a novel approach to data clustering based on the explicit optimization of a partitioning with respect to two complementary clustering objectives [6]. Here, we extend this idea by describing an advanced multiobjective clustering algorithm, MOCK, with the capacity to identify good solutions from the Pareto front and to determine the number of clusters in a data set automatically. The algorithm has been compared thoroughly with alternative clustering techniques, and we briefly summarize the results. We then present investigations into the mechanisms at the heart of MOCK: we discuss a simple example demonstrating the synergistic effects at work in multiobjective clustering, which explain its superiority to single-objective clustering techniques, and we analyse how MOCK’s Pareto fronts compare to the performance curves obtained by single-objective algorithms run with a range of different specified numbers of clusters.
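
To make the notion of a Pareto front over two clustering objectives concrete, the following is a minimal sketch of a non-dominated filter. It is not taken from the paper and is not MOCK itself: the function names, the assumption that both objectives are minimized, and the candidate scores are all hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple

# A candidate partitioning is summarized by its two objective values.
# Assumption: both objectives are to be minimized.
Scores = Tuple[float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """Return True if a is no worse than b in both objectives
    and strictly better in at least one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(points: List[Scores]) -> List[Scores]:
    """Return the non-dominated points, sorted by the first objective."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(set(front))


# Hypothetical (objective_1, objective_2) scores for six candidate partitionings.
candidates = [
    (10.0, 1.0),
    (6.0, 3.0),
    (7.0, 4.0),   # dominated by (6.0, 3.0)
    (3.0, 8.0),
    (3.0, 9.0),   # dominated by (3.0, 8.0)
    (12.0, 1.0),  # dominated by (10.0, 1.0)
]

front = pareto_front(candidates)
print(front)
```

Each point that survives the filter represents a different trade-off between the two objectives; none of them can be improved in one objective without getting worse in the other. The quadratic-time filter is chosen for readability, not efficiency.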

References

[1] Supporting material for MOCK, http://dbk.ch.umist.ac.uk/handl/mock/
[2] Branke, J., Deb, K., Dierolf, H., Osswald, M.: Finding knees in multi-objective optimization. In: Proceedings of the Eighth International Conference on Parallel Problem Solving from Nature, pp. 722–731. Springer, Heidelberg (2004)
[3] Corne, D.W., Knowles, J.D., Oates, M.J.: PESA-II: Region-based selection in evolutionary multiobjective optimization. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 283–290. Morgan Kaufmann, San Francisco (2001)
[4] Falkenauer, E.: Genetic Algorithms and Grouping Problems. John Wiley & Sons Ltd, Chichester (1998)
[5] Fleury, G., Hero, A., Zareparsi, S., Swaroop, A.: Gene discovery using Pareto depth sampling distributions. Special Number on Genomics, Signal Processing and Statistics, Journal of the Franklin Institute 341(1–2), 55–75 (2004)
[6] Handl, J., Knowles, J.: Evolutionary multiobjective clustering. In: Proceedings of the Eighth International Conference on Parallel Problem Solving from Nature, pp. 1081–1091. Springer, Heidelberg (2004)
[7] Handl, J., Knowles, J.: Multiobjective clustering with automatic determination of the number of clusters. Technical Report COMPYSYBIO-TR-2004-02, Department of Chemistry, UMIST, UK (August 2004)
[8] Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: A review. ACM Computing Surveys 31(3), 264–323 (1999)
[9] Kim, Y., Street, W.N., Menczer, F.: Evolutionary model selection in unsupervised learning. Intelligent Data Analysis 6, 531–556 (2002)
[10] Kleinberg, J.: An impossibility theorem for clustering. In: Proceedings of the 15th Conference on Neural Information Processing Systems (2002), http://www.cs.cornell.edu/home/kleinber/nips15.ps
[11] Law, M.H.C.: Multiobjective data clustering. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 424–430. IEEE Press, Los Alamitos (2004)
[12] Maulik, U., Bandyopadhyay, S.: Genetic algorithm-based clustering technique. Pattern Recognition 33, 1455–1465 (2000)
[13] Pan, H., Zhu, J., Han, D.: Genetic algorithms applied to multi-class clustering for gene expression data. Genomics, Proteomics & Bioinformatics 1(4) (2003)
[14] Park, Y.-J., Song, M.-S.: A genetic algorithm for clustering problems. In: Proceedings of the Third Annual Conference on Genetic Programming, pp. 568–575. Morgan Kaufmann, San Francisco (1998)
[15] Pena, J.M., Lozano, J.A., Larranaga, P.: An empirical comparison of four initialization methods for the k-means algorithm. Pattern Recognition Letters 20(10), 1027–1040 (1999)
[16] Strehl, A., Ghosh, J.: Cluster ensembles — a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research 3, 583–617 (2002)
[17] Tibshirani, R., Walther, G., Hastie, T.: Estimating the number of clusters in a dataset via the Gap statistic. Technical Report 208, Department of Statistics, Stanford University, USA (2000)
[18] Topchy, A., Jain, A.K., Punch, W.: Clustering ensembles: Models of consensus and weak partitions. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (2004)
[19] van Rijsbergen, C.: Information Retrieval, 2nd edn. Butterworths (1979)

Author information

Authors and Affiliations

School of Chemistry, University of Manchester, Faraday Building, Sackville Street, PO Box 88, Manchester, M60 1QD
Julia Handl & Joshua Knowles

Editor information

Editors and Affiliations

Departamento de Computación, CINVESTAV-IPN (Evolutionary Computation Group), 07300, México, D.F., México
Carlos A. Coello Coello
Centro de Investigación en Matemáticas (CIMAT), A.P. 402, Guanajuato, 36000, Mexico
Arturo Hernández Aguirre
Computer Engineering and Networks Laboratory, ETH, Zurich, Switzerland
Eckart Zitzler

About this paper

Cite this paper

Handl, J., Knowles, J. (2005). Exploiting the Trade-off — The Benefits of Multiple Objectives in Data Clustering. In: Coello Coello, C.A., Hernández Aguirre, A., Zitzler, E. (eds) Evolutionary Multi-Criterion Optimization. EMO 2005. Lecture Notes in Computer Science, vol 3410. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31880-4_38

DOI: https://doi.org/10.1007/978-3-540-31880-4_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24983-2
Online ISBN: 978-3-540-31880-4
eBook Packages: Computer Science, Computer Science (R0)
