Clustering as an Example of Optimizing Arbitrarily Chosen Objective Functions

Budka, Marcin

doi:10.1007/978-3-642-34300-1_17

Part of the book series: Studies in Computational Intelligence ((SCI,volume 457))

Abstract

This paper reflects on the common practice of solving various types of learning problems by optimizing arbitrarily chosen criteria in the hope that they correlate well with the criterion actually used to assess the results. The issue is investigated using clustering as an example. A unified view of clustering as an optimization problem is first proposed, stemming from the belief that typical design choices in clustering, such as the number of clusters or the similarity measure, can be, and often are, suboptimal, also from the point of view of the clustering quality measures later used for algorithm comparison and ranking. To illustrate this point, we propose a generalized clustering framework and provide a proof of concept using standard benchmark datasets, with two popular clustering methods for comparison.
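
The mismatch described in the abstract can be made concrete with a small sketch. It is not the chapter's framework or experimental setup: the choice of k-means as the method, the Davies-Bouldin index as the assessment criterion, the toy dataset, and all function names are assumptions made purely for illustration. The code optimizes the usual k-means objective (within-cluster sum of squared errors) for several values of k, then scores each result with the Davies-Bouldin index and compares which k each criterion prefers.

```python
import math

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization
    (an illustrative choice, not taken from the chapter)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = centroid(members)
    return labels, centers

def sse(points, labels, centers):
    """The criterion k-means optimizes: within-cluster sum of squared errors."""
    return sum(dist2(p, centers[l]) for p, l in zip(points, labels))

def davies_bouldin(points, labels, centers):
    """The criterion used here for assessment (lower is better).
    Assumes every cluster is non-empty."""
    k = len(centers)
    scatter = []
    for j in range(k):
        members = [p for p, l in zip(points, labels) if l == j]
        scatter.append(sum(math.sqrt(dist2(p, centers[j])) for p in members) / len(members))
    worst = [
        max((scatter[i] + scatter[j]) / math.sqrt(dist2(centers[i], centers[j]))
            for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k

# Toy data (hypothetical): three well-separated groups of four points each.
blobs = [(0, 0), (10, 0), (5, 10)]
offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
points = [(cx + dx, cy + dy) for cx, cy in blobs for dx, dy in offsets]

sse_by_k, db_by_k = {}, {}
for k in (2, 3, 4):
    labels, centers = kmeans(points, k)
    sse_by_k[k] = sse(points, labels, centers)
    db_by_k[k] = davies_bouldin(points, labels, centers)

best_k_sse = min(sse_by_k, key=sse_by_k.get)  # k preferred by the optimized criterion
best_k_db = min(db_by_k, key=db_by_k.get)     # k preferred by the assessment criterion
print("SSE prefers k =", best_k_sse, "| Davies-Bouldin prefers k =", best_k_db)
```

On this toy data the optimized objective keeps improving as k grows and so prefers the largest k tried, while the assessment index prefers the three clusters the data was built from. The two criteria disagree even in this simple case, which is the kind of gap the abstract refers to.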

Author information

Authors and Affiliations

Bournemouth University, BH12 5BB, Poole, UK
Marcin Budka

Corresponding author

Correspondence to Marcin Budka.

Editor information

Editors and Affiliations

Institute of Informatics, Wroclaw University of Technology, Wyb. Wyspianskiego 27, Wroclaw, 50-370, Poland
Ngoc Thanh Nguyen
Institute of Informatics, Wroclaw University of Technology, Wyb. Wyspianskiego 27, Wroclaw, 50-370, Poland
Bogdan Trawiński
Institute of Informatics, Wroclaw University of Technology, Wyb. Wyspianskiego 27, Wroclaw, 50-370, Poland
Radosław Katarzyniak
Dept. of Computer Science & Engineering, INHA University, #253 YongHyun-dong, Nam-Ku, Inchon, 402-751, Republic of Korea (South Korea)
Geun-Sik Jo

About this paper

Cite this paper

Budka, M. (2013). Clustering as an Example of Optimizing Arbitrarily Chosen Objective Functions. In: Nguyen, N., Trawiński, B., Katarzyniak, R., Jo, GS. (eds) Advanced Methods for Computational Collective Intelligence. Studies in Computational Intelligence, vol 457. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34300-1_17

DOI: https://doi.org/10.1007/978-3-642-34300-1_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34299-8
Online ISBN: 978-3-642-34300-1
eBook Packages: Engineering, Engineering (R0)
