Abstract
This paper reflects on the common practice of solving various types of learning problems by optimizing arbitrarily chosen criteria, in the hope that they correlate well with the criterion actually used to assess the results. We investigate this issue using clustering as an example. We first propose a unified view of clustering as an optimization problem, motivated by the belief that typical design choices in clustering, such as the number of clusters or the similarity measure, can be, and often are, suboptimal, including with respect to the clustering quality measures later used to compare and rank algorithms. To illustrate the point, we propose a generalized clustering framework and provide a proof of concept using standard benchmark datasets, with two popular clustering methods for comparison.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Budka, M. (2013). Clustering as an Example of Optimizing Arbitrarily Chosen Objective Functions. In: Nguyen, N., Trawiński, B., Katarzyniak, R., Jo, GS. (eds) Advanced Methods for Computational Collective Intelligence. Studies in Computational Intelligence, vol 457. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34300-1_17
DOI: https://doi.org/10.1007/978-3-642-34300-1_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34299-8
Online ISBN: 978-3-642-34300-1
eBook Packages: Engineering, Engineering (R0)