Abstract
One of the biggest challenges in clustering is finding a robust and versatile criterion for evaluating the quality of clustering results. In this paper, we investigate the extent to which unsupervised criteria can be used to obtain clusters that are highly correlated with external labels. We show that the usefulness of these criteria is data-dependent and that, for most data sets, multiple criteria are required to identify the best-performing clustering algorithm. We present a multi-objective evolutionary clustering algorithm capable of finding a set of high-quality solutions. For the real-world data sets examined, the Pareto front can offer better clusterings than optimizing a single unsupervised criterion.
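To illustrate what selecting a Pareto front of clusterings means, the following minimal sketch filters candidate clusterings scored by two unsupervised criteria. It is not the evolutionary algorithm presented in the paper; the function names `dominates` and `pareto_front`, the score values, and the assumption that both criteria are maximized are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if score vector a Pareto-dominates b (all objectives maximized)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return indices of the score vectors that no other vector dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (criterion A, criterion B) scores for five candidate clusterings.
candidates = [(0.62, 0.30), (0.55, 0.41), (0.50, 0.35), (0.40, 0.45), (0.62, 0.25)]
front = pareto_front(candidates)
print(front)  # indices of the non-dominated clusterings
```

Optimizing criterion A alone would return only the first candidate, whereas the front also retains the candidates that trade some of A for a higher value of B. This is the sense in which a set of solutions can contain a clustering closer to the external labels than any single-criterion optimum.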
Acknowledgements
We would like to thank Petr Bartůněk, Ph.D., from the IMG CAS institute for supporting our research and letting us publish all details of our work. This research is partially supported by CTU grant SGS15/117/OHK3/1T/18 "New data processing methods for data mining" and by Program NPU I (LO1419) of the Ministry of Education, Youth and Sports of the Czech Republic.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Bartoň, T., Kordík, P. (2015). Evaluation of Relative Indexes for Multi-objective Clustering. In: Onieva, E., Santos, I., Osaba, E., Quintián, H., Corchado, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2015. Lecture Notes in Computer Science, vol 9121. Springer, Cham. https://doi.org/10.1007/978-3-319-19644-2_39
DOI: https://doi.org/10.1007/978-3-319-19644-2_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19643-5
Online ISBN: 978-3-319-19644-2
eBook Packages: Computer Science (R0)