Abstract
Unsupervised learning (clustering) is one of the most common, yet computationally intensive, data analysis problems in data mining. The plethora of clustering algorithms and performance measures makes choosing the optimal clustering algorithm a challenging task. To overcome this shortcoming, consensus learning methods have been proposed in the literature. These methods aim to optimally combine independently obtained clusterings into a single, more robust clustering of improved quality. In this chapter, we review unsupervised consensus learning techniques according to their underlying theoretical principles. We present exact, approximation, and heuristic approaches, examine the relation of consensus clustering to other well-studied problems, and discuss relevant applications.
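To make the idea of combining clusterings concrete, the sketch below assumes one common formalization (an assumption here; the abstract does not fix an objective). Each clustering is a label vector. The distance between two clusterings is the number of item pairs that one puts together and the other separates, and the consensus is the partition with the smallest total distance to all inputs. `exact_consensus` solves this by brute force over all partitions, which is only feasible for a handful of items because their number grows as the Bell numbers. `best_of_inputs` is a simple heuristic that returns the input clustering with the smallest total distance; since this pairwise distance is a metric, its cost is at most twice the optimum. All function names are hypothetical and are not taken from the chapter or from any library.

```python
from itertools import combinations
from typing import Iterator, List, Sequence

# A clustering is a label vector: labels[i] is the (arbitrary) cluster id of item i.
Labels = Sequence[int]


def disagreements(a: Labels, b: Labels) -> int:
    """Count item pairs that one clustering puts together and the other separates."""
    return sum(
        (a[i] == a[j]) != (b[i] == b[j])
        for i, j in combinations(range(len(a)), 2)
    )


def consensus_cost(candidate: Labels, clusterings: List[Labels]) -> int:
    """Total pairwise disagreement between a candidate and every input clustering."""
    return sum(disagreements(candidate, c) for c in clusterings)


def best_of_inputs(clusterings: List[Labels]) -> List[int]:
    """Heuristic: return the input clustering with the smallest total disagreement."""
    return list(min(clusterings, key=lambda c: consensus_cost(c, clusterings)))


def set_partitions(n: int) -> Iterator[List[int]]:
    """Enumerate every partition of n items once, as restricted-growth label vectors."""
    def grow(prefix: List[int], used: int) -> Iterator[List[int]]:
        if len(prefix) == n:
            yield list(prefix)
            return
        # Reuse one of the `used` existing labels, or open a new cluster (label == used).
        for label in range(used + 1):
            prefix.append(label)
            yield from grow(prefix, max(used, label + 1))
            prefix.pop()

    yield from grow([], 0)


def exact_consensus(clusterings: List[Labels]) -> List[int]:
    """Exact median partition by exhaustive search; only feasible for very small n."""
    n = len(clusterings[0])
    return min(set_partitions(n), key=lambda p: consensus_cost(p, clusterings))


if __name__ == "__main__":
    # Three hypothetical clusterings of three items: {0,1}{2}, {0}{1,2}, {0,2}{1}.
    inputs = [[0, 0, 1], [0, 1, 1], [0, 1, 0]]
    heuristic = best_of_inputs(inputs)
    exact = exact_consensus(inputs)
    print("best input:", heuristic, "cost", consensus_cost(heuristic, inputs))  # [0, 0, 1] cost 4
    print("exact:", exact, "cost", consensus_cost(exact, inputs))  # [0, 1, 2] cost 3
```

In this toy input every pair of items is co-clustered by exactly one of the three clusterings, so the exact consensus (all singletons, cost 3) is a partition that none of the inputs proposed, while the best single input costs 4.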
Acknowledgment
The author would like to thank Dr. Sibel B. Sonuç for proofreading the manuscript and providing useful comments.
Copyright information
© 2014 Springer Science+Business Media New York
About this chapter
Cite this chapter
Xanthopoulos, P. (2014). A Review on Consensus Clustering Methods. In: Rassias, T., Floudas, C., Butenko, S. (eds) Optimization in Science and Engineering. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0808-0_26
DOI: https://doi.org/10.1007/978-1-4939-0808-0_26
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-0807-3
Online ISBN: 978-1-4939-0808-0
eBook Packages: Mathematics and Statistics (R0)