A Review on Consensus Clustering Methods

Abstract

Unsupervised learning, or clustering, is one of the most common yet computationally intensive data analysis problems in data mining. The plethora of clustering algorithms and performance measures makes choosing the optimal clustering algorithm a challenging task. To overcome this difficulty, consensus learning methods have been proposed in the literature. These methods try to optimally combine independently obtained clusterings into a single, more robust clustering of improved quality. In this chapter we review unsupervised consensus learning techniques, organized by their underlying theoretical principles. We present exact, approximation, and heuristic approaches, describe the relation of consensus clustering to other well-studied problems, and discuss relevant applications.
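To make the idea of combining independently obtained clusterings concrete, the following is a minimal sketch of one simple heuristic, not necessarily one of the methods reviewed in the chapter. It builds a co-association matrix (the fraction of input clusterings that place two items together) and then links every pair that is co-clustered in a majority of the inputs. The function names, the 0.5 threshold, and the toy label vectors are all assumptions made for illustration.

```python
from itertools import combinations


def co_association(clusterings):
    """Fraction of input clusterings that place each pair of items together."""
    n = len(clusterings[0])
    m = len(clusterings)
    counts = [[0] * n for _ in range(n)]
    for labels in clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus_labels(clusterings, threshold=0.5):
    """Link pairs co-clustered in more than `threshold` of the inputs,
    then label the resulting connected components (union-find)."""
    n = len(clusterings[0])
    co = co_association(clusterings)
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(j)] = find(i)

    # Relabel components 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical input: three clusterings of six items. Label names are
# arbitrary and need not match across clusterings.
A = [0, 0, 0, 1, 1, 1]
B = [2, 2, 2, 0, 0, 1]
C = [0, 0, 1, 1, 1, 1]
print(consensus_labels([A, B, C]))  # prints [0, 0, 0, 1, 1, 1]
```

Because only co-membership is compared, the inputs may use different label names and different numbers of clusters. Linking by connected components can chain loosely related items together, which is one reason more careful exact, approximation, and heuristic formulations are of interest.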

Notes

  1. http://www.r-project.org.

Acknowledgment

The author would like to thank Dr. Sibel B. Sonuç for proofreading the manuscript and providing useful comments.

Corresponding author

Correspondence to Petros Xanthopoulos.

Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Xanthopoulos, P. (2014). A Review on Consensus Clustering Methods. In: Rassias, T., Floudas, C., Butenko, S. (eds) Optimization in Science and Engineering. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0808-0_26
