A weighted framework for unsupervised ensemble learning based on internal quality measures

Ünlü, Ramazan; Xanthopoulos, Petros

doi:10.1007/s10479-017-2716-8

S.I.: Computational Biomedicine
Published: 21 November 2017

Volume 276, pages 229–247 (2019)

Annals of Operations Research

Abstract

Unsupervised ensemble learning, or consensus clustering, consists of finding the optimal strategy for combining individual clusterings so that the result is robust with respect to the selection of the algorithmic clustering pool. Recently, an approach based on the concept of a consensus graph was proposed that has profound advantages over its predecessors. Despite its robustness, this approach assigns the same weight to the contribution of each clustering to the final solution. In this paper, we propose a weighting policy for this problem that is based on internal clustering quality measures and compare it against other popular approaches. Results on publicly available datasets show that weights can significantly improve accuracy while retaining the robust properties.
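
To make the idea of weighting concrete, the following is a minimal sketch and not the authors' method, since the abstract does not specify the weighting policy or the consensus graph construction. It assumes the silhouette coefficient as the internal quality measure and a weighted co-association matrix as a stand-in for the consensus structure. The function names `silhouette` and `weighted_coassociation`, the clipping of negative scores to zero, and the toy data are all illustrative assumptions.

```python
from math import dist


def silhouette(points, labels):
    """Mean silhouette coefficient of one partition (assumed quality measure)."""
    n = len(points)
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        # Mean distance from point i to the members of each cluster, excluding i.
        mean_dist = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                mean_dist[c] = sum(dist(points[i], points[j]) for j in members) / len(members)
        own = labels[i]
        if own not in mean_dist:
            continue  # singleton cluster: its silhouette is taken as 0
        a = mean_dist[own]
        b = min(d for c, d in mean_dist.items() if c != own)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / n


def weighted_coassociation(points, partitions):
    """Weight each partition by its clipped silhouette and accumulate co-memberships."""
    # Assumption: negative scores are clipped to 0 and weights are normalised to sum to 1.
    raw = [max(silhouette(points, p), 0.0) for p in partitions]
    s = sum(raw)
    k = len(partitions)
    weights = [r / s for r in raw] if s > 0 else [1.0 / k] * k
    n = len(points)
    co = [[0.0] * n for _ in range(n)]
    for w, labels in zip(weights, partitions):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += w
    return weights, co


# Hypothetical toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
good = [0, 0, 1, 1]  # groups nearby points together
bad = [0, 1, 0, 1]   # splits each nearby pair
weights, co = weighted_coassociation(points, [good, bad])
```

With equal weights, the pair (0, 1) and the pair (0, 2) would each be co-clustered in exactly one of the two partitions and would tie. Under the quality-based weights of this sketch, the partition that separates the two groups dominates, so the entry for the pair (0, 1) exceeds the entry for the pair (0, 2).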

Author information

Authors and Affiliations

University of Central Florida, 12800 Pegasus Dr., Orlando, FL, 32816, USA
Ramazan Ünlü
Decision and Information Science Department, Stetson University, 421 N. Woodland Blvd., Deland, FL, 32723, USA
Petros Xanthopoulos

Corresponding author

Correspondence to Petros Xanthopoulos.

About this article

Cite this article

Ünlü, R., Xanthopoulos, P. A weighted framework for unsupervised ensemble learning based on internal quality measures. Ann Oper Res 276, 229–247 (2019). https://doi.org/10.1007/s10479-017-2716-8

Published: 21 November 2017
Issue Date: 01 May 2019
DOI: https://doi.org/10.1007/s10479-017-2716-8
