Convex clustering for binary data

Choi, Hosik; Lee, Seokho

doi:10.1007/s11634-018-0350-1

Convex clustering for binary data

Regular Article
Published: 14 November 2018

Volume 13, pages 991–1018, (2019)
Cite this article

Advances in Data Analysis and Classification Aims and scope Submit manuscript

Hosik Choi¹ &
Seokho Lee²

894 Accesses
5 Citations
Explore all metrics

Abstract

We present a new clustering algorithm for multivariate binary data. The new algorithm is based on the convex relaxation of hierarchical clustering, which is achieved by considering the binomial likelihood as a natural distribution for binary data and by formulating convex clustering using a pairwise penalty on prototypes of clusters. Under convex clustering, we show that the typical $\ell _1$ pairwise fused penalty results in ineffective cluster formation. In an attempt to promote the clustering performance and select the relevant clustering variables, we propose the penalized maximum likelihood estimation with an $\ell _2$ fused penalty on the fusion parameters and an $\ell _1$ penalty on the loading matrix. We provide an efficient algorithm to solve the optimization by using majorization-minimization algorithm and alternative direction method of multipliers. Numerical studies confirmed its good performance and real data analysis demonstrates the practical usefulness of the proposed method.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Bayesian sparse convex clustering via global-local shrinkage priors

Article Open access 05 April 2021

Hard and Fuzzy c-means Algorithms with Pairwise Constraints by Non-metric Terms

Robust convex clustering

Article 08 November 2019

References

Beck A, Teboulle M (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imaging Sci 2:183–202
Article MathSciNet Google Scholar
Böhning D (1992) Multinomial logistic regression algorithm. Ann Inst Stat Math 44:197–200
Article Google Scholar
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trend${}^{\textregistered }$ Mach Learn 3:1–122
MATH Google Scholar
Chi EC, Lange K (2015) Splitting methods for convex clustering. J Comput Graph Stat 24:994–1013
Article MathSciNet Google Scholar
Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32:407–499
Article MathSciNet Google Scholar
Finch H (2005) Comparison of distance measures in cluster analysis with dichotomous data. J Data Sci 3:85–100
Google Scholar
Goldstein T, O’Donoghue B, Setzer S, Baraniuk R (2014) Fast alternating direction optimization methods. SIAM J Imaging Sci 7:1588–1623
Article MathSciNet Google Scholar
Golub GH, Van Loan CF (1996) Matrix computations, 3rd edn. Johns Hopkins University Press, Baltimore
MATH Google Scholar
Hallac D, Leskovec J, Boyd S (2015) Network lasso: clustering and optimization in large graphs. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, pp 387–396
Hocking TD, Joullin A, Bach F, Vert J-P (2011) Cluterpath: an algorithm for clustering using convex fusion penalties. In: Proceedings of the 28th international conference on machine learning (ICML-11), pp 754–762
Jolliffe IT (2012) Principal component analysis, 2nd edn. Springer, New York
MATH Google Scholar
Lange K (2004) Optimization. Springer, New York
Book Google Scholar
Lee S, Huang JZ (2014) A biclustering algorithm for binary matrices based on penalized Bernoulli likelihood. Stat Comput 24:429–441
Article MathSciNet Google Scholar
Lee S, Huang JZ, Hu J (2010) Sparse logistic principal component analysis for binary data. Ann Appl Stat 4:1579–1601
Article MathSciNet Google Scholar
Li T (2006) A unified view on clustering binary data. Mach Learn 62:199–215
Article Google Scholar
Lichman M (2013) UCI machine learning repository [http://archive.ics.uci.edu/ml]. University of California, School of Information and Computer Science, Irvine
Pan W, Shen X, Liu B (2013) Cluster analysis: unsupervised learning via supervised learning with a non-convex penalty. J Mach Learn Res 14:1865–1889
MathSciNet MATH Google Scholar
Polson NG, Scott JG, Willard BT (2015) Proximal algorithms in statistics and machine learning. Stat Sci 30:559–581
Article MathSciNet Google Scholar
Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66:846–850
Article Google Scholar
Shen X, Huang HC (2010) Grouping pursuit through a regularization solution surface. J Am Stat Assoc 105:727–739
Article MathSciNet Google Scholar
Shen X, Pan W (2012) Simultaneous supervised clustering and feature selection over a graph. Biometrika 99:899–914
Article MathSciNet Google Scholar
Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc Ser B 63:411–423
Article MathSciNet Google Scholar
Turlach BA, Venables W (2005) Simultaneous variable selection. Technometrics 47:349–363
Article MathSciNet Google Scholar
Witten DM, Tibshirani R (2010) A framework for feature selection in clustering. J Am Stat Assoc 105:713–726
Article MathSciNet Google Scholar
Wu C, Kwon S, Shen X, Pan W (2016) A new algorithm and theory for penalized regression-based clustering. J Mach Learn Res 17:1–25
MathSciNet MATH Google Scholar
Yang H, Liu X (2017) Studies on the clustering algorithm for analyzing gene expression data with a bidirectional penalty. J Comput Biol 24:689–698
Article MathSciNet Google Scholar
Yang Y, Guan X, You J (2002) CLOPE: a fast and effective clustering algorithm for transactional data. In: SIGKDD ’02 Edmonton, Alberta, Canada, pp 682–687
Zhang Z, Li T, Ding C, Zhang X (2007) Binary matrix factorization with applications. In: IEEE international conference on data mining, pp 391–400

Download references

Acknowledgements

Hosik Choi was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2017R1D1A1B05028565). Seokho Lee was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2017R1D1A1B04030695).

Author information

Authors and Affiliations

Kyonggi University, Suwon, Republic of Korea
Hosik Choi
Hankuk University of Foreign Studies, Seoul, Republic of Korea
Seokho Lee

Authors

Hosik Choi
View author publications
You can also search for this author in PubMed Google Scholar
Seokho Lee
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Seokho Lee.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 104 KB)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Choi, H., Lee, S. Convex clustering for binary data. Adv Data Anal Classif 13, 991–1018 (2019). https://doi.org/10.1007/s11634-018-0350-1

Download citation

Received: 27 February 2018
Revised: 28 October 2018
Accepted: 07 November 2018
Published: 14 November 2018
Issue Date: December 2019
DOI: https://doi.org/10.1007/s11634-018-0350-1

Keywords

Mathematics Subject Classification

62H30

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Convex clustering for binary data

Abstract

Access this article

Similar content being viewed by others

Bayesian sparse convex clustering via global-local shrinkage priors

Hard and Fuzzy c-means Algorithms with Pairwise Constraints by Non-metric Terms

Robust convex clustering

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Electronic supplementary material

Supplementary material 1 (pdf 104 KB)

Rights and permissions

About this article

Cite this article

Keywords

Mathematics Subject Classification

Navigation

Convex clustering for binary data

Abstract

Access this article

Similar content being viewed by others

Bayesian sparse convex clustering via global-local shrinkage priors

Hard and Fuzzy c-means Algorithms with Pairwise Constraints by Non-metric Terms

Robust convex clustering

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Electronic supplementary material

Supplementary material 1 (pdf 104 KB)

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation