Abstract
Motivated by the task of clustering either d variables or d points into K groups, we investigate efficient algorithms to solve the Peng–Wei (P–W) K-means semi-definite programming (SDP) relaxation. The P–W SDP has been shown in the literature to have good statistical properties in a variety of settings, but remains intractable to solve in practice. To this end, we propose FORCE, a new algorithm to solve this SDP relaxation. Compared to off-the-shelf interior point solvers, our method reduces the computational complexity of solving the SDP from \({\widetilde{{\mathcal {O}}}}(d^7\log \epsilon ^{-1})\) to \({\widetilde{{\mathcal {O}}}}(d^{6}K^{-2}\epsilon ^{-1})\) arithmetic operations for an \(\epsilon \)-optimal solution. Our method combines a primal first-order method with a dual optimality certificate search, which, when successful, allows for early termination of the primal method. We show for certain variable clustering problems that, with high probability, FORCE is guaranteed to find the optimal solution to the SDP relaxation and provide a certificate of exact optimality. As verified by our numerical experiments, this allows FORCE to solve the P–W SDP with dimensions in the hundreds in only tens of seconds. For a variation of the P–W SDP where K is not known a priori, a slight modification of FORCE reduces the computational complexity of solving this problem as well: from \({\widetilde{{\mathcal {O}}}}(d^7\log \epsilon ^{-1})\) using a standard SDP solver to \({\widetilde{{\mathcal {O}}}}(d^{4}\epsilon ^{-1})\).
Notes
If an event occurs with probability q(d) for dimension d, it is said to occur with high probability if \(q(d)\ge 1 - C/d\) for all d sufficiently large.
Note the switch to \(\delta \), with which we denote an additive error; \(\epsilon \), used to quantify the error of FORCE, more properly corresponds to a type of relative additive error (Sect. 3).
The notation \({{\widetilde{{\mathcal {O}}}}}\) is used to suppress poly-log factors of d.
In this context, \(\epsilon \) is a multiplicative error.
We also implemented an MMW algorithm for the P–W SDP, but found that it did not converge in practice; we suspect this is due to the presence of the \(d^2\) equality constraints, which are not satisfied at each iteration of MMW, but we did not investigate this further.
Renegar [17] actually works in the setting \({\mathbf {F}}={\mathbf {I}}\); what we present here is a slightly modified version, and later we use the results of the correspondingly adjusted theoretical analysis.
The equalities are inexact because we make no assumptions on the mean of \({{\widehat{\varvec{\varGamma }}}}\), only its convergence rate.
The authors have made the code available on-line at http://bpames.people.ua.edu/software.html.
Similar results can be obtained for other graph structures, such as Band or Hub graphs.
References
Bunea, F., Giraud, C., Royer, M., Verzelen, N.: PECOK: a convex optimization approach to variable clustering. arXiv:1606.05100 (2016)
Dasgupta, S.: The hardness of k-means clustering. Tech. rep., University of California, San Diego (2008)
Mahajan, M., Nimbhorkar, P., Varadarajan, K.: The planar k-means problem is NP-hard. Theor. Comput. Sci. 442, 13–21 (2012). https://doi.org/10.1016/j.tcs.2010.05.034
Lloyd, S.P.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982). https://doi.org/10.1109/TIT.1982.1056489
Defays, D.: An efficient algorithm for a complete link method. Comput. J. 20(4), 364–366 (1977)
Kumar, A., Kannan, R.: Clustering with spectral norm and the k-means algorithm. In: FOCS. arXiv:1004.1823v1 (2010)
Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: SODA (2007)
Peng, J., Wei, Y.: Approximating K-means-type clustering via semidefinite programming. SIAM J. Optim. 18(1), 186–205 (2007). https://doi.org/10.1137/050641983
Vazirani, V.: Approximation Algorithms. Springer, Berlin (2001)
Awasthi, P., Bandeira, A.S.: Relax, no need to round: integrality of clustering formulations. In: ITCS, p. 27, https://doi.org/10.1145/2688073.2688116. arXiv:1408.4045 (2015)
Iguchi, T., Mixon, D.G., Peterson, J., Villar, S.: Probably certifiably correct k-means clustering. Math. Program. pp. 1–29. https://doi.org/10.1007/s10107-016-1097-0. arXiv:1509.07983 (2016)
Bunea, F., Giraud, C., Luo, X., Royer, M., Verzelen, N.: Model assisted variable clustering: minimax-optimal recovery and algorithms. arXiv:1508.01939 (2018)
Bunea, F., Ning, Y., Wegkamp, M.: Overlapping variable clustering with statistical guarantees. arXiv:1704.06977v1 (2017)
Bandeira, A.S.: A note on probably certifiably correct algorithms. arXiv:1509.00824v1 (2015)
Ames, B.P.W.: Guaranteed clustering and biclustering via semidefinite programming. Math. Program. Ser. A 147(1–2), 429–465 (2014). https://doi.org/10.1007/s10107-013-0729-x. arXiv:1202.3663
Iguchi, T., Mixon, D.G., Peterson, J., Villar, S.: On the tightness of an SDP relaxation of k-means. arXiv:1505.04778 (2015)
Renegar, J.: Efficient first-order methods for linear programming and semidefinite programming. arXiv:1409.5832 (2014)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Arora, S., Hazan, E., Kale, S.: Fast algorithms for approximate semidefinite programming using the multiplicative weights update method. In: FOCS (2005)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011). https://doi.org/10.1561/2200000016. arXiv:1408.2927
Awasthi, P., Sheffet, O.: Improved spectral-norm bounds for clustering. In: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pp. 37–49. arXiv:1206.3204v2 (2012)
Li, X., Chen, Y., Xu, J.: Convex relaxation methods for community detection. arXiv:1810.00315 (2018)
Abbe, E., Bandeira, A.S., Hall, G.: Exact recovery in the stochastic block model. IEEE Trans. Inf. Theory 62(1) (2016)
Pirinen, A., Ames, B.: Clustering of sparse and approximately sparse graphs by semidefinite programming. arXiv:1603.05296 (2016)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer US (2004). https://doi.org/10.1007/978-1-4419-8853-9
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. Ser. A 103, 127–152 (2005). https://doi.org/10.1007/s10107-004-0552-5
Nesterov, Y.: Smoothing technique and its applications in semidefinite optimization. Math. Program. 110(2), 245–259 (2007)
Bubeck, S.: Convex optimization: algorithms and complexity. Found. Trends Mach. Learn. 8(3–4), 231–357 (2015). https://doi.org/10.1561/2200000050. arXiv:1405.4980v2
Ketchen, D., Shook, C.: The application of cluster analysis in strategic management research: an analysis and critique. Strategic Manag. J. 17(6), 441–458 (1996)
Goutte, C., Toft, P., Rostrup, E., Nielsen, F.Å., Hansen, L.K.: On clustering fMRI time series. NeuroImage 9(3), 298–310 (1999). https://doi.org/10.1006/NIMG.1998.0391
O’Donoghue, B., Candès, E.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15(3), 715–732 (2015). https://doi.org/10.1007/s10208-013-9150-3. arXiv:1204.3982
Andersen, E.D., Andersen, K.D.: The Mosek interior point optimizer for linear programming: an implementation of the homogeneous algorithm. In: High Performance Optimization, Springer, pp. 197–232. https://doi.org/10.1007/978-1-4757-3216-0_8 (2000)
Sun, D., Toh, K.C., Yuan, Y., Zhao, X.Y.: SDPNAL+: A Matlab software for semidefinite programming with bound constraints (version 1.0). arXiv:1710.10604 (2017)
Rudelson, M., Vershynin, R.: Hanson-Wright inequality and sub-Gaussian concentration. arXiv:1306.2872 (2013)
Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices. https://doi.org/10.1017/CBO9780511794308.006. arXiv:1011.3027 (2011)
Appendices
Appendix A: Proofs omitted in Sect. 3.2
First we have a lemma regarding the concentration of the noise terms \({\mathbf {E}}\) about their mean. Sometimes rather than state these concentration results in terms of d, we state them in terms of \(t \ge d\) to allow for more precise control of constants in our main theorems. We let \({\mathcal {E}}\) denote the event that \(||{{\widehat{\varvec{\varGamma }}}} - \varvec{\varGamma }^*||_{\infty } \le p_1||\varvec{\varGamma }^*||_{\max }\sqrt{\frac{\log d}{n}}\).
Lemma 9
Under the notation and assumptions from previous sections, if \(t\ge d\) then
with probability at least \(1-\frac{2}{t}\), where \(c_0 = c'(1+\sqrt{p_0})\) is a constant that depends only on \(p_0\) and the absolute constant \(c'\) from Proposition 11. Similarly with probability at least \(1-\frac{2}{t}\), for \(a \in G^*_{i}\),
Proof
To obtain the result, we observe that
is a quadratic form of an \(n|G^*_{i}|\)-dimensional Gaussian random vector with independent entries. In particular, if we define \({\mathbf {M}}\) to be block diagonal with the ith \(n\times n\) diagonal block as \((\varvec{\varGamma }^*_{G^*_{i},G^*_{i}})^{1/2}{\mathbf {1}}{\mathbf {1}}^T(\varvec{\varGamma }^*_{G^*_{i},G^*_{i}})^{1/2}\), then we can apply Corollary 1 with matrix \({\mathbf {M}}\). Because \(||{\mathbf {M}}||_2 \le ||\varvec{\varGamma }^*||_{\infty }|G^*_{i}|\) and \(||{\mathbf {M}}||_F \le ||\varvec{\varGamma }^*||_{\infty }|G^*_{i}|\sqrt{n}\), applying the corollary gives
with probability at least \(1-\frac{2}{t}\). Using the assumption \(\log d \le p_0 n\) gives the desired result. The proof of the second statement follows similarly, taking instead the diagonal blocks of \({\mathbf {M}}\) as \((\varvec{\varGamma }^*_{G^*_{i},G^*_{i}})^{1/2}{\mathbf {1}}{\mathbf {e}}_a^T(\varvec{\varGamma }^*_{G^*_{i},G^*_{i}})^{1/2}\), giving \(||{\mathbf {M}}||_2 \le ||\varvec{\varGamma }^*||_{\infty }\sqrt{|G^*_{i}|}\) and \(||{\mathbf {M}}||_F \le ||\varvec{\varGamma }^*||_{\infty }\sqrt{n|G^*_{i}|}\). \(\square \)
1.1 Proof of Lemma 6
Step 1 For notation, \(c_i\) will be used to denote absolute constants. The first step is to decompose \({\mathbf {Q}}_{i}^{\perp }(\varvec{X})\). Recall that under the G-Latent model, \({\mathbf {D}}= -{{\widehat{\varvec{\varSigma }}}} + {{\widehat{\varvec{\varGamma }}}}\). Substituting that into the expression for \({\mathbf {Q}}_{i}^{\perp }(\varvec{X})\) gives
For (i), we recall that, by the definition of the G-Latent model,
Plugging this into (i) and simplifying gives us that
Now we see that, again, the expression for \({\mathbf {Q}}_{i}^{\perp }(\varvec{X})\) has eight terms. We first show that each concentrates to its mean at the desired rate, and then use the triangle inequality to obtain the final result. Fortunately, we can subtract the mean of each of the eight terms from the expression for \({\mathbf {Q}}_{i}^{\perp }(\varvec{X})\), as the means for (i) are offset by the means for (ii). To give the new decomposition of \({\mathbf {Q}}_{i}^{\perp }(\varvec{X})\) explicitly,
Step 2 For the term (i).a, we can directly apply Lemma 9. Doing so, it follows immediately that with probability at least \(1-\frac{2}{t}\)
For the term (i).c (and so, by symmetry, (i).b), we observe that it has the form \({\mathbf {u}}{\mathbf {v}}^T\) and that \(||{\mathbf {u}}{\mathbf {v}}^T||_2 = ||{\mathbf {u}}||_2||{\mathbf {v}}||_2\). Therefore, we can apply Lemma 9 and obtain that, with probability at least \(1-2|G^*_{i}|/t^2\),
Step 3 Now we control the term (i).d, the sample covariance matrix of the errors. We can directly apply Corollary 3 to obtain that with probability at least \(1-2/t\)
Step 4 For the terms in (ii), consider first (ii).a. We see that
Conditional on event \({\mathcal {E}}\),
Because the matrices above are a multiple of \({\mathbf {1}}{\mathbf {1}}^T\), it follows that
Next for (ii).b (and (ii).c by symmetry), we can see that
Because \({\widehat{\varvec{\varGamma }}}\) and \(\varvec{\varGamma }^*\) are diagonal, we can use event \({\mathcal {E}}\) and the fact that for matrices of the form \({\mathbf {u}}{\mathbf {v}}^T\), \(||{\mathbf {u}}{\mathbf {v}}^T||_2 = ||{\mathbf {u}}||_2||{\mathbf {v}}||_2\), to obtain
The same result is immediate for (ii).a by (4). Therefore by combining the above, applying the triangle inequality to (30), using that \({\mathcal {E}}\) occurs with probability at least \(1-p_2/d^2\), and choosing \(t=d^2\), we find that with probability at least \(1-\frac{c_2}{d^2}\)
concluding the proof. \(\square \)
1.2 Proof of Lemma 7
Under the G-Latent model,
Above, we saw that
and likewise for \(y_b\). Below we denote by \(\sigma _1 = \max _i C_{i,i}^*\) and \(\sigma _2 = \max \{\max _i C_{i,i}^*,||\varvec{\varGamma }^*||_{\infty }\}\). Following the same decomposition as in Lemma 6, we get that
As in the proof of Lemma 6, the means of (ii).b and (ii).c offset the means of (ii).e and (ii).f. To control terms (ii).b and (ii).c, by Lemma 9 with probability at least \(1-1/t\),
Likewise, by Lemma 9,
with probability at least \(1-1/t\). Conditional on event \({\mathcal {E}}\), (4) shows that
Lastly, term (ii).d can be bounded using Corollary 1, which gives that
with probability at least \(1-1/t\). The same results can be obtained for \(y_b\). For the terms in (i), we expand as before:
Terms (i).b and (i).c can be bounded in the same way as (32). Term (i).d can be bounded by Corollary 1, giving that
with probability at least \(1-1/t\). All that remains is to bound the terms (i).a, (ii).a and (iii).a. Fortunately, these correspond to the population quantity \(\varDelta {\mathbf {C}}_{ }^*\). Observing that this is just a quadratic form of a 2n-dimensional Gaussian vector, we can apply Lemma 9. Doing so gives that
with probability at least \(1-1/t\). Combining all the bounds for (i)-(iii), using that \({\mathcal {E}}\) occurs with probability at least \(1-p_2/d^3\), and selecting \(t=d^3\), we can see that, with probability at least \(1-c_1/d^3\)
\(\square \)
Appendix B: Some technical lemmas
Lemma 10
Let \({\mathbf {M}}\) be a \(d\times d\) real, symmetric matrix of the form
where \(a,b \in {\mathbb {R}}\). Then \({\mathbf {M}}\) has eigenvalues \(a+db\) with multiplicity 1 and \(a\) with multiplicity \(d-1\). If \(a,b > 0\), then \({\mathbf {M}}\) also has the property that
Proof
Using the Sherman-Morrison formula, a matrix of the form \({\mathbf {M}}= a{\mathbf {I}}+ b{\mathbf {1}}{\mathbf {1}}^T\), where \(a,b > 0\) has the inverse
Because \({\mathbf {M}}\succ 0\), all eigenvalues are strictly positive; denote by \(\lambda _i\) and \({\mathbf {q}}_i\) the eigenvalues and corresponding eigenvectors. Without loss of generality, let the \({\mathbf {q}}_i\) be orthonormal. Then we can write \({\mathbf {M}}= \sum _i \lambda _i {\mathbf {q}}_i{\mathbf {q}}_i^T\). By the form of \({\mathbf {M}}\), clearly \(\frac{1}{\sqrt{d}}{\mathbf {1}}\) is always an eigenvector of \({\mathbf {M}}\) with eigenvalue \(a+db\), so we can take \({\mathbf {q}}_1 = \frac{1}{\sqrt{d}}{\mathbf {1}}\) and \(\lambda _1 = a+db\). The remaining \({\mathbf {q}}_i\) span \(({\mathbf {1}}{\mathbf {1}}^T)^{\perp }\) and have corresponding eigenvalues \(\lambda _i = a\). Therefore,
Because this eigen-decomposition is unique, the above gives
Using the expression for \({\mathbf {M}}^{-1}\) given above, it follows that
\(\square \)
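The claims of Lemma 10 can be verified numerically. The sketch below, which assumes the matrix in the lemma statement has the form \({\mathbf {M}}= a{\mathbf {I}}+ b{\mathbf {1}}{\mathbf {1}}^T\) as in the proof, checks both the eigenvalue structure and the Sherman-Morrison expression for the inverse (the dimension and the values of a and b are arbitrary choices for illustration):

```python
import numpy as np

# Sanity check for Lemma 10, assuming M = a*I + b*(1 1^T) as in the proof.
d, a, b = 5, 2.0, 0.5
ones = np.ones((d, 1))
M = a * np.eye(d) + b * (ones @ ones.T)

# Eigenvalues: a + d*b with multiplicity 1, and a with multiplicity d - 1.
eigs = np.sort(np.linalg.eigvalsh(M))
assert np.allclose(eigs[:-1], a)          # d - 1 copies of a
assert np.isclose(eigs[-1], a + d * b)    # single eigenvalue a + d*b

# Sherman-Morrison: M^{-1} = (1/a) I - (b / (a (a + d b))) 1 1^T.
M_inv = (1.0 / a) * np.eye(d) - (b / (a * (a + d * b))) * (ones @ ones.T)
assert np.allclose(M_inv, np.linalg.inv(M))
```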
The following result for quadratic forms of standard multivariate Gaussian random variables can be found in many forms in the literature (for example, Rudelson and Vershynin [34]).
Lemma 11
(Hanson-Wright inequality for Gaussian random variables) Let \(\varvec{X}\sim N(0,{\mathbf {I}})\) be a d-dimensional random vector and let \({\mathbf {A}}\in {\mathbb {R}}^{d \times d}\). Then
for some absolute constant c.
In particular, the following corollary is useful.
Corollary 1
Let \(\varvec{X}\sim N(0,{\mathbf {I}})\) be a d-dimensional random vector and let \({\mathbf {A}}\in {\mathbb {R}}^{d \times d}\). Then
for some absolute constant c. Equivalently,
with probability at least \(1-2/t\) for some absolute constant \(c'\).
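The qualitative content of Corollary 1 — that the quadratic form \(\varvec{X}^T{\mathbf {A}}\varvec{X}\) concentrates around its mean \({{\,\mathrm{tr}\,}}({\mathbf {A}})\) with fluctuations on the order of \(||{\mathbf {A}}||_F\) — can be illustrated by simulation. The sketch below is an empirical illustration only (the dimensions, trial count, and random \({\mathbf {A}}\) are arbitrary choices); it does not verify the constants in the bound:

```python
import numpy as np

# Empirical illustration of Corollary 1 (Hanson-Wright for Gaussians):
# for X ~ N(0, I_d), the quadratic form X^T A X has mean trace(A) and
# typical fluctuations on the order of the Frobenius norm ||A||_F.
rng = np.random.default_rng(0)
d, trials = 200, 2000
A = rng.standard_normal((d, d))

X = rng.standard_normal((trials, d))
quad = np.einsum('ti,ij,tj->t', X, A, X)  # X^T A X for each trial

mean_dev = np.abs(quad.mean() - np.trace(A))
fro = np.linalg.norm(A, 'fro')
# The empirical mean sits far closer to trace(A) than ||A||_F.
assert mean_dev < fro
# Single-draw fluctuations are on the scale of ||A||_F.
assert 0.1 * fro < quad.std() < 10 * fro
```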
Below we are concerned with the rate of concentration in the spectral norm of a sample covariance matrix to its mean: \(||{{\widehat{\varvec{\varSigma }}}} - \varvec{\varSigma }^*||_2\). If we write \({{\widehat{\varvec{\varSigma }}}} = \frac{1}{n}\varvec{X}^T\varvec{X}\), where \(\varvec{X}\) refers to the \(n\times d\) matrix in which the rows are the observations \(\varvec{X}_i\), we see how such a result is directly applicable to the problem at hand. We repeat the statement of Gordon’s Theorem given in Vershynin [35] below as Proposition 1. We use the notation from Vershynin [35] of \(s_{\min }\) and \(s_{\max }\) to denote the smallest and largest singular values, respectively.
Proposition 1
Let \(\varvec{X}\) be an \(n \times d\) matrix whose entries are independent standard normal random variables. Then
Using the result on sub-Gaussian concentration of a Lipschitz function of independent random variables, we immediately obtain the following corollary (also given in Vershynin [35]).
Corollary 2
Let \(\varvec{X}\) be an \(n \times d\) matrix whose entries are independent standard normal random variables, then for every \(t \ge 0\)
with probability at least \(1-2\exp (-t^2/2)\).
Proof
Observing that the functions \(s_{\min }\) and \(s_{\max }\) are 1-Lipschitz and using the sub-Gaussian tail bound, the result is immediate from the above. \(\square \)
Corollary 3
Let \(\varvec{X}_i\), for \(i=1,\dots ,n\), be d-dimensional random vectors sampled independently from \(N(0,\varvec{\varSigma })\). Denoting \({\widehat{\varvec{\varSigma }}}_{ }:= n^{-1}\sum _{i=1}^n\varvec{X}_i\varvec{X}_i^\top \), we have that
with probability at least \(1-2\exp (-t^2/2)\).
Proof
This follows directly from Corollary 2. \(\square \)
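The rate behind Corollary 3 can also be observed empirically: the spectral-norm error of the sample covariance shrinks roughly like \(\sqrt{d/n}\) as the sample size grows. The sketch below is an illustration under an identity population covariance (an arbitrary simplifying choice), not a verification of the stated constants:

```python
import numpy as np

# Empirical check of the spectral-norm concentration behind Corollary 3:
# for X_i ~ N(0, Sigma), ||Sigma_hat - Sigma||_2 decays roughly like
# sqrt(d/n) (up to constants), consistent with Gordon-type bounds.
rng = np.random.default_rng(1)
d = 50
Sigma = np.eye(d)  # identity covariance for a simple illustration

def spec_err(n):
    X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
    Sigma_hat = X.T @ X / n
    return np.linalg.norm(Sigma_hat - Sigma, 2)  # largest singular value

# The error decreases as n grows, consistent with the sqrt(d/n) rate.
errs = [spec_err(n) for n in (200, 2000, 20000)]
assert errs[0] > errs[-1]
assert errs[-1] < 0.5
```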
Appendix C: Numerical results for unknown K
In this section we consider the exact same setup as in Sect. 6, but now we solve (22) using FORCE. For K unknown we do not have a readily available baseline comparison in high dimensions—prior work considers an ADMM based approach for (2), but not (22)—so we present only the results from FORCE and FORCE-P.
The main differences between the results displayed in Tables 3 and 4 and those in Sect. 6 are that (1) SDPNAL reached its iteration limit in some cases, leading to a higher average error than for K known, and (2) for the setting \((d,k,\rho ,\gamma ) = (500,100,0.3,3.0)\) the SNR was too low and FORCE did not converge. Otherwise, the results shown below are very similar to those when K is known, and we refer the reader to the discussion in Sect. 6. The fact that we do not observe any significant difference in empirical performance between K fixed and K unknown—indeed, as intuition suggests, K unknown appears to take longer in practice—indicates that Theorem 1 either may not be tight or that instances on which it achieves the worst-case bound are encountered with low probability.
Cite this article
Eisenach, C., Liu, H. Efficient, certifiably optimal clustering with applications to latent variable graphical models. Math. Program. 176, 137–173 (2019). https://doi.org/10.1007/s10107-019-01375-2