# Certifying Global Optimality of Graph Cuts via Semidefinite Relaxation: A Performance Guarantee for Spectral Clustering

## Abstract

Spectral clustering has become one of the most widely used clustering techniques when the structure of the individual clusters is non-convex or highly anisotropic. Yet, despite its immense popularity, fairly little theory exists on performance guarantees for spectral clustering. This is partly because spectral clustering typically involves two steps, which complicates its theoretical analysis: first, the eigenvectors of the associated graph Laplacian are used to embed the dataset, and second, the k-means clustering algorithm is applied to the embedded dataset to obtain the labels. This paper is devoted to the theoretical foundations of spectral clustering and graph cuts. We consider a convex relaxation of graph cuts, namely ratio cuts and normalized cuts, that makes the usual two-step approach of spectral clustering obsolete and at the same time gives rise to a rigorous theoretical analysis of graph cuts and spectral clustering. We derive deterministic bounds for successful spectral clustering via a *spectral proximity condition* that naturally depends on the algebraic connectivity of each cluster and the inter-cluster connectivity. Moreover, we demonstrate by means of some popular examples that our bounds can achieve near optimality. Our findings are also fundamental to the theoretical understanding of kernel k-means. Numerical simulations confirm and complement our analysis.
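For orientation, the usual two-step procedure mentioned above (not the convex relaxation studied in this paper) can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions: the Gaussian kernel weights, the bandwidth `sigma`, the unnormalized Laplacian, the farthest-point initialization of k-means, and all function names are choices made here for the example and are not taken from the paper.

```python
import numpy as np

def spectral_embedding(X, k, sigma=1.0):
    """Step 1: embed the data with the k bottom eigenvectors of L = D - W."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))  # Gaussian kernel (assumed)
    np.fill_diagonal(W, 0.0)                    # no self-loops
    L = np.diag(W.sum(axis=1)) - W              # unnormalized graph Laplacian
    _, eigvecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    return eigvecs[:, :k]

def kmeans(Y, k, n_iter=50):
    """Step 2: plain Lloyd iterations with farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        # distance of every point to its nearest already-chosen center
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.sum((Y[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels

def spectral_clustering(X, k, sigma=1.0):
    return kmeans(spectral_embedding(X, k, sigma), k)

# Toy data: two well-separated Gaussian blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
               rng.normal(5.0, 0.3, size=(20, 2))])
labels = spectral_clustering(X, k=2)
```

On this toy input the two blobs are almost disconnected in the similarity graph, so the bottom eigenvectors are nearly constant on each blob and k-means separates them. The sketch also shows why the analysis is awkward: the final labels depend on both the eigenvector computation and the k-means step.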

## Keywords

Semidefinite programming, Graph partition, Unsupervised learning, Spectral clustering, Community detection, Graph Laplacian

## Mathematics Subject Classification

90C34, 90C27, 90C46, 60B20

## Notes

### Acknowledgements

S.L. thanks Afonso S. Bandeira for fruitful discussions about stochastic block models. The authors are also grateful to the anonymous referees for their careful reading of this paper and suggestions.
