Certifying Global Optimality of Graph Cuts via Semidefinite Relaxation: A Performance Guarantee for Spectral Clustering

  • Shuyang Ling
  • Thomas Strohmer

Abstract

Spectral clustering has become one of the most widely used clustering techniques when the structure of the individual clusters is non-convex or highly anisotropic. Yet, despite its immense popularity, there exists fairly little theory about performance guarantees for spectral clustering. This issue is partly due to the fact that spectral clustering typically involves two steps, which complicates its theoretical analysis: first, the eigenvectors of the associated graph Laplacian are used to embed the dataset, and second, the k-means algorithm is applied to the embedded dataset to obtain the cluster labels. This paper is devoted to the theoretical foundations of spectral clustering and graph cuts. We consider a convex relaxation of graph cuts, namely ratio cuts and normalized cuts, that makes the usual two-step approach of spectral clustering obsolete and at the same time gives rise to a rigorous theoretical analysis of graph cuts and spectral clustering. We derive deterministic bounds for successful spectral clustering via a spectral proximity condition that naturally depends on the algebraic connectivity of each cluster and the inter-cluster connectivity. Moreover, we demonstrate by means of some popular examples that our bounds can achieve near optimality. Our findings are also fundamental to the theoretical understanding of kernel k-means. Numerical simulations confirm and complement our analysis.
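
To make the objects in the abstract concrete, here is a minimal sketch of the usual two-step spectral clustering pipeline (Laplacian embedding followed by k-means). The Gaussian-kernel affinity, the bandwidth sigma, and the row normalization below are illustrative assumptions, not specifics taken from the paper:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    """Two-step spectral clustering: Laplacian embedding, then k-means."""
    # Gaussian-kernel affinity matrix W (one common, illustrative choice).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # Step 1: embed each point via the k bottom eigenvectors of L.
    _, U = eigh(L, subset_by_index=[0, k - 1])
    U /= np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize the embedding

    # Step 2: k-means on the embedded points yields the cluster labels.
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```

And a hedged cvxpy sketch of a semidefinite relaxation of ratio cut in the spirit the abstract describes, minimizing the inner product of the Laplacian L with a relaxed partition matrix Z; the precise constraint set used in the paper may differ in its details:

```python
import cvxpy as cp

def ratio_cut_sdp(L, k):
    """Sketch of an SDP relaxation of ratio cut: min <L, Z> over relaxed partition matrices."""
    n = L.shape[0]
    Z = cp.Variable((n, n), PSD=True)       # positive semidefinite (hence symmetric)
    constraints = [Z >= 0,                  # entrywise nonnegative
                   cp.sum(Z, axis=1) == 1,  # each row sums to one (Z 1 = 1)
                   cp.trace(Z) == k]        # trace equals the number of clusters
    cp.Problem(cp.Minimize(cp.trace(L @ Z)), constraints).solve()
    return Z.value
```

When the relaxation is tight, which is the regime such a spectral proximity condition is meant to certify, the optimizer Z is exactly the block-diagonal partition matrix of the clusters, so no eigenvector embedding, k-means step, or rounding is needed.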

Keywords

Semidefinite programming · Graph partition · Unsupervised learning · Spectral clustering · Community detection · Graph Laplacian

Mathematics Subject Classification

90C34 · 90C27 · 90C46 · 60B20

Acknowledgements

S.L. thanks Afonso S. Bandeira for fruitful discussions about stochastic block models. The authors are also grateful to the anonymous referees for their careful reading of this paper and for their suggestions.

Copyright information

© SFoCM 2019

Authors and Affiliations

  1. Courant Institute of Mathematical Sciences, New York University, New York, USA
  2. Department of Mathematics, University of California, Davis, Davis, USA
