Worst-case complexity of cyclic coordinate descent: \(O(n^2)\) gap with randomized version

  • Ruoyu Sun
  • Yinyu Ye
Full Length Paper, Series A

Abstract

This paper concerns the worst-case complexity of cyclic coordinate descent (C-CD) for minimizing a convex quadratic function, a setting in which C-CD is equivalent to the Gauss–Seidel method, the Kaczmarz method, and projection onto convex sets (POCS). We observe that the best known complexity bound for C-CD is \(\mathcal{O}(n^2)\) times larger than that of randomized coordinate descent (R-CD), but no example had been proven to exhibit such a large gap. In this paper we show that the gap indeed exists. We prove that there exists an example on which C-CD requires \(\Omega(n^4 \kappa_{\text{CD}} \log \frac{1}{\epsilon})\) operations, where \(\kappa_{\text{CD}}\) is a quantity related to Demmel's condition number that determines the convergence rate of R-CD. This implies that in the worst case C-CD can indeed be \(\mathcal{O}(n^2)\) times slower than R-CD, whose complexity is \(\mathcal{O}(n^2 \kappa_{\text{CD}} \log \frac{1}{\epsilon})\). Notably, for this example the gap holds for every fixed update order, not just one particular order. An immediate consequence is that for the Gauss–Seidel method, the Kaczmarz method, and POCS applied to solving linear systems, there is likewise an \(\mathcal{O}(n^2)\) gap between the cyclic and randomized versions. One difficulty in the analysis is that the spectral radius of a non-symmetric iteration matrix does not necessarily yield a lower bound on the convergence rate: unlike in the symmetric case, \(\|M^k\|\) may be far larger than \(\rho(M)^k\) over finitely many iterations. Finally, we design numerical experiments showing that the magnitude of the off-diagonal entries is an important indicator of the practical performance of C-CD.
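To make the setting concrete, the following minimal sketch (not the paper's code) implements exact coordinate minimization for \(f(x) = \frac{1}{2}x^{\top}Ax - b^{\top}x\): one cyclic pass coincides with a Gauss–Seidel sweep, while the randomized variant samples coordinates uniformly. The test matrix with unit diagonal and constant off-diagonal \(c\) close to 1 is only an illustrative hard-looking instance, related in spirit to (but not identical with) the paper's proven worst-case construction.

```python
import numpy as np

def coordinate_descent(A, b, num_epochs, order="cyclic", seed=0):
    """Minimize f(x) = 0.5 * x^T A x - b^T x by exact coordinate steps.

    For a quadratic, exact minimization over coordinate i gives the
    Gauss-Seidel update  x_i <- x_i - (A x - b)_i / A_ii.
    """
    rng = np.random.default_rng(seed)
    n = len(b)
    x = np.zeros(n)
    values = []
    for _ in range(num_epochs):
        # One "epoch" = n coordinate updates, cyclic or sampled uniformly.
        coords = range(n) if order == "cyclic" else rng.integers(0, n, size=n)
        for i in coords:
            x[i] -= (A[i] @ x - b[i]) / A[i, i]
        values.append(0.5 * x @ A @ x - b @ x)  # track the objective
    return x, values

# Illustrative instance: unit diagonal, constant off-diagonal c close to 1.
# (Hypothetical test matrix, chosen here for illustration only.)
n, c = 50, 0.9
A = c * np.ones((n, n)) + (1 - c) * np.eye(n)
b = np.ones(n)

_, f_cyc = coordinate_descent(A, b, num_epochs=100, order="cyclic")
_, f_rnd = coordinate_descent(A, b, num_epochs=100, order="random")
print(f"after 100 epochs: f_cyclic = {f_cyc[-1]:.8f}, f_random = {f_rnd[-1]:.8f}")
```

On matrices of this flavor, with large, equal off-diagonal entries, the cyclic order typically lags the randomized one, consistent with the off-diagonal indicator discussed in the abstract.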

Mathematics Subject Classification

65F10 · 65F15 · 65K05 · 90C30

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society 2019

Authors and Affiliations

  1. Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign, Urbana, USA
  2. Department of Management Science and Engineering, Stanford University, Stanford, USA