Primal-dual optimization algorithms over Riemannian manifolds: an iteration complexity analysis


In this paper we study nonconvex and nonsmooth multi-block optimization over Euclidean embedded (smooth) Riemannian submanifolds with coupled linear constraints. Such optimization problems naturally arise from machine learning, statistical learning, compressive sensing, image processing, and tensor PCA, among others. By utilizing the embedding structure, we develop an ADMM-like primal-dual approach based on decoupled solvable subroutines such as linearized proximal mappings, where the duality is with respect to the embedded Euclidean spaces. First, we introduce the optimality conditions for the aforementioned optimization models. Then, the notion of \(\epsilon \)-stationary solutions is introduced as a result. The main part of the paper shows that the proposed algorithms possess an iteration complexity of \(O(1/\epsilon ^2)\) to reach an \(\epsilon \)-stationary solution. For prohibitively large-size tensor or machine learning models, we present a sampling-based stochastic algorithm with the same iteration complexity bound in expectation. In case the subproblems are not analytically solvable, a feasible curvilinear line-search variant of the algorithm based on retraction operators is proposed. Finally, we show specifically how the algorithms can be implemented to solve a variety of practical problems such as the NP-hard maximum bisection problem, the \(\ell _q\) regularized sparse tensor principal component analysis and the community detection problem. Our preliminary numerical results show great potential of the proposed methods.





The authors would like to thank the associate editor and two anonymous reviewers for insightful and constructive comments that helped improve the presentation of this paper. The work of S. Ma was supported in part by a startup package in the Department of Mathematics at University of California, Davis. The work of S. Zhang was supported in part by the National Science Foundation under Grant CMMI-1462408 and in part by the Shenzhen Fundamental Research Fund under Grant KQTD2015033114415450.

Author information



Corresponding author

Correspondence to Shuzhong Zhang.


Proofs of the technical lemmas


Proof of Lemma 3.5


By the global optimality for the subproblems in Step 1 of Algorithm 1, we have

$$\begin{aligned} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N,\lambda ^k)\le \mathcal {L}_{\beta }(x^k_1,\ldots ,x^k_{N-1},x^k_N,\lambda ^k) - \frac{1}{2}\sum _{i=1}^{N-1}\Vert x^k_i-x^{k+1}_i\Vert ^2_{H_i}. \end{aligned}$$

By Step 2 of Algorithm 1 we have

$$\begin{aligned} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^{k+1}_N,\lambda ^k)\le & {} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N,\lambda ^k)\nonumber \\&+ \left( \frac{L+\beta }{2}-\frac{1}{\gamma }\right) \Vert x^k_N-x^{k+1}_N\Vert ^2. \end{aligned}$$

By Step 3, directly substituting \(\lambda ^{k+1}\) into the augmented Lagrangian gives

$$\begin{aligned} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_N,\lambda ^{k+1})= \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_N,\lambda ^k) + \frac{1}{\beta }\Vert \lambda ^k-\lambda ^{k+1}\Vert ^2. \end{aligned}$$

Summing (70), (71), and (72) and applying Lemma 3.4, we obtain the following inequality:

$$\begin{aligned}&\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^{k+1}_N,\lambda ^{k+1})- \mathcal {L}_{\beta }(x^k_1,\ldots ,x^k_{N-1},x^k_N,\lambda ^k) \nonumber \\&\quad \le \left[ \frac{L+\beta }{2}-\frac{1}{\gamma }+\frac{3}{\beta }\left( \beta -\frac{1}{\gamma } \right) ^2\right] \Vert x^k_N-x^{k+1}_N\Vert ^2 \nonumber \\&\qquad +\frac{3}{\beta }\left[ \left( \beta -\frac{1}{\gamma }\right) ^2+L^2\right] \Vert x^{k-1}_N-x^k_N\Vert ^2 - \sum _{i = 1}^{N-1}\Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{1}{2}H_i-\frac{3L^2}{\beta }I},\nonumber \\ \end{aligned}$$

which further indicates

$$\begin{aligned}&\Psi _G(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^{k+1}_N,\lambda ^{k+1},x^k_N) - \Psi _G(x^k_1,\ldots ,x^k_{N-1},x^k_N,\lambda ^{k},x^{k-1}_N) \nonumber \\&\quad \le \left[ \frac{\beta +L}{2}-\frac{1}{\gamma }+\frac{6}{\beta } \left( \beta -\frac{1}{\gamma }\right) ^2+\frac{3L^2}{\beta }\right] \Vert x^k_N-x^{k+1}_N\Vert ^2 \nonumber \\&\qquad - \sum _{i = 1}^{N-1}\Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{1}{2}H_i -\frac{3L^2}{\beta }I}. \end{aligned}$$

To ensure that the right hand side of (21) is negative, we need to choose \(H_i\succ \frac{6L^2}{\beta }I\), and ensure that

$$\begin{aligned} \frac{\beta +L}{2}-\frac{1}{\gamma }+\frac{6}{\beta }\left( \beta -\frac{1}{\gamma } \right) ^2+\frac{3L^2}{\beta }<0. \end{aligned}$$

To see that such a \(\gamma \) exists, view the left-hand side as a quadratic function of \(z = \frac{1}{\gamma }\). To find some \(z>0\) such that

$$\begin{aligned} p(z) = \frac{6}{\beta }z^2 - 13z + \left( \frac{L+\beta }{2}+6\beta +\frac{3}{\beta }L^2\right) <0, \end{aligned}$$

we need the discriminant to be positive, i.e.

$$\begin{aligned} \Delta (\beta ) = \frac{1}{\beta ^2}(13\beta ^2-12\beta L -72L^2)>0. \end{aligned}$$

It is easy to verify that (19) suffices to guarantee (76). Solving \(p(z) = 0\), we find two positive roots

$$\begin{aligned} z_{1} = \frac{13\beta -\sqrt{13\beta ^2-12\beta L -72L^2}}{12}, \text{ and } z_{2} = \frac{13\beta +\sqrt{13\beta ^2-12\beta L -72L^2}}{12}. \end{aligned}$$

Note that \(\gamma \) defined in (20) satisfies \(\frac{1}{z_2}<\gamma <\frac{1}{z_1}\) and thus guarantees (75). This completes the proof. \(\square \)
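The choice of \(\gamma \) can be sanity-checked numerically. The following is a minimal sketch, not part of the original proof: the values \(L = 1\) and \(\beta = 4\) and the midpoint choice \(\frac{1}{\gamma } = \frac{13\beta }{12}\) are illustrative assumptions (conditions (19) and (20) are not reproduced in this appendix). It evaluates \(p(z)\), its roots, and the coefficient of \(\Vert x^k_N-x^{k+1}_N\Vert ^2\).

```python
import math

def coefficient(beta, L, gamma):
    """Coefficient of ||x_N^k - x_N^{k+1}||^2 in the bound on Psi_G."""
    z = 1.0 / gamma
    return (beta + L) / 2 - z + (6 / beta) * (beta - z) ** 2 + 3 * L**2 / beta

def p(z, beta, L):
    """The same coefficient written as a quadratic in z = 1/gamma."""
    return (6 / beta) * z**2 - 13 * z + ((L + beta) / 2 + 6 * beta + 3 * L**2 / beta)

def roots(beta, L):
    """Roots z1 < z2 of p; needs 13*beta^2 - 12*beta*L - 72*L^2 > 0."""
    disc = 13 * beta**2 - 12 * beta * L - 72 * L**2
    if disc <= 0:
        raise ValueError("beta too small relative to L: p has no two real roots")
    s = math.sqrt(disc)
    return (13 * beta - s) / 12, (13 * beta + s) / 12

# Illustrative values (assumptions, not taken from the paper).
L_const, beta = 1.0, 4.0
z1, z2 = roots(beta, L_const)
gamma = 12.0 / (13.0 * beta)  # 1/gamma is the midpoint of [z1, z2]
print(z1, z2, coefficient(beta, L_const, gamma))
```

Any \(\gamma \) with \(\frac{1}{\gamma }\) strictly between the two roots should give a negative coefficient; the midpoint is used here only because it is the minimizer of \(p\).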

Proof of Lemma 3.8


For the subproblem in Step 1 of Algorithm 2, since \(x_i^{k+1}\) is the global minimizer, we have

$$\begin{aligned}&\langle \nabla _if(x^{k+1}_1,\ldots ,x^{k+1}_{i-1},x^k_i,\ldots ,x^k_N), x^{k+1}_i-x^k_i\rangle \\&\qquad -\bigg \langle \sum _{j=1}^{i}A_jx^{k+1}_j+\sum _{j = i+1}^{N}A_jx^k_j-b,\lambda ^k \bigg \rangle \\&\qquad +\frac{\beta }{2}\bigg \Vert \sum _{j=1}^{i}A_jx^{k+1}_j+\sum _{j = i+1}^{N}A_jx^k_j-b\bigg \Vert ^2 + \sum _{j=1}^{i}r_j(x^{k+1}_j)+\sum _{j = i+1}^{N-1}r_j(x^k_j)\\&\quad \le -\bigg \langle \sum _{j=1}^{i-1}A_jx^{k+1}_j+\sum _{j = i}^{N}A_jx^k_j-b,\lambda ^k \bigg \rangle +\frac{\beta }{2}\bigg \Vert \sum _{j=1}^{i-1}A_jx^{k+1}_j+\sum _{j = i}^{N}A_jx^k_j-b\bigg \Vert ^2 \\&\qquad + \sum _{j=1}^{i-1}r_j(x^{k+1}_j)+\sum _{j = i}^{N-1}r_j(x^k_j)-\frac{1}{2}\Vert x^{k+1}_i-x^k_i\Vert ^2_{H_i}. \end{aligned}$$

By the L-Lipschitz continuity of \(\nabla _i f\), we have

$$\begin{aligned}&f(x^{k+1}_1,\ldots ,x^{k+1}_{i},x^k_{i+1},\ldots ,x^k_N)\\&\quad \le f(x^{k+1}_1,\ldots ,x^{k+1}_{i-1},x^k_i,\ldots ,x^k_N) +\langle \nabla _if(x^{k+1}_1,\ldots ,x^{k+1}_{i-1},x^k_i, \ldots ,x^k_N),x^{k+1}_i-x^k_i\rangle \\&\qquad +\frac{L}{2}\Vert x^{k+1}_i-x^k_i\Vert ^2. \end{aligned}$$

Combining the above two inequalities and using the definition of \(\mathcal {L}_{\beta }\) in (16), we have

$$\begin{aligned} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{i},x^k_{i+1},\ldots ,x^k_N,\lambda ^k)\le & {} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{i-1},x^k_i,\ldots , x^k_N,\lambda ^k)\nonumber \\&-\Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{H_i}{2}-\frac{L}{2}I}. \end{aligned}$$

Summing (77) over \(i=1,\ldots ,N-1\), we have the following inequality, which is the counterpart of (70):

$$\begin{aligned} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N,\lambda ^k)\le \mathcal {L}_{\beta } (x^k_1,\ldots ,x^k_N,\lambda ^k)-\sum _{i=1}^{N-1}\Vert x^k_i-x^{k+1}_i \Vert ^2_{\frac{H_i}{2}-\frac{L}{2}I}. \end{aligned}$$

Besides, since (71) and (72) still hold, by combining (78), (71) and (72) and applying Lemma 3.4, we establish the following two inequalities, which are respectively the counterparts of (73) and (21):

$$\begin{aligned}&\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^{k+1}_N,\lambda ^{k+1})- \mathcal {L}_{\beta } (x^k_1,\ldots ,x^k_{N-1},x^k_N,\lambda ^k) \nonumber \\&\quad \le \left[ \frac{L+\beta }{2}-\frac{1}{\gamma }+\frac{3}{\beta } \left( \beta -\frac{1}{\gamma }\right) ^2\right] \Vert x^k_N-x^{k+1}_N\Vert ^2 \nonumber \\&\qquad +\frac{3}{\beta }\left[ \left( \beta -\frac{1}{\gamma }\right) ^2+L^2\right] \Vert x^{k-1}_N-x^k_N\Vert ^2 - \sum _{i = 1}^{N-1}\Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{1}{2}H_i -\frac{L}{2}I-\frac{3L^2}{\beta }I},\nonumber \\ \end{aligned}$$


$$\begin{aligned}&\Psi _G(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^{k+1}_N,\lambda ^{k+1},x^k_N) - \Psi _G(x^k_1,\ldots ,x^k_{N-1},x^k_N,\lambda ^{k},x^{k-1}_N)\\&\quad \le \left[ \frac{\beta +L}{2}-\frac{1}{\gamma }+\frac{6}{\beta }\left( \beta -\frac{1}{\gamma } \right) ^2+\frac{3L^2}{\beta }\right] \Vert x^k_N-x^{k+1}_N\Vert ^2 \\&\qquad - \sum _{i = 1}^{N-1}\Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{1}{2}H_i -\frac{L}{2}I-\frac{3L^2}{\beta }I}. \end{aligned}$$

From the proof of Lemma 3.5, it is easy to see that the right hand side of the above inequality is negative, if \(H_i\succ \left( \frac{6L^2}{\beta }+L\right) I\) and \(\beta \) and \(\gamma \) are chosen according to (19) and (20). \(\square \)

Proof of Lemma 3.11


For ease of notation, we denote

$$\begin{aligned} G_i^M(x_1^{k+1},\ldots ,x_{i-1}^{k+1},x_i^k,\ldots ,x_N^k) = \nabla _i f(x_1^{k+1},\ldots ,x_{i-1}^{k+1},x_i^k,\ldots ,x_N^k)+\delta _i^k. \end{aligned}$$

Note that \(\delta _i^k\) is a zero-mean random variable. By Steps 2 and 3 of Algorithm 3 we obtain

$$\begin{aligned} \lambda ^{k+1} = \left( \beta -\frac{1}{\gamma }\right) (x_N^k-x_N^{k+1}) +\nabla _Nf(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N)+\delta _N^k. \end{aligned}$$

Applying (81) for k and \(k+1\), and using (81), we get

$$\begin{aligned} \Vert \lambda ^{k+1}-\lambda ^k\Vert ^2= & {} \bigg \Vert \left( \beta -\frac{1}{\gamma }\right) (x^k_N-x^{k+1}_N)-\left( \beta -\frac{1}{\gamma }\right) (x^{k-1}_N-x^k_N) +(\delta _N^k-\delta _N^{k-1})\\&+(\nabla _Nf(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N)-\nabla _Nf(x^k_1, \ldots ,x^k_{N-1},x^{k-1}_N))\bigg \Vert ^2 \\\le & {} 4\left( \beta -\frac{1}{\gamma }\right) ^2\Vert x^k_N-x^{k+1}_N\Vert ^2+4 \left[ \left( \beta -\frac{1}{\gamma }\right) ^2+L^2\right] \Vert x^{k-1}_N-x^k_N\Vert ^2 \\&+4L^2\sum _{i=1}^{N-1}\Vert x^k_i-x^{k+1}_i\Vert ^2 + 4\Vert \delta _N^k -\delta _N^{k-1}\Vert ^2. \end{aligned}$$

Taking expectation with respect to all random variables on both sides and using \(\textsf {E} [\langle \delta _N^k,\delta _N^{k-1}\rangle ] = 0\) completes the proof. \(\square \)
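For completeness, the expectation of the noise term can be expanded as follows. This is a sketch that assumes, as the bounds in the proof of Lemma 3.12 suggest, that (36) gives \(\textsf {E} [\Vert \delta _i^k\Vert ^2]\le \sigma ^2/M\) for a mini-batch of size M:

$$\begin{aligned} \textsf {E} [\Vert \delta _N^k-\delta _N^{k-1}\Vert ^2]= & {} \textsf {E} [\Vert \delta _N^k\Vert ^2] - 2\textsf {E} [\langle \delta _N^k,\delta _N^{k-1}\rangle ] + \textsf {E} [\Vert \delta _N^{k-1}\Vert ^2] \\= & {} \textsf {E} [\Vert \delta _N^k\Vert ^2] + \textsf {E} [\Vert \delta _N^{k-1}\Vert ^2] \le \frac{2\sigma ^2}{M}. \end{aligned}$$

Under this assumption, the term \(4\Vert \delta _N^k -\delta _N^{k-1}\Vert ^2\) contributes at most \(8\sigma ^2/M\) in expectation.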

Proof of Lemma 3.12


Similarly to (77), and further incorporating (80), we have

$$\begin{aligned}&\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{i},x^k_{i+1},\ldots ,x^k_N,\lambda ^k) -\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{i-1},x^k_i,\ldots ,x^k_N,\lambda ^k)\\&\quad \le -\Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{H_i}{2}-\frac{L}{2}I} +\langle \delta _i^k,x^{k+1}_i-x^k_i\rangle \\&\quad \le -\Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{H_i}{2}-\frac{L}{2}I} +\frac{1}{2}\Vert \delta _i^k\Vert ^2+\frac{1}{2}\Vert x^{k+1}_i-x^k_i\Vert ^2. \end{aligned}$$

Taking expectation with respect to all random variables on both sides and summing over \(i=1,\ldots ,N-1\), and using (36), we obtain

$$\begin{aligned}&\textsf {E} [\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N,\lambda ^k)] -\textsf {E} [\mathcal {L}_{\beta }(x^k_1,\ldots ,x^k_N,\lambda ^k)] \nonumber \\&\quad \le -\sum _{i=1}^{N-1}\textsf {E} \left[ \Vert x^{k+1}_i-x^k_i\Vert ^2_{\frac{1}{2} H_i-\frac{L+1}{2}I}\right] +\frac{N-1}{2M}\sigma ^2. \end{aligned}$$

Note that by Step 2 of Algorithm 3 and the descent lemma we have

$$\begin{aligned} 0= & {} \bigg \langle x_N^k - x_N^{k+1}, \nabla _N f(x_1^{k+1},\ldots ,x_{N-1}^{k+1},x_N^k) + \delta _N^k - \lambda ^k + \beta \left( \sum _{j=1}^{N-1}A_jx_j^{k+1}+x_N^k-b\right) \\&\quad - \frac{1}{\gamma }(x_N^k-x_N^{k+1})\bigg \rangle \\\le & {} f(x_1^{k+1},\ldots ,x_{N-1}^{k+1},x_N^k) - f(x^{k+1}) + \left( \frac{L+\beta }{2}-\frac{1}{\gamma }\right) \Vert x_N^{k+1}-x_N^k\Vert ^2 - \langle \lambda ^k,x_N^k-x_N^{k+1}\rangle \\&+ \frac{\beta }{2}\Vert \sum _{j=1}^{N-1}A_jx_j^{k+1}+x_N^k-b\Vert ^2 - \frac{\beta }{2}\Vert \sum _{j=1}^{N-1}A_jx_j^{k+1}+x_N^{k+1}-b\Vert ^2 + \langle \delta _N^k,x_N^k-x_N^{k+1}\rangle \\\le & {} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N,\lambda ^k) - \mathcal {L}_{\beta }(x^{k+1},\lambda ^k) + \left( \frac{L+\beta }{2}-\frac{1}{\gamma }+\frac{1}{2}\right) \Vert x^k_N-x^{k+1}_N\Vert ^2 + \frac{1}{2}\Vert \delta _N^k\Vert ^2. \end{aligned}$$

Taking the expectation with respect to all random variables yields

$$\begin{aligned}&\textsf {E} [\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^{k+1}_N,\lambda ^k)]- \textsf {E} [\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N,\lambda ^k)]\nonumber \\&\le \left( \frac{L+\beta }{2}-\frac{1}{\gamma }+\frac{1}{2}\right) \textsf {E} [\Vert x^k_N -x^{k+1}_N\Vert ^2]+\frac{1}{2M}\sigma ^2. \end{aligned}$$

The following equality holds trivially from Step 3 of Algorithm 3:

$$\begin{aligned} \textsf {E} [\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_N,\lambda ^{k+1})]-\textsf {E} [\mathcal {L}_{\beta }(x^{k+1}_1, \ldots ,x^{k+1}_N,\lambda ^{k})] = \frac{1}{\beta }\textsf {E} [\Vert \lambda ^k-\lambda ^{k+1}\Vert ^2]. \end{aligned}$$

Combining (82), (83), (84) and (38), we obtain

$$\begin{aligned}&\textsf {E} [\Psi _S(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^{k+1}_N,\lambda ^{k+1},x^k_N)] - \textsf {E} [\Psi _S(x^k_1,\ldots ,x^k_{N-1},x^k_N,\lambda ^{k},x^{k-1}_N)]\nonumber \\&\quad \le \left[ \frac{\beta +L}{2}-\frac{1}{\gamma }+\frac{8}{\beta } \left( \beta -\frac{1}{\gamma }\right) ^2+\frac{4L^2}{\beta }+\frac{1}{2}\right] \textsf {E} [\Vert x^k_N-x^{k+1}_N\Vert ^2] \nonumber \\&\qquad - \sum _{i = 1}^{N-1}\textsf {E} \left[ \Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{1}{2}H_i-\frac{4L^2}{\beta }I -\frac{L+1}{2}I}\right] +\left( \frac{8}{\beta }+\frac{1}{2}+\frac{N-1}{2}\right) \frac{\sigma ^2}{M}. \end{aligned}$$

Choosing \(\beta \) and \(\gamma \) according to (40) and (41), and using arguments similar to those in the proof of Lemma 3.5, it is easy to verify that

$$\begin{aligned} \left[ \frac{\beta +L}{2}-\frac{1}{\gamma }+\frac{8}{\beta } \left( \beta -\frac{1}{\gamma }\right) ^2+\frac{4L^2}{\beta }+\frac{1}{2}\right] <0. \end{aligned}$$

By further choosing \(H_i\succ \left( \frac{8L^2}{\beta }+L+1\right) I\), we know that the right hand side of (85) is negative, and this completes the proof. \(\square \)

Proof of Lemma 3.13


From (81) and (15), we have that

$$\begin{aligned}&\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_N,\lambda ^{k+1})\\&\quad = \sum _{i = 1}^{N-1}r_i(x^{k+1}_i) + f(x^{k+1}) - \bigg \langle \sum _{i = 1}^NA_ix^{k+1}_i-b, \nabla _Nf(x^{k+1}) + \left( \beta -\frac{1}{\gamma }\right) (x^k_N-x^{k+1}_N) \\&\qquad + \nabla _Nf(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N) - \nabla _Nf(x^{k+1})+\delta _N^k\bigg \rangle + \frac{\beta }{2}\bigg \Vert \sum _{i = 1}^NA_ix^{k+1}_i-b\bigg \Vert ^2\\&\quad \ge \sum _{i = 1}^{N-1}r_i(x^{k+1}_i) + f(x^{k+1}_1,\ldots ,x^{k+1}_{N-1}, b-\sum _{i=1}^{N-1}A_ix^{k+1}_i) -\frac{4}{\beta }\left[ \left( \beta -\frac{1}{\gamma }\right) ^2+L^2\right] \Vert x^k_N-x^{k+1}_N\Vert ^2 \\&\qquad + \bigg (\frac{\beta }{2}-\frac{\beta }{8}-\frac{\beta }{8}-\frac{L}{2}\bigg )\bigg \Vert \sum _{i = 1}^NA_ix^{k+1}_i-b\bigg \Vert ^2 - \frac{2}{\beta }\Vert \delta _N^k\Vert ^2\\&\quad \ge \sum _{i=1}^{N-1}r_i^*+f^*-\frac{4}{\beta }\left[ \left( \beta -\frac{1}{\gamma }\right) ^2 +L^2\right] \Vert x^k_N-x^{k+1}_N\Vert ^2 - \frac{2}{\beta }\Vert \delta _N^k\Vert ^2 \\ \end{aligned}$$

where the first inequality is obtained by applying \(\langle a, b\rangle \le \frac{1}{2}(\frac{1}{\eta }\Vert a\Vert ^2+\eta \Vert b\Vert ^2)\) to terms \(\langle \sum _{i = 1}^NA_ix^{k+1}_i-b, \left( \beta -\frac{1}{\gamma }\right) (x^k_N-x^{k+1}_N)\rangle \), \(\langle \sum _{i = 1}^NA_ix^{k+1}_i-b, \nabla _Nf(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N) - \nabla _Nf(x^{k+1})\rangle \) and \(\langle \sum _{i = 1}^NA_ix^{k+1}_i-b,\delta _N^k\rangle \) respectively with \(\eta = \frac{8}{\beta }, \frac{8}{\beta }\) and \(\frac{4}{\beta }\). Note that \(\beta >2L\) according to (40), thus \((\frac{\beta }{2}-\frac{\beta }{8}-\frac{\beta }{8}-\frac{L}{2})>0\) and the last inequality holds. Rearranging the terms and taking expectation with respect to all random variables completes the proof. \(\square \)
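The coefficient of the squared residual can be accounted for term by term. In the sketch below, \(u := \sum _{i = 1}^NA_ix^{k+1}_i-b\) is shorthand introduced here for illustration, v stands for the second argument of each inner product, and the \(-\frac{L}{2}\) term is read as coming from the descent lemma applied to f in its last block:

$$\begin{aligned}&\langle u, v\rangle \le \frac{\beta }{16}\Vert u\Vert ^2+\frac{4}{\beta }\Vert v\Vert ^2 \quad \text{ for } \eta = \frac{8}{\beta }, \qquad \langle u, v\rangle \le \frac{\beta }{8}\Vert u\Vert ^2+\frac{2}{\beta }\Vert v\Vert ^2 \quad \text{ for } \eta = \frac{4}{\beta }, \\&\frac{\beta }{2}-\frac{\beta }{16}-\frac{\beta }{16}-\frac{\beta }{8}-\frac{L}{2} = \frac{\beta }{4}-\frac{L}{2} > 0 \quad \text{ if } \text{ and } \text{ only } \text{ if } \quad \beta > 2L. \end{aligned}$$

The two applications with \(\eta = \frac{8}{\beta }\) together account for one \(\frac{\beta }{8}\) term and for the factor \(\frac{4}{\beta }\left[ \left( \beta -\frac{1}{\gamma }\right) ^2+L^2\right] \), while the application with \(\eta = \frac{4}{\beta }\) accounts for the other \(\frac{\beta }{8}\) term and for \(\frac{2}{\beta }\Vert \delta _N^k\Vert ^2\).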

Proof of Theorem 3.19


Through a similar argument, one can easily obtain

$$\begin{aligned} \Vert \lambda ^{k+1} - \nabla _N f(x^{k+1}_1,\ldots ,x^{k+1}_N)\Vert ^2\le \kappa _2\theta _k\quad \text{ and } \quad \left\| \sum _{i=1}^{N-1}A_ix^{k+1}_i+x^{k+1}_N-b\right\| ^2 \le \kappa _1\theta _k, \end{aligned}$$

where \(\theta _k = \sum _{i=1}^N(\Vert t_i^{k+1}g_i^{k+1}\Vert ^2+\Vert t_i^kg_i^k\Vert ^2+\Vert t_i^{k-1}g_i^{k-1}\Vert ^2)\). The only remaining task is to guarantee an \(\epsilon \) version of (48). First let us prove that

$$\begin{aligned} \Vert g_i^{k+1}\Vert \le \frac{\sigma +2L_2C+(L+\beta A_{\max }^2)L_1^2}{2\alpha }\sqrt{\theta _{k}}. \end{aligned}$$

Denote \(h_i(x_i) = \mathcal {L}_{\beta }(x^{k+2}_1,\ldots ,x^{k+2}_{i-1},x_i,x^{k+1}_{i+1},\ldots ,x^{k+1}_N,\lambda ^{k+1})\) and \(Y_i(t) = R(x^{k+1}_i,-tg_i^{k+1})\). Then it is not hard to see that \(\nabla h_i(x_i)\) is Lipschitz continuous with parameter \(L+\beta \Vert A_i\Vert _2^2 \le L_3:=L+\beta A_{\max }^2\). Consequently, we obtain

$$\begin{aligned} h_i(Y_i(t))\le & {} h_i(Y_i(0)) + \langle \nabla h_i(Y_i(0)), Y_i(t) - Y_i(0) - tY_i'(0) + tY'_i(0)\rangle \\&+ \frac{L_3}{2} \Vert Y_i(t) - Y_i(0)\Vert ^2 \\\le & {} h_i(Y_i(0)) + t\langle \nabla h_i(Y_i(0)),Y_i'(0)\rangle + L_2t^2\Vert \nabla h_i(Y_i(0))\Vert \Vert Y'_i(0)\Vert ^2 \\&+ \frac{L_3L_1^2}{2}t^2\Vert Y'_i(0)\Vert ^2 \\= & {} h_i(Y_i(0)) - \left( t-L_2t^2\Vert \nabla h_i(Y_i(0))\Vert - \frac{L_3L_1^2}{2}t^2\right) \Vert Y'_i(0)\Vert ^2, \end{aligned}$$

where the last equality is due to \(\langle \nabla h_i(Y_i(0)),Y_i'(0)\rangle = -\langle Y_i'(0),Y_i'(0)\rangle \). Also note the relationship

$$\begin{aligned} \Vert Y_i'(0)\Vert = \Vert g_i^{k+1}\Vert = \Vert {\mathrm {Proj}}\, _{\mathcal {T}_{x_i^{k+1}}\mathcal {M}_i}\big \{\nabla h_i(Y_i(0))\big \}\Vert \le \Vert \nabla h_i(Y_i(0))\Vert . \end{aligned}$$
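The two facts used here, \(\langle \nabla h_i(Y_i(0)),Y_i'(0)\rangle = -\Vert Y_i'(0)\Vert ^2\) and \(\Vert g_i^{k+1}\Vert \le \Vert \nabla h_i(Y_i(0))\Vert \), can be checked numerically on a simple manifold. The sketch below is an illustration under assumptions that are not taken from the paper: the manifold is the unit sphere, the retraction is the normalization \(R(x,v) = (x+v)/\Vert x+v\Vert \), and h is a toy quadratic.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def proj_tangent(x, v):
    """Orthogonal projection of v onto the tangent space of the unit sphere at x."""
    c = dot(x, v)
    return [vi - c * xi for vi, xi in zip(v, x)]

def retract(x, v):
    """Projection-like retraction on the unit sphere: R(x, v) = (x + v) / ||x + v||."""
    y = [xi + vi for xi, vi in zip(x, v)]
    s = norm(y)
    return [yi / s for yi in y]

random.seed(0)
n = 5
# Symmetric matrix Q defining the toy objective h(x) = 0.5 * x^T Q x.
A = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
Q = [[A[i][j] + A[j][i] for j in range(n)] for i in range(n)]
x = [random.gauss(0, 1) for _ in range(n)]
x = [xi / norm(x) for xi in x]       # a point on the sphere

grad_h = [dot(row, x) for row in Q]  # Euclidean gradient Q x
g = proj_tangent(x, grad_h)          # Riemannian gradient g = Proj(grad h)

def Y(t):
    """Curve Y(t) = R(x, -t g)."""
    return retract(x, [-t * gi for gi in g])

eps = 1e-6
# Central finite difference approximating Y'(0), which should equal -g.
Y_prime_0 = [(a - b) / (2 * eps) for a, b in zip(Y(eps), Y(-eps))]

print(norm(g) <= norm(grad_h))                          # projection does not increase the norm
print(abs(dot(grad_h, Y_prime_0) + dot(g, g)) < 1e-6)   # <grad h, Y'(0)> = -||Y'(0)||^2
```

The finite difference is used only to avoid hard-coding \(Y'(0) = -g\), so that the check exercises the retraction itself.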

Note that \(\left\| \sum _{i=1}^{N-1}A_ix^{k+1}_i{+}x^{k+1}_N{-}b\right\| \le \sqrt{\kappa _1\theta _k}{\le } \sqrt{\frac{\kappa _1}{\tau }(\Psi _G(x_1^1,\ldots ,x_N^1,\lambda ^1,x_N^0){-}f^*)}.\) Because \(\mathcal {M}_i, i = 1,\ldots ,N-1\) are all compact submanifolds, \(x^{k+1}_i, i = 1,\ldots ,N-1\) are all bounded. Hence the whole sequence \(\{x_N^{k}\}\) is also bounded. By (27) (which also holds in this case),

$$\begin{aligned} \Vert \lambda ^{k+1}\Vert \le |\beta -\frac{1}{\gamma }|\sqrt{\theta _k}+\Vert \nabla _Nf(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N)\Vert . \end{aligned}$$

By the boundedness of \(\{(x^k_1,\ldots ,x^k_N)\}\) and the continuity of \(\nabla f(\cdot )\), the second term is bounded. Combining this with the boundedness of \(\{\theta _k\}\), we conclude that the whole sequence \(\{\lambda ^k\}\) is bounded. Consequently, there exists a constant \(C>0\) such that \(\Vert \nabla h_i(Y_i(0))\Vert \le C,\) where

$$\begin{aligned} \nabla h_i(Y_i(0))= & {} \nabla _if(x_1^{k+2},\ldots ,x^{k+2}_{i-1},x^{k+1}_i,\ldots ,x^{k+1}_N) - A_i^\top \lambda ^{k+1}\\&+\beta A_i^\top \bigg (\sum _{j=1}^{i-1}A_jx^{k+2}_j+\sum _{j = i}^N A_jx^{k+1}_j - b\bigg ). \end{aligned}$$

Note that, apart from absolute constants such as \(\Vert A_i\Vert _2,i = 1,\ldots ,N\), this constant C depends only on the first two iterates \(\{x_1^t,\ldots ,x_N^t,\lambda ^t\}, t = 0,1\). Therefore, when

$$\begin{aligned} t\le \frac{2}{2L_2C+\sigma +L_3L_1^2}\le \frac{2}{2L_2\Vert \nabla h_i(Y_i(0))\Vert +\sigma +L_3L_1^2}, \end{aligned}$$

it holds that

$$\begin{aligned} h_i(Y_i(t))\le h_i(x^{k+1}_i) - \frac{\sigma }{2}t^2\Vert g_i^{k+1}\Vert ^2. \end{aligned}$$
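The step-size threshold follows from elementary algebra: the coefficient \(t-L_2t^2\Vert \nabla h_i\Vert -\frac{L_3L_1^2}{2}t^2\) in the descent estimate is at least \(\frac{\sigma }{2}t^2\) exactly when \(t\le 2/(2L_2\Vert \nabla h_i\Vert +\sigma +L_3L_1^2)\). The following sketch checks this on a grid, using hypothetical values for all constants.

```python
def descent_coefficient(t, L2, grad_norm, L3, L1):
    """Coefficient multiplying ||Y'(0)||^2 in the descent estimate."""
    return t - L2 * t ** 2 * grad_norm - 0.5 * L3 * L1 ** 2 * t ** 2

# Hypothetical constants, chosen only for illustration.
L1, L2, L3, sigma, C = 1.5, 0.7, 4.0, 2.0, 3.0
t_max = 2.0 / (2.0 * L2 * C + sigma + L3 * L1 ** 2)

# Every step size in (0, t_max] gives at least (sigma / 2) * t^2 of decrease
# per unit of ||Y'(0)||^2 (a small tolerance absorbs rounding at t = t_max).
holds_below = all(
    descent_coefficient(t, L2, C, L3, L1) >= 0.5 * sigma * t ** 2 - 1e-12
    for t in (t_max * j / 100 for j in range(1, 101))
)

# When the gradient norm equals C, the condition fails beyond t_max.
t_big = 1.5 * t_max
fails_above = descent_coefficient(t_big, L2, C, L3, L1) < 0.5 * sigma * t_big ** 2
print(holds_below, fails_above)
```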

Since \(\sigma >\frac{2\alpha }{s}\), the terminating rule of the line-search step gives

$$\begin{aligned} t_i^k\ge \min \left\{ s, \frac{2\alpha }{2L_2C+\sigma +L_3L_1^2}\right\} = \frac{2\alpha }{2L_2C+\sigma +L_3L_1^2}. \end{aligned}$$
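To see the lower bound on the accepted step size in action, here is a minimal sketch of a backtracking line search along a retraction. It rests on several assumptions that are not taken from the paper: the manifold is the unit sphere with the normalization retraction (for which \(L_1=1\) and \(L_2=\frac{1}{2}\) are valid), the objective `h` is a hypothetical quadratic, and the search starts at \(s\) and shrinks by the factor \(\alpha \) until the sufficient-decrease condition displayed above holds.

```python
import math

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def retract(x, xi):
    """Normalization retraction on the unit sphere (assumed retraction R)."""
    y = [a + b for a, b in zip(x, xi)]
    n = norm(y)
    return [a / n for a in y]

def riemannian_grad(x, egrad):
    """Project the Euclidean gradient onto the tangent space at x."""
    inner = sum(a * b for a, b in zip(x, egrad))
    return [b - inner * a for a, b in zip(x, egrad)]

# Hypothetical objective h(x) = 0.5 * sum(q_i x_i^2) - <c, x>.
q = [3.0, 1.0, -2.0]
c = [0.5, -1.0, 0.25]

def h(x):
    return sum(0.5 * qi * xi * xi - ci * xi for qi, ci, xi in zip(q, c, x))

def egrad_h(x):
    return [qi * xi - ci for qi, ci, xi in zip(q, c, x)]

def line_search(x, s, alpha, sigma, max_backtracks=60):
    """Shrink t from s by the factor alpha until sufficient decrease holds."""
    g = riemannian_grad(x, egrad_h(x))
    t = s
    for _ in range(max_backtracks):
        y = retract(x, [-t * gi for gi in g])
        if h(y) <= h(x) - 0.5 * sigma * t * t * norm(g) ** 2:
            return t, y
        t *= alpha
    raise RuntimeError("line search did not terminate")

# Constants for this instance: L3 = max|q_i| is a gradient Lipschitz constant
# and C = L3 + ||c|| bounds the Euclidean gradient norm on the sphere.
L1, L2, L3 = 1.0, 0.5, max(abs(qi) for qi in q)
C = L3 + norm(c)
s, alpha, sigma = 1.0, 0.5, 2.0   # satisfies sigma > 2 * alpha / s
t_lower = min(s, 2.0 * alpha / (2.0 * L2 * C + sigma + L3 * L1 ** 2))

x0 = [1.0 / math.sqrt(3.0)] * 3
x, steps = x0, []
for _ in range(5):
    t, x = line_search(x, s, alpha, sigma)
    steps.append(t)
print(min(steps), t_lower)
```

In this sketch every accepted step stays above `t_lower`, because a trial step is rejected only if it exceeds the threshold under which sufficient decrease is guaranteed.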

Then by noting

$$\begin{aligned} \frac{2\alpha \Vert g_i^{k+1}\Vert }{2L_2C+\sigma +L_3L_1^2}\le t_i^{k+1}\Vert g_i^{k+1}\Vert \le \sqrt{\theta _k}, \end{aligned}$$

we have (86).

Now let us establish the \(\epsilon \) version of (48). By definition,

$$\begin{aligned} g_i^{k+1}= & {} {\mathrm {Proj}}\, _{\mathcal {T}_{x^{k+1}_i}\mathcal {M}_i}\bigg \{\nabla _if(x_1^{k+2},\ldots ,x^{k+2}_{i-1},x^{k+1}_i,\ldots ,x^{k+1}_N) - A_i^\top \lambda ^{k+1}\\&\quad +\beta A_i^\top \bigg (\sum _{j=1}^{i-1}A_jx^{k+2}_j+\sum _{j = i}^N A_jx^{k+1}_j - b\bigg )\bigg \}. \end{aligned}$$

Consequently, we obtain

$$\begin{aligned}&\biggl \Vert {\mathrm {Proj}}\, _{\mathcal {T}_{x_i^{k+1}}\mathcal {M}_i}\biggl \{\nabla _i f(x^{k+1})-A_i^\top \lambda ^{k+1}\biggr \}\biggr \Vert \\&\quad = \left\| {\mathrm {Proj}}\, _{\mathcal {T}_{x_i^{k+1}}\mathcal {M}_i}\left\{ \nabla _i f(x^{k+1})-\nabla _if(x_1^{k+2},\ldots , x_{i-1}^{k+2},x_{i}^{k+1},\ldots ,x_{N}^{k+1}) + g_i^{k+1} \right. \right. \\&\qquad - \left. \left. \beta A_i^\top \left( \sum _{j=1}^NA_jx_j^{k+1}-b\right) + \beta A_i^\top \left( \sum _{j = 1}^{i-1}A_j(x_j^{k+1}-x_j^{k+2})\right) \right\} \right\| \\&\quad \le \Vert \nabla _i f(x^{k+1})-\nabla _if(x_1^{k+2},\ldots , x_{i-1}^{k+2},x_{i}^{k+1},\ldots ,x_{N}^{k+1})\Vert + \left\| \beta A_i^\top \left( \sum _{j=1}^NA_jx_j^{k+1}-b\right) \right\| \\&\qquad + \Vert g_i^{k+1}\Vert +\left\| \beta A_i^\top \left( \sum _{j = 1}^{i-1}A_j(x_j^{k+1}-x_j^{k+2}) \right) \right\| \\&\quad \le \left( L+\sqrt{N}\beta A_{\max }^2\right) \max \{L_1,1\}\sqrt{\theta _k} + \frac{\sigma +2L_2C+(L+\beta A_{\max }^2)L_1^2}{2\alpha }\sqrt{\theta _{k}} + \beta \Vert A_i\Vert _2 \sqrt{\kappa _1\theta _k} \\&\quad \le \sqrt{\kappa _3\theta _{k}}. \end{aligned}$$

\(\square \)

Proof of inequality (60)


First, we bound the Lipschitz constant of \(\nabla \bar{f}_{\beta }\).

$$\begin{aligned}&\Vert \nabla \bar{f}_{\beta }(x)-\nabla \bar{f}_{\beta }(y)\Vert \nonumber \\&\quad \le L\Vert x-y\Vert + \beta \left\| \left[ \left( \sum _{j =1}^NA_j(x_j-y_j)\right) ^\top A_1,\ldots ,\left( \sum _{j =1}^NA_j(x_j-y_j)\right) ^\top A_N\right] \right\| \nonumber \\&\quad \le L\Vert x-y\Vert + \beta \sqrt{N}\max _{1\le i\le N}\Vert A_i\Vert _2\left\| \sum _{j =1}^NA_j(x_j-y_j) \right\| \nonumber \\&\quad \le \left( L+\beta N\max _{1\le i\le N}\Vert A_i\Vert _2^2 \right) \Vert x-y\Vert . \end{aligned}$$
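The constant \(L+\beta N\max _i\Vert A_i\Vert _2^2\) in the last line of this chain can be sanity-checked numerically. The sketch below assumes that \(\bar{f}_{\beta }\) is \(f\) plus the quadratic penalty \(\frac{\beta }{2}\Vert \sum _jA_jx_j-b\Vert ^2\), as the chain suggests, and uses a hypothetical instance with scalar blocks (so \(\Vert A_i\Vert _2=|a_i|\)) and \(f(x)=\frac{L}{2}\Vert x\Vert ^2\).

```python
import math
import random

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def grad_f_bar(x, a, b, L, beta):
    """Gradient of f_bar(x) = (L/2)||x||^2 + (beta/2)(<a, x> - b)^2."""
    residual = sum(ai * xi for ai, xi in zip(a, x)) - b
    return [L * xi + beta * ai * residual for ai, xi in zip(a, x)]

# Hypothetical data: N scalar blocks with coefficients a_i.
a = [1.0, -2.0, 0.5, 1.5]
b, L, beta = 0.3, 2.0, 5.0
N = len(a)
L_hat = L + beta * N * max(abs(ai) for ai in a) ** 2

# Largest observed ratio ||grad(x) - grad(y)|| / ||x - y|| over random pairs.
rng = random.Random(0)
worst_ratio = 0.0
for _ in range(1000):
    x = [rng.uniform(-1.0, 1.0) for _ in range(N)]
    y = [rng.uniform(-1.0, 1.0) for _ in range(N)]
    gx, gy = grad_f_bar(x, a, b, L, beta), grad_f_bar(y, a, b, L, beta)
    diff = [u - v for u, v in zip(gx, gy)]
    worst_ratio = max(worst_ratio, norm(diff) / norm([u - v for u, v in zip(x, y)]))
print(worst_ratio <= L_hat)
```

For this instance the exact Lipschitz constant is \(L+\beta \Vert a\Vert ^2\), so the bound with the factor \(N\) is valid but not tight.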

We therefore take \(\hat{L} = L+\beta N\max _{1\le i\le N}\Vert A_i\Vert _2^2 \) as a Lipschitz constant of \(\nabla \bar{f}_{\beta }.\) The global optimality of the subproblem (59) yields

$$\begin{aligned}&\langle \nabla _i\bar{f}_{\beta }(x^k_1,\ldots ,x^k_N),x^{k+1}_i-x^k_i\rangle -\langle \lambda ^k,A_ix^{k+1}_i\rangle + r_i(x^{k+1}_i)\\&\quad +\frac{1}{2}\Vert x^{k+1}_i-x^k_i\Vert ^2_{H_i} \le r_i(x^k_i) - \langle \lambda ^k,A_ix^k_i\rangle . \end{aligned}$$

By the descent lemma we have

$$\begin{aligned}&\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N,\lambda ^k)\\&\quad = \bar{f}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N) -\left\langle \lambda ^k,\sum _{i=1}^{N}A_ix^{k+1}_i-b\right\rangle +\sum _{i=1}^{N-1}r_i(x^{k+1}_i) \\&\quad \le \bar{f}_{\beta }(x^k_1,\ldots ,x^k_{N-1},x^k_N) +\langle \nabla \bar{f}_{\beta }(x^k_1,\ldots ,x^k_{N-1},x^k_N),x^{k+1}-x^k\rangle \\&\qquad +\frac{\hat{L}}{2}\Vert x^{k+1}-x^k\Vert ^2-\left\langle \lambda ^k,\sum _{i=1}^{N} A_ix^{k+1}_i-b\right\rangle +\sum _{i=1}^{N-1}r_i(x^{k+1}_i). \end{aligned}$$

Summing the optimality inequality over \(i\) and combining it with the descent estimate above yields (60). \(\square \)

Cite this article

Zhang, J., Ma, S. & Zhang, S. Primal-dual optimization algorithms over Riemannian manifolds: an iteration complexity analysis. Math. Program. 184, 445–490 (2020).


Keywords

  • Nonconvex and nonsmooth optimization
  • Riemannian manifold
  • \(\epsilon \)-Stationary solution
  • ADMM
  • Iteration complexity

Mathematics Subject Classification

  • 90C60
  • 90C90