
Primal-dual optimization algorithms over Riemannian manifolds: an iteration complexity analysis

Abstract

In this paper we study nonconvex and nonsmooth multi-block optimization over Euclidean embedded (smooth) Riemannian submanifolds with coupled linear constraints. Such optimization problems arise naturally in machine learning, statistical learning, compressive sensing, image processing, and tensor PCA, among others. By exploiting the embedding structure, we develop an ADMM-like primal-dual approach based on decoupled solvable subroutines such as linearized proximal mappings, where the duality is with respect to the embedded Euclidean spaces. We first introduce optimality conditions for the aforementioned optimization models, and then define the corresponding notion of \(\epsilon \)-stationary solutions. The main part of the paper shows that the proposed algorithms enjoy an iteration complexity of \(O(1/\epsilon ^2)\) to reach an \(\epsilon \)-stationary solution. For prohibitively large tensor or machine learning models, we present a sampling-based stochastic algorithm with the same iteration complexity bound in expectation. When the subproblems are not analytically solvable, a feasible curvilinear line-search variant of the algorithm based on retraction operators is proposed. Finally, we show how the algorithms can be applied to a variety of practical problems, such as the NP-hard maximum bisection problem, \(\ell _q\)-regularized sparse tensor principal component analysis, and community detection. Our preliminary numerical results show great potential of the proposed methods.
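
To fix ideas, the following is a minimal schematic sketch (in Python) of one iteration of the kind of ADMM-like scheme analyzed in the appendix below; it is not the authors' exact Algorithms 1–3. All names (partial_grad, prox_r, H) are illustrative placeholders, the manifold constraints on the first \(N-1\) blocks are assumed to be folded into the proximal subroutine, and the last block is taken with \(A_N=I\), as in the appendix.

def admm_like_iteration(x, lam, partial_grad, prox_r, A, b, beta, gamma, H):
    """One sketched iteration: linearized proximal updates for blocks 1..N-1,
    a gradient step for the last (Euclidean) block with stepsize gamma, and a
    dual update with penalty parameter beta."""
    N = len(x)

    def residual():
        # residual of the coupled linear constraint, with the most recent blocks
        return sum(A[j] @ x[j] for j in range(N - 1)) + x[N - 1] - b

    for i in range(N - 1):
        # gradient of the smooth part of the augmented Lagrangian w.r.t. block i
        g = partial_grad(i, x) - A[i].T @ lam + beta * A[i].T @ residual()
        # linearized proximal step on the nonsmooth term r_i (manifold constraint
        # assumed to be handled inside prox_r), with proximal weight H[i]
        x[i] = prox_r(i, x[i] - g / H[i], 1.0 / H[i])

    # last block: plain gradient step with stepsize gamma
    x[N - 1] = x[N - 1] - gamma * (partial_grad(N - 1, x) - lam + beta * residual())

    # dual update
    lam = lam - beta * residual()
    return x, lam

In Algorithm 1 the first \(N-1\) subproblems are solved exactly rather than linearized, and in the stochastic variant (Algorithm 3) the partial gradients are replaced by sampled estimates \(\nabla _if+\delta _i^k\); the proofs below cover these cases.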



Acknowledgements

The authors would like to thank the associate editor and two anonymous reviewers for insightful and constructive comments that helped improve the presentation of this paper. The work of S. Ma was supported in part by a startup package in the Department of Mathematics at University of California, Davis. The work of S. Zhang was supported in part by the National Science Foundation under Grant CMMI-1462408 and in part by the Shenzhen Fundamental Research Fund under Grant KQTD2015033114415450.

Author information

Correspondence to Shuzhong Zhang.


Proofs of the technical lemmas

Proof of Lemma 3.5

Proof

By the global optimality of the subproblems in Step 1 of Algorithm 1, we have

$$\begin{aligned} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N,\lambda ^k)\le \mathcal {L}_{\beta }(x^k_1,\ldots ,x^k_{N-1},x^k_N,\lambda ^k) - \frac{1}{2}\sum _{i=1}^{N-1}\Vert x^k_i-x^{k+1}_i\Vert ^2_{H_i}. \end{aligned}$$
(70)

By Step 2 of Algorithm 1 we have

$$\begin{aligned} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^{k+1}_N,\lambda ^k)\le & {} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N,\lambda ^k)\nonumber \\&+ \left( \frac{L+\beta }{2}-\frac{1}{\gamma }\right) \Vert x^k_N-x^{k+1}_N\Vert ^2. \end{aligned}$$
(71)

By Step 3, directly substituting \(\lambda ^{k+1}\) into the augmented Lagrangian gives

$$\begin{aligned} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_N,\lambda ^{k+1})= \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_N,\lambda ^k) + \frac{1}{\beta }\Vert \lambda ^k-\lambda ^{k+1}\Vert ^2. \end{aligned}$$
(72)
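
Indeed, assuming Step 3 takes the usual dual update \(\lambda ^{k+1}=\lambda ^k-\beta \big (\sum _{i=1}^NA_ix^{k+1}_i-b\big )\) (the form consistent with (81) below), only the term of \(\mathcal {L}_{\beta }\) that is linear in \(\lambda \) changes, so that

$$\begin{aligned} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_N,\lambda ^{k+1})-\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_N,\lambda ^{k}) = \Big \langle \lambda ^k-\lambda ^{k+1},\sum _{i=1}^NA_ix^{k+1}_i-b\Big \rangle = \frac{1}{\beta }\Vert \lambda ^k-\lambda ^{k+1}\Vert ^2, \end{aligned}$$

which is exactly (72).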

Summing up (70), (71) and (72), and applying Lemma 3.4, we obtain the following inequality:

$$\begin{aligned}&\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^{k+1}_N,\lambda ^{k+1})- \mathcal {L}_{\beta }(x^k_1,\ldots ,x^k_{N-1},x^k_N,\lambda ^k) \nonumber \\&\quad \le \left[ \frac{L+\beta }{2}-\frac{1}{\gamma }+\frac{3}{\beta }\left( \beta -\frac{1}{\gamma } \right) ^2\right] \Vert x^k_N-x^{k+1}_N\Vert ^2 \nonumber \\&\qquad +\frac{3}{\beta }\left[ \left( \beta -\frac{1}{\gamma }\right) ^2+L^2\right] \Vert x^{k-1}_N-x^k_N\Vert ^2 - \sum _{i = 1}^{N-1}\Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{1}{2}H_i-\frac{3L^2}{\beta }I},\nonumber \\ \end{aligned}$$
(73)

which further indicates

$$\begin{aligned}&\Psi _G(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^{k+1}_N,\lambda ^{k+1},x^k_N) - \Psi _G(x^k_1,\ldots ,x^k_{N-1},x^k_N,\lambda ^{k},x^{k-1}_N) \nonumber \\&\quad \le \left[ \frac{\beta +L}{2}-\frac{1}{\gamma }+\frac{6}{\beta } \left( \beta -\frac{1}{\gamma }\right) ^2+\frac{3L^2}{\beta }\right] \Vert x^k_N-x^{k+1}_N\Vert ^2 \nonumber \\&\qquad - \sum _{i = 1}^{N-1}\Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{1}{2}H_i -\frac{3L^2}{\beta }I}. \end{aligned}$$
(74)

To ensure that the right hand side of (21) is negative, we need to choose \(H_i\succ \frac{6L^2}{\beta }I\), and ensure that

$$\begin{aligned} \frac{\beta +L}{2}-\frac{1}{\gamma }+\frac{6}{\beta }\left( \beta -\frac{1}{\gamma } \right) ^2+\frac{3L^2}{\beta }<0. \end{aligned}$$
(75)

This can be shown by viewing the left-hand side of (75) as a quadratic function of \(z = \frac{1}{\gamma }\). To find some \(z>0\) such that

$$\begin{aligned} p(z) = \frac{6}{\beta }z^2 - 13z + \left( \frac{L+\beta }{2}+6\beta +\frac{3}{\beta }L^2\right) <0, \end{aligned}$$

we need the discriminant to be positive, i.e.

$$\begin{aligned} \Delta (\beta ) = \frac{1}{\beta ^2}(13\beta ^2-12\beta L -72L^2)>0. \end{aligned}$$
(76)
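
For completeness, the discriminant of \(p\) is

$$\begin{aligned} 13^2-4\cdot \frac{6}{\beta }\left( \frac{L+\beta }{2}+6\beta +\frac{3L^2}{\beta }\right) = 13-\frac{12L}{\beta }-\frac{72L^2}{\beta ^2} = \frac{1}{\beta ^2}\big (13\beta ^2-12\beta L-72L^2\big ) = \Delta (\beta ). \end{aligned}$$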

It is easy to verify that (19) suffices to guarantee (76). Solving \(p(z) = 0\), we find two positive roots

$$\begin{aligned} z_{1} = \frac{13\beta -\sqrt{13\beta ^2-12\beta L -72L^2}}{12}, \text{ and } z_{2} = \frac{13\beta +\sqrt{13\beta ^2-12\beta L -72L^2}}{12}. \end{aligned}$$

Note that \(\gamma \) defined in (20) satisfies \(\frac{1}{z_2}<\gamma <\frac{1}{z_1}\) and thus guarantees (75). This completes the proof. \(\square \)
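
As a quick numerical sanity check of the above argument (a sketch only; \(L=1\), \(\beta =10\) are illustrative values, not the parameter choices prescribed by (19) and (20)):

import math

L, beta = 1.0, 10.0                       # illustrative values; not the paper's settings (19)-(20)
disc = 13 * beta**2 - 12 * beta * L - 72 * L**2         # equals beta^2 * Delta(beta)
z1 = (13 * beta - math.sqrt(disc)) / 12                 # the two positive roots of p(z) = 0
z2 = (13 * beta + math.sqrt(disc)) / 12
p = lambda z: (6 / beta) * z**2 - 13 * z + ((L + beta) / 2 + 6 * beta + 3 * L**2 / beta)
z_mid = 0.5 * (z1 + z2)                   # any z with z1 < z < z2, i.e. 1/z2 < gamma < 1/z1
print(disc > 0, 0 < z1 < z2, p(z_mid) < 0)              # expect: True True True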

Proof of Lemma 3.8

Proof

For the subproblem in Step 1 of Algorithm 2, since \(x_i^{k+1}\) is the global minimizer, we have

$$\begin{aligned}&\langle \nabla _if(x^{k+1}_1,\ldots ,x^{k+1}_{i-1},x^k_i,\ldots ,x^k_N), x^{k+1}_i-x^k_i\rangle \\&\qquad -\bigg \langle \sum _{j=1}^{i}A_jx^{k+1}_j+\sum _{j = i+1}^{N}A_jx^k_j-b,\lambda ^k \bigg \rangle \\&\qquad +\frac{\beta }{2}\bigg \Vert \sum _{j=1}^{i}A_jx^{k+1}_j+\sum _{j = i+1}^{N}A_jx^k_j-b\bigg \Vert ^2 + \sum _{j=1}^{i}r_j(x^{k+1}_j)+\sum _{j = i+1}^{N-1}r_j(x^k_j)\\&\quad \le -\bigg \langle \sum _{j=1}^{i-1}A_jx^{k+1}_j+\sum _{j = i}^{N}A_jx^k_j-b,\lambda ^k \bigg \rangle +\frac{\beta }{2}\bigg \Vert \sum _{j=1}^{i-1}A_jx^{k+1}_j+\sum _{j = i}^{N}A_jx^k_j-b\bigg \Vert ^2 \\&\qquad + \sum _{j=1}^{i-1}r_j(x^{k+1}_j)+\sum _{j = i}^{N-1}r_j(x^k_j)-\frac{1}{2}\Vert x^{k+1}_i-x^k_i\Vert ^2_{H_i}. \end{aligned}$$

By the L-Lipschitz continuity of \(\nabla _i f\), we have

$$\begin{aligned}&f(x^{k+1}_1,\ldots ,x^{k+1}_{i},x^k_{i+1},\ldots ,x^k_N)\\&\quad \le f(x^{k+1}_1,\ldots ,x^{k+1}_{i-1},x^k_i,\ldots ,x^k_N) +\langle \nabla _if(x^{k+1}_1,\ldots ,x^{k+1}_{i-1},x^k_i, \ldots ,x^k_N),x^{k+1}_i-x^k_i\rangle \\&\qquad +\frac{L}{2}\Vert x^{k+1}_i-x^k_i\Vert ^2. \end{aligned}$$

Combining the above two inequalities and using the definition of \(\mathcal {L}_{\beta }\) in (16), we have

$$\begin{aligned} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{i},x^k_{i+1},\ldots ,x^k_N,\lambda ^k)\le & {} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{i-1},x^k_i,\ldots , x^k_N,\lambda ^k)\nonumber \\&-\Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{H_i}{2}-\frac{L}{2}I}. \end{aligned}$$
(77)

Summing (77) over \(i=1,\ldots ,N-1\), we have the following inequality, which is the counterpart of (70):

$$\begin{aligned} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N,\lambda ^k)\le \mathcal {L}_{\beta } (x^k_1,\ldots ,x^k_N,\lambda ^k)-\sum _{i=1}^{N-1}\Vert x^k_i-x^{k+1}_i \Vert ^2_{\frac{H_i}{2}-\frac{L}{2}I}. \end{aligned}$$
(78)

Besides, since (71) and (72) still hold, by combining (78), (71) and (72) and applying Lemma 3.4, we establish the following two inequalities, which are respectively the counterparts of (73) and (21):

$$\begin{aligned}&\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^{k+1}_N,\lambda ^{k+1})- \mathcal {L}_{\beta } (x^k_1,\ldots ,x^k_{N-1},x^k_N,\lambda ^k) \nonumber \\&\quad \le \left[ \frac{L+\beta }{2}-\frac{1}{\gamma }+\frac{3}{\beta } \left( \beta -\frac{1}{\gamma }\right) ^2\right] \Vert x^k_N-x^{k+1}_N\Vert ^2 \nonumber \\&\qquad +\frac{3}{\beta }\left[ \left( \beta -\frac{1}{\gamma }\right) ^2+L^2\right] \Vert x^{k-1}_N-x^k_N\Vert ^2 - \sum _{i = 1}^{N-1}\Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{1}{2}H_i -\frac{L}{2}I-\frac{3L^2}{\beta }I},\nonumber \\ \end{aligned}$$
(79)

and

$$\begin{aligned}&\Psi _G(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^{k+1}_N,\lambda ^{k+1},x^k_N) - \Psi _G(x^k_1,\ldots ,x^k_{N-1},x^k_N,\lambda ^{k},x^{k-1}_N)\\&\quad \le \left[ \frac{\beta +L}{2}-\frac{1}{\gamma }+\frac{6}{\beta }\left( \beta -\frac{1}{\gamma } \right) ^2+\frac{3L^2}{\beta }\right] \Vert x^k_N-x^{k+1}_N\Vert ^2 \\&\qquad - \sum _{i = 1}^{N-1}\Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{1}{2}H_i -\frac{L}{2}I-\frac{3L^2}{\beta }I}. \end{aligned}$$

From the proof of Lemma 3.5, it is easy to see that the right hand side of the above inequality is negative, if \(H_i\succ \left( \frac{6L^2}{\beta }+L\right) I\) and \(\beta \) and \(\gamma \) are chosen according to (19) and (20). \(\square \)

Proof of Lemma 3.11

Proof

For ease of notation, we denote

$$\begin{aligned} G_i^M(x_1^{k+1},\ldots ,x_{i-1}^{k+1},x_i^k,\ldots ,x_N^k) = \nabla _i f(x_1^{k+1},\ldots ,x_{i-1}^{k+1},x_i^k,\ldots ,x_N^k)+\delta _i^k. \end{aligned}$$
(80)

Note that \(\delta _i^k\) is a zero-mean random variable. By Steps 2 and 3 of Algorithm 3 we obtain

$$\begin{aligned} \lambda ^{k+1} = \left( \beta -\frac{1}{\gamma }\right) (x_N^k-x_N^{k+1}) +\nabla _Nf(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N)+\delta _N^k. \end{aligned}$$
(81)
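
For the reader's convenience, (81) follows by combining the two updates: Step 2 is the gradient step \(x^{k+1}_N=x^k_N-\gamma \big [\nabla _Nf(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N)+\delta _N^k-\lambda ^k+\beta \big (\sum _{j=1}^{N-1}A_jx^{k+1}_j+x^k_N-b\big )\big ]\) and Step 3 is the dual update \(\lambda ^{k+1}=\lambda ^k-\beta \big (\sum _{j=1}^{N-1}A_jx^{k+1}_j+x^{k+1}_N-b\big )\) (these forms are consistent with the identities used in the proof of Lemma 3.12 below). Solving the Step 2 relation for \(\lambda ^k\) and substituting into the Step 3 update gives

$$\begin{aligned} \lambda ^{k+1} = \nabla _Nf(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N)+\delta _N^k+\Big (\beta -\frac{1}{\gamma }\Big )(x^k_N-x^{k+1}_N), \end{aligned}$$

which is exactly (81).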

Applying (81) to two consecutive iterations and using the Lipschitz continuity of \(\nabla _Nf\), we get

$$\begin{aligned} \Vert \lambda ^{k+1}-\lambda ^k\Vert ^2= & {} \bigg \Vert \left( \beta -\frac{1}{\gamma }\right) (x^k_N-x^{k+1}_N)-\left( \beta -\frac{1}{\gamma }\right) (x^{k-1}_N-x^k_N) +(\delta _N^k-\delta _N^{k-1})\\&+(\nabla _Nf(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N)-\nabla _Nf(x^k_1, \ldots ,x^k_{N-1},x^{k-1}_N)\bigg \Vert ^2 \\\le & {} 4\left( \beta -\frac{1}{\gamma }\right) ^2\Vert x^k_N-x^{k+1}_N\Vert ^2+4 \left[ \left( \beta -\frac{1}{\gamma }\right) ^2+L^2\right] \Vert x^{k-1}_N-x^k_N\Vert ^2 \\&+4L^2\sum _{i=1}^{N-1}\Vert x^k_i-x^{k+1}_i\Vert ^2 + 4\Vert \delta _N^k -\delta _N^{k-1}\Vert ^2. \end{aligned}$$

Taking expectation with respect to all random variables on both sides and using \(\textsf {E} [\langle \delta _N^k,\delta _N^{k-1}\rangle ] = 0\) completes the proof. \(\square \)
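
In that last step, the stochastic-error term is bounded, using \(\textsf {E} [\langle \delta _N^k,\delta _N^{k-1}\rangle ] = 0\) together with the variance bound \(\textsf {E} [\Vert \delta _N^k\Vert ^2]\le \sigma ^2/M\) (this is the content of (36) in the main text, restated here only in the form in which it is used), by

$$\begin{aligned} \textsf {E} [\Vert \delta _N^k-\delta _N^{k-1}\Vert ^2] = \textsf {E} [\Vert \delta _N^k\Vert ^2]+\textsf {E} [\Vert \delta _N^{k-1}\Vert ^2]-2\,\textsf {E} [\langle \delta _N^k,\delta _N^{k-1}\rangle ] \le \frac{2\sigma ^2}{M}. \end{aligned}$$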

Proof of Lemma 3.12

Proof

Similarly to (77), by further incorporating (80), we have

$$\begin{aligned}&\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{i},x^k_{i+1},\ldots ,x^k_N,\lambda ^k) -\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{i-1},x^k_i,\ldots ,x^k_N,\lambda ^k)\\&\quad \le -\Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{H_i}{2}-\frac{L}{2}I} +\langle \delta _i^k,x^{k+1}_i-x^k_i\rangle \\&\quad \le -\Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{H_i}{2}-\frac{L}{2}I} +\frac{1}{2}\Vert \delta _i^k\Vert ^2+\frac{1}{2}\Vert x^{k+1}_i-x^k_i\Vert ^2. \end{aligned}$$

Taking expectation with respect to all random variables on both sides, summing over \(i=1,\ldots ,N-1\), and using (36), we obtain

$$\begin{aligned}&\textsf {E} [\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N,\lambda ^k)] -\textsf {E} [\mathcal {L}_{\beta }(x^k_1,\ldots ,x^k_N,\lambda ^k)] \nonumber \\&\quad \le -\sum _{i=1}^{N-1}\textsf {E} \left[ \Vert x^{k+1}_i-x^k_i\Vert ^2_{\frac{1}{2} H_i-\frac{L+1}{2}I}\right] +\frac{N-1}{2M}\sigma ^2. \end{aligned}$$
(82)

Note that by Step 2 of Algorithm 3 and the descent lemma we have

$$\begin{aligned} 0= & {} \bigg \langle x_N^k - x_N^{k+1}, \nabla _N f(x_1^{k+1},\ldots ,x_{N-1}^{k+1},x_N^k) + \delta _N^k - \lambda ^k + \beta \left( \sum _{j=1}^{N-1}A_jx_j^{k+1}+x_N^k-b\right) \\&\quad - \frac{1}{\gamma }(x_N^k-x_N^{k+1})\bigg \rangle \\\le & {} f(x_1^{k+1},\ldots ,x_{N-1}^{k+1},x_N^k) - f(x^{k+1}) + \left( \frac{L+\beta }{2}-\frac{1}{\gamma }\right) \Vert x_N^{k+1}-x_N^k\Vert ^2 - \langle \lambda ^k,x_N^k-x_N^{k+1}\rangle \\&+ \frac{\beta }{2}\Vert \sum _{j=1}^{N-1}A_jx_j^{k+1}+x_N^k-b\Vert ^2 - \frac{\beta }{2}\Vert \sum _{j=1}^{N-1}A_jx_j^{k+1}+x_N^{k+1}-b\Vert ^2 + \langle \delta _N^k,x_N^k-x_N^{k+1}\rangle \\\le & {} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N,\lambda ^k) - \mathcal {L}_{\beta }(x^{k+1},\lambda ^k) + \left( \frac{L+\beta }{2}-\frac{1}{\gamma }+\frac{1}{2}\right) \Vert x^k_N-x^{k+1}_N\Vert ^2 + \frac{1}{2}\Vert \delta _N^k\Vert ^2. \end{aligned}$$

Taking the expectation with respect to all random variables yields

$$\begin{aligned}&\textsf {E} [\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^{k+1}_N,\lambda ^k)]- \textsf {E} [\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N,\lambda ^k)]\nonumber \\&\le \left( \frac{L+\beta }{2}-\frac{1}{\gamma }+\frac{1}{2}\right) \textsf {E} [\Vert x^k_N -x^{k+1}_N\Vert ^2]+\frac{1}{2M}\sigma ^2. \end{aligned}$$
(83)

The following equality holds trivially from Step 3 of Algorithm 3:

$$\begin{aligned} \textsf {E} [\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_N,\lambda ^{k+1})]-\textsf {E} [\mathcal {L}_{\beta }(x^{k+1}_1, \ldots ,x^{k+1}_N,\lambda ^{k})] = \frac{1}{\beta }\textsf {E} [\Vert \lambda ^k-\lambda ^{k+1}\Vert ^2]. \end{aligned}$$
(84)

Combining (82), (83), (84) and (38), we obtain

$$\begin{aligned}&\textsf {E} [\Psi _S(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^{k+1}_N,\lambda ^{k+1},x^k_N)] - \textsf {E} [\Psi _S(x^k_1,\ldots ,x^k_{N-1},x^k_N,\lambda ^{k},x^{k-1}_N)]\nonumber \\&\quad \le \left[ \frac{\beta +L}{2}-\frac{1}{\gamma }+\frac{8}{\beta } \left( \beta -\frac{1}{\gamma }\right) ^2+\frac{4L^2}{\beta }+\frac{1}{2}\right] \textsf {E} [\Vert x^k_N-x^{k+1}_N\Vert ^2] \nonumber \\&\qquad - \sum _{i = 1}^{N-1}\textsf {E} \left[ \Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{1}{2}H_i-\frac{4L^2}{\beta }I -\frac{L+1}{2}I}\right] +\left( \frac{8}{\beta }+\frac{1}{2}+\frac{N-1}{2}\right) \frac{\sigma ^2}{M}. \end{aligned}$$
(85)

Choosing \(\beta \) and \(\gamma \) according to (40) and (41), and using arguments similar to those in the proof of Lemma 3.5, it is easy to verify that

$$\begin{aligned} \left[ \frac{\beta +L}{2}-\frac{1}{\gamma }+\frac{8}{\beta } \left( \beta -\frac{1}{\gamma }\right) ^2+\frac{4L^2}{\beta }+\frac{1}{2}\right] <0. \end{aligned}$$

By further choosing \(H_i\succ \left( \frac{8L^2}{\beta }+L+1\right) I\), we know that the right hand side of (85) is negative, and this completes the proof. \(\square \)

Proof of Lemma 3.13

Proof

From (81) and (15), we have that

$$\begin{aligned}&\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_N,\lambda ^{k+1})\\&\quad = \sum _{i = 1}^{N-1}r_i(x^{k+1}_i) + f(x^{k+1}) - \bigg \langle \sum _{i = 1}^NA_ix^{k+1}_i-b, \nabla _Nf(x^{k+1}) + \left( \beta -\frac{1}{\gamma }\right) (x^k_N-x^{k+1}_N) \\&\qquad + \nabla _Nf(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N) - \nabla _Nf(x^{k+1})+\delta _N^k\bigg \rangle + \frac{\beta }{2}\bigg \Vert \sum _{i = 1}^NA_ix^{k+1}_i-b\bigg \Vert ^2\\&\quad \ge \sum _{i = 1}^{N-1}r_i(x^{k+1}_i) + f(x^{k+1}_1,\ldots ,x^{k+1}_{N-1}, b-\sum _{i=1}^{N-1}A_ix^{k+1}_i) -\frac{4}{\beta }\left[ \left( \beta -\frac{1}{\gamma }\right) ^2+L^2\right] \Vert x^k_N-x^{k+1}_N\Vert ^2 \\&\qquad + \bigg (\frac{\beta }{2}-\frac{\beta }{8}-\frac{\beta }{8}-\frac{L}{2}\bigg )\bigg \Vert \sum _{i = 1}^NA_ix^{k+1}_i-b\bigg \Vert ^2 - \frac{2}{\beta }\Vert \delta _N^k\Vert ^2\\&\quad \ge \sum _{i=1}^{N-1}r_i^*+f^*-\frac{4}{\beta }\left[ \left( \beta -\frac{1}{\gamma }\right) ^2 +L^2\right] \Vert x^k_N-x^{k+1}_N\Vert ^2 - \frac{2}{\beta }\Vert \delta _N^k\Vert ^2 \\ \end{aligned}$$

where the first inequality is obtained by applying \(\langle a, b\rangle \le \frac{1}{2}(\frac{1}{\eta }\Vert a\Vert ^2+\eta \Vert b\Vert ^2)\) to the terms \(\langle \sum _{i = 1}^NA_ix^{k+1}_i-b, \left( \beta -\frac{1}{\gamma }\right) (x^k_N-x^{k+1}_N)\rangle \), \(\langle \sum _{i = 1}^NA_ix^{k+1}_i-b, \nabla _Nf(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N) - \nabla _Nf(x^{k+1})\rangle \) and \(\langle \sum _{i = 1}^NA_ix^{k+1}_i-b,\delta _N^k\rangle \), respectively, with \(\eta = \frac{8}{\beta }, \frac{8}{\beta }\) and \(\frac{4}{\beta }\). Note that \(\beta >2L\) according to (40); thus \((\frac{\beta }{2}-\frac{\beta }{8}-\frac{\beta }{8}-\frac{L}{2})>0\) and the last inequality holds. Rearranging the terms and taking expectation with respect to all random variables completes the proof. \(\square \)
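
For concreteness, the first of these three applications (with \(\eta =\frac{8}{\beta }\)) reads

$$\begin{aligned} \Big \langle \sum _{i=1}^NA_ix^{k+1}_i-b,\Big (\beta -\frac{1}{\gamma }\Big )(x^k_N-x^{k+1}_N)\Big \rangle \le \frac{\beta }{16}\Big \Vert \sum _{i=1}^NA_ix^{k+1}_i-b\Big \Vert ^2+\frac{4}{\beta }\Big (\beta -\frac{1}{\gamma }\Big )^2\Vert x^k_N-x^{k+1}_N\Vert ^2; \end{aligned}$$

the two \(\frac{\beta }{16}\)-terms from the first two applications and the \(\frac{\beta }{8}\)-term from the third account for the total deduction \(\frac{\beta }{8}+\frac{\beta }{8}\) appearing in the lower bound above.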

Proof of Theorem 3.19

Proof

By a similar argument, one can obtain

$$\begin{aligned} \Vert \lambda ^{k+1} - \nabla _N f(x^{k+1}_1,\ldots ,x^{k+1}_N)\Vert ^2\le \kappa _2\theta _k\quad \text{ and } \quad \left\| \sum _{i=1}^{N-1}A_ix^{k+1}_i+x^{k+1}_N-b\right\| ^2 \le \kappa _1\theta _k, \end{aligned}$$

where \(\theta _k = \sum _{i=1}^N(\Vert t_i^{k+1}g_i^{k+1}\Vert ^2+\Vert t_i^kg_i^k\Vert ^2+\Vert t_i^{k-1}g_i^{k-1}\Vert ^2)\). The only remaining task is to guarantee an \(\epsilon \) version of (48). First let us prove that

$$\begin{aligned} \Vert g_i^{k+1}\Vert \le \frac{\sigma +2L_2C+(L+\beta A_{\max }^2)L_1^2}{2\alpha }\sqrt{\theta _{k}}. \end{aligned}$$
(86)

Denote \(h_i(x_i) = \mathcal {L}_{\beta }(x^{k+2}_1,\ldots ,x^{k+2}_{i-1},x_i,x^{k+1}_{i+1},\ldots ,x^{k+1}_N,\lambda ^{k+1})\) and \(Y_i(t) = R(x^{k+1}_i,-tg_i^{k+1})\); then it is not hard to see that \(\nabla h_i(x_i)\) is Lipschitz continuous with parameter \(L+\beta \Vert A_i\Vert _2^2 \le L_3:=L+\beta A_{\max }^2\). Consequently, we obtain

$$\begin{aligned} h_i(Y_i(t))\le & {} h_i(Y_i(0)) + \langle \nabla h_i(Y_i(0)), Y_i(t) - Y_i(0) - tY_i'(0) + tY'_i(0)\rangle \\&+ \frac{L_3}{2} \Vert Y_i(t) - Y_i(0)\Vert ^2 \\\le & {} h_i(Y_i(0)) + t\langle \nabla h_i(Y_i(0)),Y_i'(0)\rangle + L_2t^2\Vert \nabla h_i(Y_i(0))\Vert \Vert Y'_i(0)\Vert ^2 \\&+ \frac{L_3L_1^2}{2}t^2\Vert Y'_i(0)\Vert ^2 \\= & {} h_i(Y_i(0)) - \left( t-L_2t^2\Vert \nabla h_i(Y_i(0))\Vert - \frac{L_3L_1^2}{2}t^2\right) \Vert Y'_i(0)\Vert ^2, \end{aligned}$$

where the last equality is due to \(\langle \nabla h_i(Y_i(0)),Y_i'(0)\rangle = -\langle Y_i'(0),Y_i'(0)\rangle \). Also note the relationship

$$\begin{aligned} \Vert Y_i'(0)\Vert = \Vert g_i^{k+1}\Vert = \Vert {\mathrm {Proj}}\, _{\mathcal {T}_{x_i^{k+1}}\mathcal {M}_i}\big \{\nabla h_i(Y_i(0))\big \}\Vert \le \Vert \nabla h_i(Y_i(0))\Vert . \end{aligned}$$
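
The second inequality in the chain of estimates for \(h_i(Y_i(t))\) above uses the following two properties of the retraction curve \(Y_i(t)\) (with \(L_1\) and \(L_2\) the retraction constants from the main text, restated here in the form in which they are used), together with the Cauchy–Schwarz inequality:

$$\begin{aligned} \Vert Y_i(t)-Y_i(0)\Vert \le L_1t\Vert Y_i'(0)\Vert \quad \text{ and } \quad \Vert Y_i(t)-Y_i(0)-tY_i'(0)\Vert \le L_2t^2\Vert Y_i'(0)\Vert ^2. \end{aligned}$$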

Note that \(\left\| \sum _{i=1}^{N-1}A_ix^{k+1}_i+x^{k+1}_N-b\right\| \le \sqrt{\kappa _1\theta _k}\le \sqrt{\frac{\kappa _1}{\tau }(\Psi _G(x_1^1,\ldots ,x_N^1,\lambda ^1,x_N^0)-f^*)}.\) Because \(\mathcal {M}_i\), \(i = 1,\ldots ,N-1\), are all compact submanifolds, the iterates \(x^{k+1}_i\), \(i = 1,\ldots ,N-1\), are all bounded. Combined with the bound on the residual above, this implies that the whole sequence \(\{x_N^{k}\}\) is also bounded. By (27) (which also holds in this case),

$$\begin{aligned} \Vert \lambda ^{k+1}\Vert \le |\beta -\frac{1}{\gamma }|\sqrt{\theta _k}+\Vert \nabla _Nf(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N)\Vert . \end{aligned}$$

By the boundedness of \(\{(x^k_1,\ldots ,x^k_N)\}\) and the continuity of \(\nabla f(\cdot )\), the second term is bounded. Combined with the boundedness of \(\{\theta _k\}\), this shows that the whole sequence \(\{\lambda ^k\}\) is bounded. Consequently, there exists a constant \(C>0\) such that \(\Vert \nabla h_i(Y_i(0))\Vert \le C\), where

$$\begin{aligned} \nabla h_i(Y_i(0))= & {} \nabla _if(x_1^{k+2},\ldots ,x^{k+2}_{i-1},x^{k+1}_i,\ldots ,x^{k+1}_N) - A_i^\top \lambda ^{k+1}\\&+\beta A_i^\top \bigg (\sum _{j=1}^{i-1}A_jx^{k+2}_j+\sum _{j = i}^N A_jx^{k+1}_j - b\bigg ). \end{aligned}$$

Note that this constant C depends only on the first two iterates \(\{x_1^t,\ldots ,x_N^t,\lambda ^t\}, t = 0,1,\) except for the absolute constants such as \(\Vert A_i\Vert _2,i = 1,\ldots ,N\). Therefore, when

$$\begin{aligned} t\le \frac{2}{2L_2C+\sigma +L_3L_1^2}\le \frac{2}{2L_2\Vert \nabla h_i(Y_i(0))\Vert +\sigma +L_3L_1^2}, \end{aligned}$$

it holds that

$$\begin{aligned} h_i(Y_i(t))\le h_i(x^{k+1}_i) - \frac{\sigma }{2}t^2\Vert g_i^{k+1}\Vert ^2. \end{aligned}$$

Since \(\sigma >\frac{2\alpha }{s}\), by the termination rule of the line-search step, we have

$$\begin{aligned} t_i^{k+1}\ge \min \left\{ s, \frac{2\alpha }{2L_2C+\sigma +L_3L_1^2}\right\} = \frac{2\alpha }{2L_2C+\sigma +L_3L_1^2}. \end{aligned}$$

Then by noting

$$\begin{aligned} \frac{2\alpha \Vert g_i^{k+1}\Vert }{2L_2C+\sigma +L_3L_1^2}\le t_i^{k+1}\Vert g_i^{k+1}\Vert \le \sqrt{\theta _k}, \end{aligned}$$

we have (86).

Now let us establish the \(\epsilon \)-version of (48). By definition,

$$\begin{aligned} g_i^{k+1}= & {} {\mathrm {Proj}}\, _{\mathcal {T}_{x^{k+1}_i}\mathcal {M}_i}\bigg \{\nabla _if(x_1^{k+2},\ldots ,x^{k+2}_{i-1},x^{k+1}_i,\ldots ,x^{k+1}_N) - A_i^\top \lambda ^{k+1}\\&\quad +\beta A_i^\top \bigg (\sum _{j=1}^{i-1}A_jx^{k+2}_j+\sum _{j = i}^N A_jx^{k+1}_j - b\bigg )\bigg \}. \end{aligned}$$

Consequently, we obtain

$$\begin{aligned}&\biggl \Vert {\mathrm {Proj}}\, _{\mathcal {T}_{x_i^{k+1}}\mathcal {M}_i}\biggl \{\nabla _i f(x^{k+1})-A_i^\top \lambda ^{k+1}\biggr \}\biggr \Vert \\&\quad = \left\| {\mathrm {Proj}}\, _{\mathcal {T}_{x_i^{k+1}}\mathcal {M}_i}\left\{ \nabla _i f(x^{k+1})-\nabla _if(x_1^{k+2},\ldots , x_{i-1}^{k+2},x_{i}^{k+1},\ldots ,x_{N}^{k+1}) + g_i^{k+1} \right. \right. \\&\qquad - \left. \left. \beta A_i^\top \left( \sum _{j=1}^NA_jx_j^{k+1}-b\right) + \beta A_i^\top \left( \sum _{j = 1}^{i-1}A_j(x_j^{k+1}-x_j^{k+2})\right) \right\} \right\| \\&\quad \le \Vert \nabla _i f(x^{k+1})-\nabla _if(x_1^{k+2},\ldots , x_{i-1}^{k+2},x_{i}^{k+1},\ldots ,x_{N}^{k+1})\Vert + \left\| \beta A_i^\top \left( \sum _{j=1}^NA_jx_j^{k+1}-b\right) \right\| \\&\qquad + \Vert g_i^{k+1}\Vert +\left\| \beta A_i^\top \left( \sum _{j = 1}^{i-1}A_j(x_j^{k+1}-x_j^{k+2}) \right) \right\| \\&\quad \le \left( L+\sqrt{N}\beta A_{\max }^2\right) \max \{L_1,1\}\sqrt{\theta _k} + \frac{\sigma +2L_2C+(L+\beta A_{\max }^2)L_1^2}{2\alpha }\sqrt{\theta _{k}} + \beta \Vert A_i\Vert _2 \sqrt{\kappa _1\theta _k} \\&\quad \le \sqrt{\kappa _3\theta _{k}}. \end{aligned}$$

\(\square \)

Proof of inequality (60)

Proof

First, we determine the Lipschitz constant of \(\nabla \bar{f}_{\beta }\).

$$\begin{aligned}&\Vert \nabla \bar{f}_{\beta }(x)-\nabla \bar{f}_{\beta }(y)\Vert \nonumber \\&\quad \le L\Vert x-y\Vert + \beta \left\| \left[ \left( \sum _{j =1}^NA_j(x_j-y_j)\right) ^\top A_1,\ldots ,\left( \sum _{j =1}^NA_j(x_j-y_j)\right) ^\top A_N\right] \right\| \nonumber \\&\quad \le L\Vert x-y\Vert + \beta \sqrt{N}\max _{1\le i\le N}\Vert A_i\Vert _2\left\| \sum _{j =1}^NA_j(x_j-y_j) \right\| \nonumber \\&\quad \le \left( L+\beta N\max _{1\le i\le N}\Vert A_i\Vert _2^2 \right) \Vert x-y\Vert . \end{aligned}$$
(87)
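
The last inequality in (87) uses the estimate

$$\begin{aligned} \left\| \sum _{j=1}^NA_j(x_j-y_j)\right\| \le \sum _{j=1}^N\Vert A_j\Vert _2\Vert x_j-y_j\Vert \le \sqrt{N}\max _{1\le i\le N}\Vert A_i\Vert _2\,\Vert x-y\Vert , \end{aligned}$$

where the second step is the Cauchy–Schwarz inequality applied to the vector of block norms \((\Vert x_1-y_1\Vert ,\ldots ,\Vert x_N-y_N\Vert )\).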

So we define \(\hat{L} = L+\beta N\max _{1\le i\le N}\Vert A_i\Vert _2^2 \) as the Lipschitz constant of \(\nabla \bar{f}_{\beta }\). The global optimality of the subproblem (59) yields

$$\begin{aligned}&\langle \nabla _i\bar{f}_{\beta }(x^k_1,\ldots ,x^k_N),x^{k+1}_i-x^k_i\rangle -\langle \lambda ^k,A_ix^{k+1}_i\rangle + r_i(x^{k+1}_i)+\frac{1}{2}\Vert x^{k+1}_i-x^k_i\Vert ^2_{H_i}\\&\quad \le r_i(x^k_i) - \langle \lambda ^k,A_ix^k_i\rangle . \end{aligned}$$

By the descent lemma we have

$$\begin{aligned}&\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N,\lambda ^k)\\&\quad = \bar{f}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N) -\left\langle \lambda ^k,\sum _{i=1}^{N}A_ix^{k+1}_i-b\right\rangle +\sum _{i=1}^{N-1}r_i(x^{k+1}_i) \\&\quad \le \bar{f}_{\beta }(x^k_1,\ldots ,x^k_{N-1},x^k_N) +\langle \nabla \bar{f}_{\beta }(x^k_1,\ldots ,x^k_{N-1},x^k_N),x^{k+1}-x^k\rangle \\&\qquad +\frac{\hat{L}}{2}\Vert x^{k+1}-x^k\Vert ^2-\left\langle \lambda ^k,\sum _{i=1}^{N} A_ix^{k+1}_i-b\right\rangle +\sum _{i=1}^{N-1}r_i(x^{k+1}_i). \end{aligned}$$

Combining the above two inequalities yields (60). \(\square \)


About this article


Cite this article

Zhang, J., Ma, S. & Zhang, S. Primal-dual optimization algorithms over Riemannian manifolds: an iteration complexity analysis. Math. Program. (2019) doi:10.1007/s10107-019-01418-8


Keywords

  • Nonconvex and nonsmooth optimization
  • Riemannian manifold
  • \(\epsilon \)-Stationary solution
  • ADMM
  • Iteration complexity

Mathematics Subject Classification

  • 90C60
  • 90C90