Primal-dual optimization algorithms over Riemannian manifolds: an iteration complexity analysis

Zhang, Junyu; Ma, Shiqian; Zhang, Shuzhong

doi:10.1007/s10107-019-01418-8

Full Length Paper
Series A
Published: 10 August 2019

Volume 184, pages 445–490, (2020)
Mathematical Programming

Abstract

In this paper we study nonconvex and nonsmooth multi-block optimization over Euclidean embedded (smooth) Riemannian submanifolds with coupled linear constraints. Such optimization problems naturally arise from machine learning, statistical learning, compressive sensing, image processing, and tensor PCA, among others. By utilizing the embedding structure, we develop an ADMM-like primal-dual approach based on decoupled solvable subroutines such as linearized proximal mappings, where the duality is with respect to the embedded Euclidean spaces. First, we introduce the optimality conditions for the aforementioned optimization models. Then, the notion of $\epsilon $-stationary solutions is introduced as a result. The main part of the paper shows that the proposed algorithms possess an iteration complexity of $O(1/\epsilon ^2)$ to reach an $\epsilon $-stationary solution. For prohibitively large-size tensor or machine learning models, we present a sampling-based stochastic algorithm with the same iteration complexity bound in expectation. In case the subproblems are not analytically solvable, a feasible curvilinear line-search variant of the algorithm based on retraction operators is proposed. Finally, we show specifically how the algorithms can be implemented to solve a variety of practical problems such as the NP-hard maximum bisection problem, the $\ell _q$ regularized sparse tensor principal component analysis and the community detection problem. Our preliminary numerical results show the great potential of the proposed methods.

Acknowledgements

The authors would like to thank the associate editor and two anonymous reviewers for insightful and constructive comments that helped improve the presentation of this paper. The work of S. Ma was supported in part by a startup package in the Department of Mathematics at University of California, Davis. The work of S. Zhang was supported in part by the National Science Foundation under Grant CMMI-1462408 and in part by the Shenzhen Fundamental Research Fund under Grant KQTD2015033114415450.

Author information

Authors and Affiliations

Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, USA
Junyu Zhang & Shuzhong Zhang
Department of Mathematics, University of California, Davis, Davis, USA
Shiqian Ma
Institute of Data and Decision Analytics, The Chinese University of Hong Kong, Shenzhen, China
Shuzhong Zhang

Authors

Junyu Zhang
Shiqian Ma
Shuzhong Zhang

Corresponding author

Correspondence to Shuzhong Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Proofs of the technical lemmas

1.1 Proof of Lemma 3.5

Proof

By the global optimality for the subproblems in Step 1 of Algorithm 1, we have

$$\begin{aligned} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N,\lambda ^k)\le \mathcal {L}_{\beta }(x^k_1,\ldots ,x^k_{N-1},x^k_N,\lambda ^k) - \frac{1}{2}\sum _{i=1}^{N-1}\Vert x^k_i-x^{k+1}_i\Vert ^2_{H_i}. \end{aligned}$$

(70)

By Step 2 of Algorithm 1 we have

$$\begin{aligned} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^{k+1}_N,\lambda ^k)\le & {} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N,\lambda ^k)\nonumber \\&+ \left( \frac{L+\beta }{2}-\frac{1}{\gamma }\right) \Vert x^k_N-x^{k+1}_N\Vert ^2. \end{aligned}$$

(71)

By Step 3, directly substituting $\lambda ^{k+1}$ into the augmented Lagrangian gives

$$\begin{aligned} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_N,\lambda ^{k+1})= \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_N,\lambda ^k) + \frac{1}{\beta }\Vert \lambda ^k-\lambda ^{k+1}\Vert ^2. \end{aligned}$$

(72)

Summing (70), (71) and (72), and applying Lemma 3.4, we obtain the following inequality:

$$\begin{aligned}&\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^{k+1}_N,\lambda ^{k+1})- \mathcal {L}_{\beta }(x^k_1,\ldots ,x^k_{N-1},x^k_N,\lambda ^k) \nonumber \\&\quad \le \left[ \frac{L+\beta }{2}-\frac{1}{\gamma }+\frac{3}{\beta }\left( \beta -\frac{1}{\gamma } \right) ^2\right] \Vert x^k_N-x^{k+1}_N\Vert ^2 \nonumber \\&\qquad +\frac{3}{\beta }\left[ \left( \beta -\frac{1}{\gamma }\right) ^2+L^2\right] \Vert x^{k-1}_N-x^k_N\Vert ^2 - \sum _{i = 1}^{N-1}\Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{1}{2}H_i-\frac{3L^2}{\beta }I},\nonumber \\ \end{aligned}$$

(73)

which further indicates

$$\begin{aligned}&\Psi _G(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^{k+1}_N,\lambda ^{k+1},x^k_N) - \Psi _G(x^k_1,\ldots ,x^k_{N-1},x^k_N,\lambda ^{k},x^{k-1}_N) \nonumber \\&\quad \le \left[ \frac{\beta +L}{2}-\frac{1}{\gamma }+\frac{6}{\beta } \left( \beta -\frac{1}{\gamma }\right) ^2+\frac{3L^2}{\beta }\right] \Vert x^k_N-x^{k+1}_N\Vert ^2 \nonumber \\&\qquad - \sum _{i = 1}^{N-1}\Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{1}{2}H_i -\frac{3L^2}{\beta }I}. \end{aligned}$$

(74)

To ensure that the right hand side of (21) is negative, we need to choose $H_i\succ \frac{6L^2}{\beta }I$, and ensure that

$$\begin{aligned} \frac{\beta +L}{2}-\frac{1}{\gamma }+\frac{6}{\beta }\left( \beta -\frac{1}{\gamma } \right) ^2+\frac{3L^2}{\beta }<0. \end{aligned}$$

(75)

This can be proved by first viewing it as a quadratic function of $z = \frac{1}{\gamma }$. To find some $z>0$ such that

$$\begin{aligned} p(z) = \frac{6}{\beta }z^2 - 13z + \left( \frac{L+\beta }{2}+6\beta +\frac{3}{\beta }L^2\right) <0, \end{aligned}$$

we need the discriminant to be positive, i.e.

$$\begin{aligned} \Delta (\beta ) = \frac{1}{\beta ^2}(13\beta ^2-12\beta L -72L^2)>0. \end{aligned}$$

(76)

It is easy to verify that (19) suffices to guarantee (76). Solving $p(z) = 0$, we find two positive roots

$$\begin{aligned} z_{1} = \frac{13\beta -\sqrt{13\beta ^2-12\beta L -72L^2}}{12}, \text{ and } z_{2} = \frac{13\beta +\sqrt{13\beta ^2-12\beta L -72L^2}}{12}. \end{aligned}$$

Note that $\gamma $ defined in (20) satisfies $\frac{1}{z_2}<\gamma <\frac{1}{z_1}$ and thus guarantees (75). This completes the proof. $\square $
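The algebra in this proof can be checked numerically. The sketch below expands the left-hand side of (75) into the quadratic $p(z)$ with $z = \frac{1}{\gamma }$, computes the two roots from the closed form above, and confirms that a step size with $\frac{1}{z_2}<\gamma <\frac{1}{z_1}$ makes (75) hold. The values of $L$ and $\beta $ are illustrative assumptions, not the choices (19) and (20) of the paper, and the function names are hypothetical.

```python
import math

def p(z, beta, L):
    """Quadratic in z = 1/gamma obtained by expanding the left-hand side of (75)."""
    return (6.0 / beta) * z**2 - 13.0 * z + ((L + beta) / 2.0 + 6.0 * beta + 3.0 * L**2 / beta)

def lhs_75(gamma, beta, L):
    """Left-hand side of (75), written directly in terms of gamma."""
    z = 1.0 / gamma
    return (beta + L) / 2.0 - z + (6.0 / beta) * (beta - z) ** 2 + 3.0 * L**2 / beta

def roots(beta, L):
    """Roots z1 < z2 of p; requires 13*beta^2 - 12*beta*L - 72*L^2 > 0, i.e. (76)."""
    disc = 13.0 * beta**2 - 12.0 * beta * L - 72.0 * L**2
    if disc <= 0:
        raise ValueError("beta too small: the discriminant condition (76) fails")
    s = math.sqrt(disc)
    return (13.0 * beta - s) / 12.0, (13.0 * beta + s) / 12.0

# Illustrative values (assumed for this sketch, not taken from the paper).
L, beta = 1.0, 10.0
z1, z2 = roots(beta, L)

# Any gamma with 1/z2 < gamma < 1/z1 works; here z is the midpoint of [z1, z2].
gamma = 2.0 / (z1 + z2)
value = lhs_75(gamma, beta, L)  # negative, so (75) holds for this gamma
```

Since $p$ opens upward, it is negative exactly between its roots, which is why any $\gamma $ in the stated interval suffices.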

1.2 Proof of Lemma 3.8

Proof

For the subproblem in Step 1 of Algorithm 2, since $x_i^{k+1}$ is the global minimizer, we have

$$\begin{aligned}&\langle \nabla _if(x^{k+1}_1,\ldots ,x^{k+1}_{i-1},x^k_i,\ldots ,x^k_N), x^{k+1}_i-x^k_i\rangle \\&\qquad -\bigg \langle \sum _{j=1}^{i}A_jx^{k+1}_j+\sum _{j = i+1}^{N}A_jx^k_j-b,\lambda ^k \bigg \rangle \\&\qquad +\frac{\beta }{2}\bigg \Vert \sum _{j=1}^{i}A_jx^{k+1}_j+\sum _{j = i+1}^{N}A_jx^k_j-b\bigg \Vert ^2 + \sum _{j=1}^{i}r_j(x^{k+1}_j)+\sum _{j = i+1}^{N-1}r_j(x^k_j)\\&\quad \le -\bigg \langle \sum _{j=1}^{i-1}A_jx^{k+1}_j+\sum _{j = i}^{N}A_jx^k_j-b,\lambda ^k \bigg \rangle +\frac{\beta }{2}\bigg \Vert \sum _{j=1}^{i-1}A_jx^{k+1}_j+\sum _{j = i}^{N}A_jx^k_j-b\bigg \Vert ^2 \\&\qquad + \sum _{j=1}^{i-1}r_j(x^{k+1}_j)+\sum _{j = i}^{N-1}r_j(x^k_j)-\frac{1}{2}\Vert x^{k+1}_i-x^k_i\Vert ^2_{H_i}. \end{aligned}$$

By the L-Lipschitz continuity of $\nabla _i f$, we have

$$\begin{aligned}&f(x^{k+1}_1,\ldots ,x^{k+1}_{i},x^k_{i+1},\ldots ,x^k_N)\\&\quad \le f(x^{k+1}_1,\ldots ,x^{k+1}_{i-1},x^k_i,\ldots ,x^k_N) +\langle \nabla _if(x^{k+1}_1,\ldots ,x^{k+1}_{i-1},x^k_i, \ldots ,x^k_N),x^{k+1}_i-x^k_i\rangle \\&\qquad +\frac{L}{2}\Vert x^{k+1}_i-x^k_i\Vert ^2. \end{aligned}$$

Combining the above two inequalities and using the definition of $\mathcal {L}_{\beta }$ in (16), we have

$$\begin{aligned} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{i},x^k_{i+1},\ldots ,x^k_N,\lambda ^k)\le & {} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{i-1},x^k_i,\ldots , x^k_N,\lambda ^k)\nonumber \\&-\Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{H_i}{2}-\frac{L}{2}I}. \end{aligned}$$

(77)

Summing (77) over $i=1,\ldots ,N-1$, we have the following inequality, which is the counterpart of (70):

$$\begin{aligned} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N,\lambda ^k)\le \mathcal {L}_{\beta } (x^k_1,\ldots ,x^k_N,\lambda ^k)-\sum _{i=1}^{N-1}\Vert x^k_i-x^{k+1}_i \Vert ^2_{\frac{H_i}{2}-\frac{L}{2}I}. \end{aligned}$$

(78)

Besides, since (71) and (72) still hold, by combining (78), (71) and (72) and applying Lemma 3.4, we establish the following two inequalities, which are respectively the counterparts of (73) and (21):

$$\begin{aligned}&\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^{k+1}_N,\lambda ^{k+1})- \mathcal {L}_{\beta } (x^k_1,\ldots ,x^k_{N-1},x^k_N,\lambda ^k) \nonumber \\&\quad \le \left[ \frac{L+\beta }{2}-\frac{1}{\gamma }+\frac{3}{\beta } \left( \beta -\frac{1}{\gamma }\right) ^2\right] \Vert x^k_N-x^{k+1}_N\Vert ^2 \nonumber \\&\qquad +\frac{3}{\beta }\left[ \left( \beta -\frac{1}{\gamma }\right) ^2+L^2\right] \Vert x^{k-1}_N-x^k_N\Vert ^2 - \sum _{i = 1}^{N-1}\Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{1}{2}H_i -\frac{L}{2}I-\frac{3L^2}{\beta }I},\nonumber \\ \end{aligned}$$

(79)

and

$$\begin{aligned}&\Psi _G(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^{k+1}_N,\lambda ^{k+1},x^k_N) - \Psi _G(x^k_1,\ldots ,x^k_{N-1},x^k_N,\lambda ^{k},x^{k-1}_N)\\&\quad \le \left[ \frac{\beta +L}{2}-\frac{1}{\gamma }+\frac{6}{\beta }\left( \beta -\frac{1}{\gamma } \right) ^2+\frac{3L^2}{\beta }\right] \Vert x^k_N-x^{k+1}_N\Vert ^2 \\&\qquad - \sum _{i = 1}^{N-1}\Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{1}{2}H_i -\frac{L}{2}I-\frac{3L^2}{\beta }I}. \end{aligned}$$

From the proof of Lemma 3.5, it is easy to see that the right hand side of the above inequality is negative, if $H_i\succ \left( \frac{6L^2}{\beta }+L\right) I$ and $\beta $ and $\gamma $ are chosen according to (19) and (20). $\square $

1.3 Proof of Lemma 3.11

Proof

For ease of notation, we denote

$$\begin{aligned} G_i^M(x_1^{k+1},\ldots ,x_{i-1}^{k+1},x_i^k,\ldots ,x_N^k) = \nabla _i f(x_1^{k+1},\ldots ,x_{i-1}^{k+1},x_i^k,\ldots ,x_N^k)+\delta _i^k. \end{aligned}$$

(80)

Note that $\delta _i^k$ is a zero-mean random variable. By Steps 2 and 3 of Algorithm 3 we obtain

$$\begin{aligned} \lambda ^{k+1} = \left( \beta -\frac{1}{\gamma }\right) (x_N^k-x_N^{k+1}) +\nabla _Nf(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N)+\delta _N^k. \end{aligned}$$

(81)

Applying (81) to both $\lambda ^{k+1}$ and $\lambda ^k$ and subtracting, we get

$$\begin{aligned} \Vert \lambda ^{k+1}-\lambda ^k\Vert ^2= & {} \bigg \Vert \left( \beta -\frac{1}{\gamma }\right) (x^k_N-x^{k+1}_N)-\left( \beta -\frac{1}{\gamma }\right) (x^{k-1}_N-x^k_N) +(\delta _N^k-\delta _N^{k-1})\\&+(\nabla _Nf(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N)-\nabla _Nf(x^k_1, \ldots ,x^k_{N-1},x^{k-1}_N))\bigg \Vert ^2 \\\le & {} 4\left( \beta -\frac{1}{\gamma }\right) ^2\Vert x^k_N-x^{k+1}_N\Vert ^2+4 \left[ \left( \beta -\frac{1}{\gamma }\right) ^2+L^2\right] \Vert x^{k-1}_N-x^k_N\Vert ^2 \\&+4L^2\sum _{i=1}^{N-1}\Vert x^k_i-x^{k+1}_i\Vert ^2 + 4\Vert \delta _N^k -\delta _N^{k-1}\Vert ^2. \end{aligned}$$

Taking expectation with respect to all random variables on both sides and using $\textsf {E} [\langle \delta _N^k,\delta _N^{k-1}\rangle ] = 0$ completes the proof. $\square $
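The factor 4 in the bound above comes from the elementary inequality $\Vert a+b+c+d\Vert ^2\le 4(\Vert a\Vert ^2+\Vert b\Vert ^2+\Vert c\Vert ^2+\Vert d\Vert ^2)$, applied to the four grouped terms and followed by the Lipschitz bound on the gradient difference. A minimal sketch that checks this inequality on random vectors is given below; the dimension, sample count and seed are arbitrary assumptions made for illustration.

```python
import random

def sq_norm(v):
    """Squared Euclidean norm of a vector given as a list of floats."""
    return sum(x * x for x in v)

def four_term_gap(vectors):
    """Return 4 * sum_j ||v_j||^2 - ||sum_j v_j||^2 for four vectors (nonnegative)."""
    total = [sum(components) for components in zip(*vectors)]
    return 4.0 * sum(sq_norm(v) for v in vectors) - sq_norm(total)

rng = random.Random(0)  # fixed seed so the check is reproducible
gaps = []
for _ in range(1000):
    vecs = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(4)]
    gaps.append(four_term_gap(vecs))

min_gap = min(gaps)  # stays nonnegative up to rounding error

# The bound is tight when all four vectors coincide.
tight_gap = four_term_gap([[1.0, 2.0]] * 4)
```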

1.4 Proof of Lemma 3.12

Proof

Similarly to (77), by further incorporating (80), we have

$$\begin{aligned}&\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{i},x^k_{i+1},\ldots ,x^k_N,\lambda ^k) -\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{i-1},x^k_i,\ldots ,x^k_N,\lambda ^k)\\&\quad \le -\Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{H_i}{2}-\frac{L}{2}I} +\langle \delta _i^k,x^{k+1}_i-x^k_i\rangle \\&\quad \le -\Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{H_i}{2}-\frac{L}{2}I} +\frac{1}{2}\Vert \delta _i^k\Vert ^2+\frac{1}{2}\Vert x^{k+1}_i-x^k_i\Vert ^2. \end{aligned}$$

Taking expectation with respect to all random variables on both sides and summing over $i=1,\ldots ,N-1$, and using (36), we obtain

$$\begin{aligned}&\textsf {E} [\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N,\lambda ^k)] -\textsf {E} [\mathcal {L}_{\beta }(x^k_1,\ldots ,x^k_N,\lambda ^k)] \nonumber \\&\quad \le -\sum _{i=1}^{N-1}\textsf {E} \left[ \Vert x^{k+1}_i-x^k_i\Vert ^2_{\frac{1}{2} H_i-\frac{L+1}{2}I}\right] +\frac{N-1}{2M}\sigma ^2. \end{aligned}$$

(82)

Note that by Step 2 of Algorithm 3 and the descent lemma we have

$$\begin{aligned} 0= & {} \bigg \langle x_N^k - x_N^{k+1}, \nabla _N f(x_1^{k+1},\ldots ,x_{N-1}^{k+1},x_N^k) + \delta _N^k - \lambda ^k + \beta \left( \sum _{j=1}^{N-1}A_jx_j^{k+1}+x_N^k-b\right) \\&\quad - \frac{1}{\gamma }(x_N^k-x_N^{k+1})\bigg \rangle \\\le & {} f(x_1^{k+1},\ldots ,x_{N-1}^{k+1},x_N^k) - f(x^{k+1}) + \left( \frac{L+\beta }{2}-\frac{1}{\gamma }\right) \Vert x_N^{k+1}-x_N^k\Vert ^2 - \langle \lambda ^k,x_N^k-x_N^{k+1}\rangle \\&+ \frac{\beta }{2}\Vert \sum _{j=1}^{N-1}A_jx_j^{k+1}+x_N^k-b\Vert ^2 - \frac{\beta }{2}\Vert \sum _{j=1}^{N-1}A_jx_j^{k+1}+x_N^{k+1}-b\Vert ^2 + \langle \delta _N^k,x_N^k-x_N^{k+1}\rangle \\\le & {} \mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N,\lambda ^k) - \mathcal {L}_{\beta }(x^{k+1},\lambda ^k) + \left( \frac{L+\beta }{2}-\frac{1}{\gamma }+\frac{1}{2}\right) \Vert x^k_N-x^{k+1}_N\Vert ^2 + \frac{1}{2}\Vert \delta _N^k\Vert ^2. \end{aligned}$$

Taking the expectation with respect to all random variables yields

$$\begin{aligned}&\textsf {E} [\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^{k+1}_N,\lambda ^k)]- \textsf {E} [\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N,\lambda ^k)]\nonumber \\&\le \left( \frac{L+\beta }{2}-\frac{1}{\gamma }+\frac{1}{2}\right) \textsf {E} [\Vert x^k_N -x^{k+1}_N\Vert ^2]+\frac{1}{2M}\sigma ^2. \end{aligned}$$

(83)

The following equality holds trivially from Step 3 of Algorithm 3:

$$\begin{aligned} \textsf {E} [\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_N,\lambda ^{k+1})]-\textsf {E} [\mathcal {L}_{\beta }(x^{k+1}_1, \ldots ,x^{k+1}_N,\lambda ^{k})] = \frac{1}{\beta }\textsf {E} [\Vert \lambda ^k-\lambda ^{k+1}\Vert ^2]. \end{aligned}$$

(84)

Combining (82), (83), (84) and (38), we obtain

$$\begin{aligned}&\textsf {E} [\Psi _S(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^{k+1}_N,\lambda ^{k+1},x^k_N)] - \textsf {E} [\Psi _S(x^k_1,\ldots ,x^k_{N-1},x^k_N,\lambda ^{k},x^{k-1}_N)]\nonumber \\&\quad \le \left[ \frac{\beta +L}{2}-\frac{1}{\gamma }+\frac{8}{\beta } \left( \beta -\frac{1}{\gamma }\right) ^2+\frac{4L^2}{\beta }+\frac{1}{2}\right] \textsf {E} [\Vert x^k_N-x^{k+1}_N\Vert ^2] \nonumber \\&\qquad - \sum _{i = 1}^{N-1}\textsf {E} \left[ \Vert x^k_i-x^{k+1}_i\Vert ^2_{\frac{1}{2}H_i-\frac{4L^2}{\beta }I -\frac{L+1}{2}I}\right] +\left( \frac{8}{\beta }+\frac{1}{2}+\frac{N-1}{2}\right) \frac{\sigma ^2}{M}. \end{aligned}$$

(85)

Choosing $\beta $ and $\gamma $ according to (40) and (41), and using arguments similar to those in the proof of Lemma 3.5, it is easy to verify that

$$\begin{aligned} \left[ \frac{\beta +L}{2}-\frac{1}{\gamma }+\frac{8}{\beta } \left( \beta -\frac{1}{\gamma }\right) ^2+\frac{4L^2}{\beta }+\frac{1}{2}\right] <0. \end{aligned}$$

By further choosing $H_i\succ \left( \frac{8L^2}{\beta }+L+1\right) I$, we know that the right hand side of (85) is negative, and this completes the proof. $\square $

1.5 Proof of Lemma 3.13

Proof

From (81) and (15), we have that

$$\begin{aligned}&\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_N,\lambda ^{k+1})\\&\quad = \sum _{i = 1}^{N-1}r_i(x^{k+1}_i) + f(x^{k+1}) - \bigg \langle \sum _{i = 1}^NA_ix^{k+1}_i-b, \nabla _Nf(x^{k+1}) + \left( \beta -\frac{1}{\gamma }\right) (x^k_N-x^{k+1}_N) \\&\qquad + \nabla _Nf(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N) - \nabla _Nf(x^{k+1})+\delta _N^k\bigg \rangle + \frac{\beta }{2}\bigg \Vert \sum _{i = 1}^NA_ix^{k+1}_i-b\bigg \Vert ^2\\&\quad \ge \sum _{i = 1}^{N-1}r_i(x^{k+1}_i) + f(x^{k+1}_1,\ldots ,x^{k+1}_{N-1}, b-\sum _{i=1}^{N-1}A_ix^{k+1}_i) -\frac{4}{\beta }\left[ \left( \beta -\frac{1}{\gamma }\right) ^2+L^2\right] \Vert x^k_N-x^{k+1}_N\Vert ^2 \\&\qquad + \bigg (\frac{\beta }{2}-\frac{\beta }{8}-\frac{\beta }{8}-\frac{L}{2}\bigg )\bigg \Vert \sum _{i = 1}^NA_ix^{k+1}_i-b\bigg \Vert ^2 - \frac{2}{\beta }\Vert \delta _N^k\Vert ^2\\&\quad \ge \sum _{i=1}^{N-1}r_i^*+f^*-\frac{4}{\beta }\left[ \left( \beta -\frac{1}{\gamma }\right) ^2 +L^2\right] \Vert x^k_N-x^{k+1}_N\Vert ^2 - \frac{2}{\beta }\Vert \delta _N^k\Vert ^2 \\ \end{aligned}$$

where the first inequality is obtained by applying $\langle a, b\rangle \le \frac{1}{2}(\frac{1}{\eta }\Vert a\Vert ^2+\eta \Vert b\Vert ^2)$ to the terms $\langle \sum _{i = 1}^NA_ix^{k+1}_i-b, \left( \beta -\frac{1}{\gamma }\right) (x^k_N-x^{k+1}_N)\rangle $, $\langle \sum _{i = 1}^NA_ix^{k+1}_i-b, \nabla _Nf(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N) - \nabla _Nf(x^{k+1})\rangle $ and $\langle \sum _{i = 1}^NA_ix^{k+1}_i-b,\delta _N^k\rangle $, with $\eta = \frac{8}{\beta }, \frac{8}{\beta }$ and $\frac{4}{\beta }$ respectively. Note that $\beta >2L$ according to (40), thus $(\frac{\beta }{2}-\frac{\beta }{8}-\frac{\beta }{8}-\frac{L}{2})>0$ and the last inequality holds. Rearranging the terms and taking expectation with respect to all random variables completes the proof. $\square $
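The coefficient bookkeeping in this proof can be traced with exact rational arithmetic. In the sketch below, `young_weights` (a hypothetical helper name) returns the two weights produced by $\langle a, b\rangle \le \frac{1}{2}(\frac{1}{\eta }\Vert a\Vert ^2+\eta \Vert b\Vert ^2)$ for a given $\eta $, and the three choices of $\eta $ from the proof are combined to recover the coefficient of the constraint residual. The values of $\beta $ and $L$ are illustrative assumptions satisfying $\beta >2L$.

```python
from fractions import Fraction

def young_weights(eta):
    """Weights (w_a, w_b) in <a, b> <= w_a * ||a||^2 + w_b * ||b||^2 for eta > 0."""
    return 1 / (2 * eta), eta / 2

# Illustrative values (assumed for this sketch), chosen so that beta > 2L.
beta, L = Fraction(10), Fraction(1)

# The three values of eta used in the proof, in the order the terms are listed.
etas = [8 / beta, 8 / beta, 4 / beta]
weights = [young_weights(eta) for eta in etas]

# Total weight that the three applications place on the squared constraint residual.
residual_weight = sum(w_a for w_a, _ in weights)

# Remaining coefficient of the residual after also subtracting L/2 (descent lemma).
residual_coeff = beta / 2 - residual_weight - L / 2

# Weights on the second arguments: 4/beta, 4/beta and 2/beta.
second_weights = [w_b for _, w_b in weights]
```

The first two applications each contribute $\frac{\beta }{16}$ and the third contributes $\frac{\beta }{8}$, which together give the $\frac{\beta }{8}+\frac{\beta }{8}$ subtracted in the display.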

1.6 Proof of Theorem 3.19

Proof

Through a similar argument, one can easily obtain

$$\begin{aligned} \Vert \lambda ^{k+1} - \nabla _N f(x^{k+1}_1,\ldots ,x^{k+1}_N)\Vert ^2\le \kappa _2\theta _k\quad \text{ and } \quad \left\| \sum _{i=1}^{N-1}A_ix^{k+1}_i+x^{k+1}_N-b\right\| ^2 \le \kappa _1\theta _k, \end{aligned}$$

where $\theta _k = \sum _{i=1}^N(\Vert t_i^{k+1}g_i^{k+1}\Vert ^2+\Vert t_i^kg_i^k\Vert ^2+\Vert t_i^{k-1}g_i^{k-1}\Vert ^2)$. The only remaining task is to guarantee an $\epsilon $ version of (48). First let us prove that

$$\begin{aligned} \Vert g_i^{k+1}\Vert \le \frac{\sigma +2L_2C+(L+\beta A_{\max }^2)L_1^2}{2\alpha }\sqrt{\theta _{k}}. \end{aligned}$$

(86)

Denote $h_i(x_i) = \mathcal {L}_{\beta }(x^{k+2}_1,\ldots ,x^{k+2}_{i-1},x_i,x^{k+1}_{i+1},\ldots ,x^{k+1}_N,\lambda ^{k+1})$ and $Y_i(t) = R(x^{k+1}_i,-tg_i^{k+1})$. Then it is not hard to see that $\nabla h_i(x_i)$ is Lipschitz continuous with parameter $L+\beta \Vert A_i\Vert _2^2 \le L_3:=L+\beta A_{\max }^2$. Consequently,

$$\begin{aligned} h_i(Y_i(t))\le & {} h_i(Y_i(0)) + \langle \nabla h_i(Y_i(0)), Y_i(t) - Y_i(0) - tY_i'(0) + tY'_i(0)\rangle \\&+ \frac{L_3}{2} \Vert Y_i(t) - Y_i(0)\Vert ^2 \\\le & {} h_i(Y_i(0)) + t\langle \nabla h_i(Y_i(0)),Y_i'(0)\rangle + L_2t^2\Vert \nabla h_i(Y_i(0))\Vert \Vert Y'_i(0)\Vert ^2 \\&+ \frac{L_3L_1^2}{2}t^2\Vert Y'_i(0)\Vert ^2 \\= & {} h_i(Y_i(0)) - \left( t-L_2t^2\Vert \nabla h_i(Y_i(0))\Vert - \frac{L_3L_1^2}{2}t^2\right) \Vert Y'_i(0)\Vert ^2, \end{aligned}$$

where the last equality is due to $\langle \nabla h_i(Y_i(0)),Y_i'(0)\rangle = -\langle Y_i'(0),Y_i'(0)\rangle $. Also note the relationship

$$\begin{aligned} \Vert Y_i'(0)\Vert = \Vert g_i^{k+1}\Vert = \Vert {\mathrm {Proj}}\, _{\mathcal {T}_{x_i^{k+1}}\mathcal {M}_i}\big \{\nabla h_i(Y_i(0))\big \}\Vert \le \Vert \nabla h_i(Y_i(0))\Vert . \end{aligned}$$

Note that $\left\| \sum _{i=1}^{N-1}A_ix^{k+1}_i+x^{k+1}_N-b\right\| \le \sqrt{\kappa _1\theta _k}\le \sqrt{\frac{\kappa _1}{\tau }(\Psi _G(x_1^1,\ldots ,x_N^1,\lambda ^1,x_N^0)-f^*)}$. Because $\mathcal {M}_i, i = 1,\ldots ,N-1$ are all compact submanifolds, the iterates $x^{k+1}_i, i = 1,\ldots ,N-1$ are all bounded. Together with the bound on the constraint residual, this implies that the whole sequence $\{x_N^{k}\}$ is also bounded. By (27) (which also holds in this case),

$$\begin{aligned} \Vert \lambda ^{k+1}\Vert \le |\beta -\frac{1}{\gamma }|\sqrt{\theta _k}+\Vert \nabla _Nf(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N)\Vert . \end{aligned}$$

By the boundedness of $\{(x^k_1,\ldots ,x^k_N)\}$ and the continuity of $\nabla f(\cdot )$, the second term is bounded. Combining this with the boundedness of $\{\theta _k\}$, we know that the whole sequence $\{\lambda ^k\}$ is bounded. Consequently, there exists a constant $C>0$ such that $\Vert \nabla h_i(Y_i(0))\Vert \le C,$ where

$$\begin{aligned} \nabla h_i(Y_i(0))= & {} \nabla _if(x_1^{k+2},\ldots ,x^{k+2}_{i-1},x^{k+1}_i,\ldots ,x^{k+1}_N) - A_i^\top \lambda ^{k+1}\\&+\beta A_i^\top \bigg (\sum _{j=1}^{i-1}A_jx^{k+2}_j+\sum _{j = i}^N A_jx^{k+1}_j - b\bigg ). \end{aligned}$$

Note that, apart from absolute constants such as $\Vert A_i\Vert _2,i = 1,\ldots ,N$, this constant $C$ depends only on the first two iterates $\{x_1^t,\ldots ,x_N^t,\lambda ^t\}, t = 0,1$. Therefore, when

$$\begin{aligned} t\le \frac{2}{2L_2C+\sigma +L_3L_1^2}\le \frac{2}{2L_2\Vert \nabla h_i(Y_i(0))\Vert +\sigma +L_3L_1^2}, \end{aligned}$$

it holds that

$$\begin{aligned} h_i(Y_i(t))\le h_i(x^{k+1}_i) - \frac{\sigma }{2}t^2\Vert g_i^{k+1}\Vert ^2. \end{aligned}$$
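Indeed, for $t>0$ the step-size condition is a rearrangement of the coefficient appearing in the chain of inequalities above (a short check, using only quantities already defined):

$$\begin{aligned} t\le \frac{2}{2L_2\Vert \nabla h_i(Y_i(0))\Vert +\sigma +L_3L_1^2}&\iff t\left( L_2\Vert \nabla h_i(Y_i(0))\Vert +\frac{\sigma }{2}+\frac{L_3L_1^2}{2}\right) \le 1 \\&\iff t-L_2t^2\Vert \nabla h_i(Y_i(0))\Vert -\frac{L_3L_1^2}{2}t^2\ge \frac{\sigma }{2}t^2, \end{aligned}$$

and substituting the last inequality together with $\Vert Y_i'(0)\Vert = \Vert g_i^{k+1}\Vert $ and $Y_i(0)=x_i^{k+1}$ into that chain gives the stated decrease.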

Since $\sigma >\frac{2\alpha }{s}$, the terminating rule of the line-search step gives

$$\begin{aligned} t_i^k\ge \min \left\{ s, \frac{2\alpha }{2L_2C+\sigma +L_3L_1^2}\right\} = \frac{2\alpha }{2L_2C+\sigma +L_3L_1^2}. \end{aligned}$$

Then by noting

$$\begin{aligned} \frac{2\alpha \Vert g_i^{k+1}\Vert }{2L_2C+\sigma +L_3L_1^2}\le t_i^{k+1}\Vert g_i^{k+1}\Vert \le \sqrt{\theta _k}, \end{aligned}$$

we have (86).
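The step-size lower bound used above can be illustrated with a small backtracking experiment. This is a minimal sketch under stated assumptions, not the paper's algorithm: the manifold is the unit sphere with the normalization retraction (for which $L_1=1$ and $L_2=1/2$ are valid constants on tangent vectors), $h$ is a hypothetical quadratic with $L_3=\Vert Q\Vert _2$, and the line search starts at $t=s$ and shrinks by the factor $\alpha $ until $h(R(x,-tg))\le h(x)-\frac{\sigma }{2}t^2\Vert g\Vert ^2$.

```python
import numpy as np

def proj_tangent(x, v):
    """Projection onto the tangent space of the unit sphere at x."""
    return v - np.dot(x, v) * x

def retract(x, xi):
    """Normalization retraction on the unit sphere (assumed choice)."""
    y = x + xi
    return y / np.linalg.norm(y)

def backtracking_step(h, grad_h, x, s, alpha, sigma, max_backtracks=60):
    """Shrink t from s by alpha until the sufficient-decrease test holds."""
    g = proj_tangent(x, grad_h(x))
    hx, t = h(x), s
    for _ in range(max_backtracks):
        if h(retract(x, -t * g)) <= hx - 0.5 * sigma * t**2 * np.dot(g, g):
            break
        t *= alpha
    return t, g

rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n))
Q = 0.5 * (B + B.T)                   # symmetric, possibly indefinite
c = rng.standard_normal(n)
h = lambda x: 0.5 * x @ Q @ x + c @ x # hypothetical smooth objective
grad_h = lambda x: Q @ x + c

x = rng.standard_normal(n)
x /= np.linalg.norm(x)
s, alpha, sigma = 1.0, 0.5, 1.5       # sigma > 2 * alpha / s
L1, L2, L3 = 1.0, 0.5, np.linalg.norm(Q, 2)

t, g = backtracking_step(h, grad_h, x, s, alpha, sigma)
C = np.linalg.norm(grad_h(x))         # local stand-in for the constant C
lower = min(s, 2 * alpha / (2 * L2 * C + sigma + L3 * L1**2))
print(t, lower, t * np.linalg.norm(g))
```

In this sketch the accepted step `t` should never fall below `lower`, which mirrors the bound that turns $t_i^{k+1}\Vert g_i^{k+1}\Vert \le \sqrt{\theta _k}$ into (86).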

Now we turn to (48). By definition,

$$\begin{aligned} g_i^{k+1}= & {} {\mathrm {Proj}}\, _{\mathcal {T}_{x^{k+1}_i}\mathcal {M}_i}\bigg \{\nabla _if(x_1^{k+2},\ldots ,x^{k+2}_{i-1},x^{k+1}_i,\ldots ,x^{k+1}_N) - A_i^\top \lambda ^{k+1}\\&\quad +\beta A_i^\top \bigg (\sum _{j=1}^{i-1}A_jx^{k+2}_j+\sum _{j = i}^N A_jx^{k+1}_j - b\bigg )\bigg \}. \end{aligned}$$

Consequently, we obtain

$$\begin{aligned}&\biggl \Vert {\mathrm {Proj}}\, _{\mathcal {T}_{x_i^{k+1}}\mathcal {M}_i}\biggl \{\nabla _i f(x^{k+1})-A_i^\top \lambda ^{k+1}\biggr \}\biggr \Vert \\&\quad = \left\| {\mathrm {Proj}}\, _{\mathcal {T}_{x_i^{k+1}}\mathcal {M}_i}\left\{ \nabla _i f(x^{k+1})-\nabla _if(x_1^{k+2},\ldots , x_{i-1}^{k+2},x_{i}^{k+1},\ldots ,x_{N}^{k+1}) + g_i^{k+1} \right. \right. \\&\qquad - \left. \left. \beta A_i^\top \left( \sum _{j=1}^NA_jx_j^{k+1}-b\right) + \beta A_i^\top \left( \sum _{j = 1}^{i-1}A_j(x_j^{k+1}-x_j^{k+2})\right) \right\} \right\| \\&\quad \le \Vert \nabla _i f(x^{k+1})-\nabla _if(x_1^{k+2},\ldots , x_{i-1}^{k+2},x_{i}^{k+1},\ldots ,x_{N}^{k+1})\Vert + \left\| \beta A_i^\top \left( \sum _{j=1}^NA_jx_j^{k+1}-b\right) \right\| \\&\qquad + \Vert g_i^{k+1}\Vert +\left\| \beta A_i^\top \left( \sum _{j = 1}^{i-1}A_j(x_j^{k+1}-x_j^{k+2}) \right) \right\| \\&\quad \le \left( L+\sqrt{N}\beta A_{\max }^2\right) \max \{L_1,1\}\sqrt{\theta _k} + \frac{\sigma +2L_2C+(L+\beta A_{\max }^2)L_1^2}{2\alpha }\sqrt{\theta _{k}} + \beta \Vert A_i\Vert _2 \sqrt{\kappa _1\theta _k} \\&\quad \le \sqrt{\kappa _3\theta _{k}}. \end{aligned}$$

$\square $

1.7 Proof for inequality (60)

Proof

First, we determine a Lipschitz constant of $\nabla \bar{f}_{\beta }$.

$$\begin{aligned}&\Vert \nabla \bar{f}_{\beta }(x)-\nabla \bar{f}_{\beta }(y)\Vert \nonumber \\&\quad \le L\Vert x-y\Vert + \beta \left\| \left[ \left( \sum _{j =1}^NA_j(x_j-y_j)\right) ^\top A_1,\ldots ,\left( \sum _{j =1}^NA_j(x_j-y_j)\right) ^\top A_N\right] \right\| \nonumber \\&\quad \le L\Vert x-y\Vert + \beta \sqrt{N}\max _{1\le i\le N}\Vert A_i\Vert _2\left\| \sum _{j =1}^NA_j(x_j-y_j) \right\| \nonumber \\&\quad \le \left( L+\beta N\max _{1\le i\le N}\Vert A_i\Vert _2^2 \right) \Vert x-y\Vert . \end{aligned}$$

(87)
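The constant $L+\beta N\max _{1\le i\le N}\Vert A_i\Vert _2^2$ in (87) can be sanity-checked numerically. The sketch below assumes $\bar{f}_{\beta }(x)=f(x)+\frac{\beta }{2}\Vert \sum _{j}A_jx_j-b\Vert ^2$ with a hypothetical convex quadratic $f$ and random blocks $A_j$; all sizes and names are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N, m, d = 4, 3, 5                     # number of blocks, rows of A_j, block size
beta = 2.0

A_blocks = [rng.standard_normal((m, d)) for _ in range(N)]
A = np.hstack(A_blocks)               # [A_1, ..., A_N], so A @ x = sum_j A_j x_j
b = rng.standard_normal(m)

# Hypothetical f(x) = 0.5 x^T Q x, whose gradient is Lipschitz with L = ||Q||_2.
M = rng.standard_normal((N * d, N * d))
Q = M.T @ M / (N * d)
L = np.linalg.norm(Q, 2)

def grad_f_bar(x):
    """Gradient of f(x) + (beta/2) * ||A x - b||^2."""
    return Q @ x + beta * A.T @ (A @ x - b)

A_max = max(np.linalg.norm(Aj, 2) for Aj in A_blocks)
L_hat = L + beta * N * A_max**2       # the bound from (87)

# Exact Lipschitz constant of the gradient for this quadratic model.
true_lip = np.linalg.norm(Q + beta * A.T @ A, 2)

x, y = rng.standard_normal(N * d), rng.standard_normal(N * d)
ratio = np.linalg.norm(grad_f_bar(x) - grad_f_bar(y)) / np.linalg.norm(x - y)
print(ratio, true_lip, L_hat)
```

For this model the observed ratio is bounded by the exact constant, which in turn is bounded by the value from (87); the gap shows that (87) is a convenient but generally conservative estimate.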

So we define $\hat{L} = L+\beta N\max _{1\le i\le N}\Vert A_i\Vert _2^2 $ as the Lipschitz constant of $\nabla \bar{f}_{\beta }.$ The global optimality of the subproblem (59) yields

$$\begin{aligned}&\langle \nabla _i\bar{f}_{\beta }(x^k_1,\ldots ,x^k_N),x^{k+1}_i-x^k_i\rangle -\langle \lambda ^k,A_ix^{k+1}_i\rangle + r_i(x^{k+1}_i)+\frac{1}{2}\Vert x^{k+1}_i-x^k_i\Vert ^2_{H_i} \\&\quad \le r_i(x^k_i) - \langle \lambda ^k,A_ix^k_i\rangle . \end{aligned}$$

By the descent lemma we have

$$\begin{aligned}&\mathcal {L}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N,\lambda ^k)\\&\quad = \bar{f}_{\beta }(x^{k+1}_1,\ldots ,x^{k+1}_{N-1},x^k_N) -\left\langle \lambda ^k,\sum _{i=1}^{N}A_ix^{k+1}_i-b\right\rangle +\sum _{i=1}^{N-1}r_i(x^{k+1}_i) \\&\quad \le \bar{f}_{\beta }(x^k_1,\ldots ,x^k_{N-1},x^k_N) +\langle \nabla \bar{f}_{\beta }(x^k_1,\ldots ,x^k_{N-1},x^k_N),x^{k+1}-x^k\rangle \\&\qquad + \frac{\hat{L}}{2}\Vert x^{k+1}-x^k\Vert ^2-\left\langle \lambda ^k,\sum _{i=1}^{N} A_ix^{k+1}_i-b\right\rangle +\sum _{i=1}^{N-1}r_i(x^{k+1}_i). \end{aligned}$$

Combining the above two inequalities yields (60). $\square $


About this article

Cite this article

Zhang, J., Ma, S. & Zhang, S. Primal-dual optimization algorithms over Riemannian manifolds: an iteration complexity analysis. Math. Program. 184, 445–490 (2020). https://doi.org/10.1007/s10107-019-01418-8


Received: 29 January 2018
Accepted: 01 August 2019
Published: 10 August 2019
Issue Date: November 2020
DOI: https://doi.org/10.1007/s10107-019-01418-8
