Abstract
A constraint-reduced Mehrotra predictor–corrector algorithm for convex quadratic programming is proposed. (At each iteration, such algorithms use only a subset of the inequality constraints in constructing the search direction, resulting in CPU savings.) The proposed algorithm makes use of a regularization scheme to cater to cases where the reduced constraint matrix is rank deficient. Global and local convergence properties are established under arbitrary working-set selection rules subject to satisfaction of a general condition. A modified active-set identification scheme that fulfills this condition is introduced. Numerical tests show great promise for the proposed algorithm, in particular for its active-set identification scheme. While the focus of the present paper is on dense systems, application of the main ideas to large sparse systems is briefly discussed.
Notes
We however do not fully follow [34]: (i) Equation (8) generalizes (22) of [34] to CQP; (ii) In (9) we explicitly bound \(\varvec{\lambda }^+\) (\(x^+\) in [34]), by \(\lambda ^{\max }\); in the linear case, such boundedness is guaranteed (Lemma 3.3 in [34]); as a side-effect, in (7), we could drop the penultimate term in (24) of [34] (invoked in proving convergence of the x sequence in the proof of Lemma 3.4 of [34]); (iii) We do not restrict the primal step size as done in (25) of [34] (dual step size in the context of [34]), at the expense of a slightly more involved convergence proof: see our Proposition 3 below, to be compared to [34, Lemma 3.7].
In the case that \(q=0\) (Q is empty), \(\gamma \) is chosen to be zero. Note that, in such case, there is no corrector direction, as the right-hand side of (13) vanishes.
The “modified MPC algorithm” outlined in Sect. 2.1 is recovered as a special case by setting \(\varrho ^+=0\) and \(Q=\{1,\ldots ,m\}\) in Step 2 of Algorithm CR-MPC.
For scaling reasons, it may be advisable to set the value of \({\bar{E}}\) to the initial value of \(E({\mathbf {x}},\varvec{\lambda })\) (so that, in Step 2 of the initial iteration, \(\varrho \) is set to 1, and W to \(H+R\)). This was done in the numerical tests reported in Sect. 3.
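As an illustration of this scaling choice, the following Python sketch implements the rule \(\varrho =\min \{1,E({\mathbf {x}},\varvec{\lambda })/{\bar{E}}\}\), \(W=H+\varrho R\) (see Sect. 2; the matrices and error values below are illustrative only):

```python
import numpy as np

def regularized_hessian(H, R, E, E_bar):
    """Form W = H + rho*R with rho = min(1, E/E_bar), the regularization
    weight of Step 2 of Algorithm CR-MPC (sketch; E denotes the
    optimality error measure E(x, lambda))."""
    rho = min(1.0, E / E_bar)
    return H + rho * R

# With E_bar set to the initial error, rho = 1 at the first iteration,
# so W = H + R; rho then shrinks together with the error E.
H = np.diag([2.0, 0.0])   # a singular (merely positive semi-definite) Hessian
R = np.eye(2)             # regularization matrix
E0 = 5.0                  # illustrative initial error; take E_bar = E0
W0 = regularized_hessian(H, R, E0, E0)    # equals H + R
W1 = regularized_hessian(H, R, 0.5, E0)   # rho = 0.1, a milder perturbation
```

In particular, W is nonsingular whenever \(H+R\succ {\mathbf {0}}\) and \(\varrho >0\), while the perturbation of H fades as the iterates approach a solution.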
Here it is implicitly assumed that \({\mathcal {F}}_P^o\) is nonempty. This assumption is subsumed by Assumption 1 below.
It is readily checked that, given the simple form in which \({\mathbf {z}}\) enters the constraints, for dense problems, the cost of forming \(M^{(Q)}\) still dominates and remains approximately \(|Q|n^2/2\), with, typically, \(|Q|\ll m\).
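To make the flop count concrete, the sketch below forms a reduced matrix of the usual condensed normal-equations form \(W+(A_Q)^TS_Q^{-1}\varLambda _Q A_Q\) (this specific form of \(M^{(Q)}\) is an assumption made here for illustration): only the \(|Q|\) selected rows of A enter, so with symmetry exploited the rank-one accumulations cost about \(|Q|n^2/2\) flops rather than \(mn^2/2\).

```python
import numpy as np

def reduced_normal_matrix(W, A, s, lam, Q):
    """Sketch: form M = W + sum_{i in Q} (lam_i / s_i) * a_i a_i^T.
    Assumes the condensed normal-equations form for M^(Q); exploiting
    symmetry, the accumulation costs about |Q| * n^2 / 2 flops."""
    M = W.astype(float).copy()
    for i in Q:                        # only |Q| of the m rows of A are used
        M += (lam[i] / s[i]) * np.outer(A[i], A[i])
    return M

W = np.eye(3)
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
s = np.array([1.0, 2.0, 4.0, 1.0])     # slacks, all positive
lam = np.array([1.0, 1.0, 2.0, 3.0])   # multipliers, all positive
Q = [0, 2]                             # working set; |Q| << m in practice
M = reduced_normal_matrix(W, A, s, lam, Q)
```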
Nonemptiness and boundedness of \({\mathcal {F}}_P^*\) are equivalent to dual strict feasibility (e.g., [7, Theorem 2.1]).
Equivalently (under the sole assumption that \({\mathcal {F}}_P^*\) is nonempty) \(A_{{\mathcal {A}}({\mathbf {x}})}\) has full row rank at all \({\mathbf {x}}\in {\mathcal {F}}_P\). In fact, while we were not able to carry out the analysis without such a (strong) assumption (the difficulty being to rule out convergence to non-optimal stationary points), numerical experimentation suggests that the assumption is immaterial.
In fact, given any known upper bound \({\overline{z}}\) to \(\{{\mathbf {z}}^k\}\), this assumption can be relaxed to merely requiring linear independence of the set \(\{{\mathbf {a}}_i : b_i-{{\overline{z}}} \le {\mathbf {a}}_i^T{\mathbf {x}}\le b_i\}\), which tends to the set of active constraints when \({\overline{z}}\) goes to zero. This can be done, e.g., with \({\overline{z}}=c\Vert {\mathbf {z}}^0\Vert _\infty \), with any \(c>1\), if the constraint \({\mathbf {z}}\le {\overline{z}}{\mathbf {1}}\) is added to the augmented problem.
In particular, if \({\mathbf {x}}^*\) is an unconstrained minimizer, the working set Q is eventually empty, and Algorithm CR-MPC reverts to a simple regularized Newton method (and terminates in one additional iteration if \(H\succ {\mathbf {0}}\) and \(R={\mathbf {0}}\)).
We also ran comparison tests with the constraint-reduced algorithm of [23], for which polynomial complexity was established (as was superlinear convergence) for general semi-definite optimization problems. As expected, that algorithm could not compete (it was orders of magnitude slower) with algorithms specifically targeting CQP.
An alternative approach to handling an ill-conditioned \(M^{(Q)}\) is to apply a variant of the Cholesky factorization that handles positive semi-definite matrices, such as the Cholesky-infinity factorization (i.e., cholinc(X,’inf’) in Matlab) or the diagonal pivoting strategy discussed in [36, Chapter 11]. Neither implementation makes a notable difference in the numerical results reported in this paper, since the Cholesky factorization fails on fewer than \(1\%\) of the tested problems.
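As a sketch of this kind of safeguard (not the implementation used in the tests, and not the Cholesky-infinity or pivoted variants themselves; the shift schedule below is purely illustrative), one can retry a failed Cholesky factorization after adding a small multiple of the identity:

```python
import numpy as np

def cholesky_with_fallback(M, shift=1e-10, max_tries=5):
    """Attempt a Cholesky factorization of M; on failure, retry after
    adding shift*I with a geometrically growing shift. A safeguard
    sketch for (nearly) semi-definite M; the schedule is illustrative."""
    for _ in range(max_tries):
        try:
            return np.linalg.cholesky(M)
        except np.linalg.LinAlgError:
            M = M + shift * np.eye(M.shape[0])
            shift *= 10.0
    raise np.linalg.LinAlgError("could not regularize M")

# A rank-deficient (positive semi-definite) matrix: plain Cholesky fails,
# while the shifted retry succeeds with a negligible perturbation.
M = np.array([[1.0, 1.0],
              [1.0, 1.0]])
L = cholesky_with_fallback(M)
```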
Interestingly, on strongly convex problems, most rules (and especially Rule R) need fewer iterations than Rule All (except for \(n=500\))!
We also ran the tests without noise and with noise of variance between 0 and 1, and the results were very similar to the ones reported here.
It may also be worth pointing out that roughly a decade ago, in [33], the performance of an early version of a constraint-reduced MPC algorithm (with a more elementary constraint-selection rule than Rule JOT) was compared, on imbalanced filter-design applications (linear optimization), to the “revised primal simplex with partial pricing” algorithm discussed in [2], with encouraging results: the constraint-reduced code proved competitive with the simplex code on some of the tested problems and superior on others.
References
Altman, A., Gondzio, J.: Regularized symmetric indefinite systems in interior point methods for linear and quadratic optimization. Optim. Methods Softw. 11(1–4), 275–302 (1999)
Bertsimas, D., Tsitsiklis, J.: Introduction to Linear Optimization. Athena Scientific, Belmont (1997)
Cartis, C., Yan, Y.: Active-set prediction for interior point methods using controlled perturbations. Comput. Optim. Appl. 63(3), 639–684 (2016)
Castro, J., Cuesta, J.: Quadratic regularizations in an interior-point method for primal block-angular problems. Math. Program. 130(2), 415–445 (2011)
Chen, L., Wang, Y., He, G.: A feasible active set QP-free method for nonlinear programming. SIAM J. Optim. 17(2), 401–429 (2006)
Dantzig, G.B., Ye, Y.: A build-up interior-point method for linear programming: affine scaling form. Technical report, University of Iowa, Iowa City (1991)
Drummond, L., Svaiter, B.: On well definedness of the central path. J. Optim. Theory Appl. 102(2), 223–237 (1999)
Facchinei, F., Fischer, A., Kanzow, C.: On the accurate identification of active constraints. SIAM J. Optim. 9(1), 14–32 (1998)
Gill, P.E., Murray, W., Ponceleón, D.B., Saunders, M.A.: Solving reduced KKT systems in barrier methods for linear programming. In: Watson, G.A., Griffiths, D. (eds.) Numerical Analysis 1993. Pitman Research Notes in Mathematics 303, pp. 89–104. Longmans Press, New York (1994)
Grant, M., Boyd, S.: Graph implementations for nonsmooth convex programs. In: Blondel, V., Boyd, S., Kimura, H. (eds.) Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences, pp. 95–110. Springer, Berlin (2008)
Grant, M., Boyd, S.: CVX: Matlab software for disciplined convex programming, version 2.1 (2014). http://cvxr.com/cvx. Accessed 27 Feb 2019
Hager, W.W., Seetharama Gowda, M.: Stability in the presence of degeneracy and error estimation. Math. Program. 85(1), 181–192 (1999)
He, M.: Infeasible constraint reduction for linear and convex quadratic optimization. Ph.D. thesis, University of Maryland (2011). http://hdl.handle.net/1903/12772. Accessed 27 Feb 2019
He, M.Y., Tits, A.L.: Infeasible constraint-reduced interior-point methods for linear optimization. Optim. Methods Softw. 27(4–5), 801–825 (2012)
Hertog, D., Roos, C., Terlaky, T.: Adding and deleting constraints in the logarithmic barrier method for LP. In: Du, D.Z., Sun, J. (eds.) Advances in Optimization and Approximation, pp. 166–185. Kluwer Academic Publishers, Dordrecht (1994)
Jung, J.H.: Adaptive constraint reduction for convex quadratic programming and training support vector machines. Ph.D. thesis, University of Maryland (2008). http://hdl.handle.net/1903/8020. Accessed 27 Feb 2019
Jung, J.H., O’Leary, D.P., Tits, A.L.: Adaptive constraint reduction for training support vector machines. Electron. Trans. Numer. Anal. 31, 156–177 (2008)
Jung, J.H., O’Leary, D.P., Tits, A.L.: Adaptive constraint reduction for convex quadratic programming. Comput. Optim. Appl. 51(1), 125–157 (2012)
Laiu, M.P.: Positive filtered P\(_{N}\) method for linear transport equations and the associated optimization algorithm. Ph.D. thesis, University of Maryland (2016). http://hdl.handle.net/1903/18732. Accessed 27 Feb 2019
Laiu, M.P., Hauck, C.D., McClarren, R.G., O’Leary, D.P., Tits, A.L.: Positive filtered P\(_{N}\) moment closures for linear kinetic equations. SIAM J. Numer. Anal. 54(6), 3214–3238 (2016)
Mehrotra, S.: On the implementation of a primal–dual interior point method. SIAM J. Optim. 2(4), 575–601 (1992)
Nocedal, J., Wright, S.: Numerical Optimization. Springer Series in Operations Research and Financial Engineering. Springer, New York (2006)
Park, S.: A constraint-reduced algorithm for semidefinite optimization problems with superlinear convergence. J. Optim. Theory Appl. 170(2), 512–527 (2016)
Park, S., O’Leary, D.P.: A polynomial time constraint-reduced algorithm for semidefinite optimization problems. J. Optim. Theory Appl. 166(2), 558–571 (2015)
Saunders, M.A., Tomlin, J.A.: Solving regularized linear programs using barrier methods and KKT systems. Technical report, SOL 96-4. Department of Operations Research, Stanford University (1996)
Sturm, J.: Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Methods Softw. 11–12, 625–653 (1999)
Tits, A., Wächter, A., Bakhtiari, S., Urban, T., Lawrence, C.: A primal–dual interior-point method for nonlinear programming with strong global and local convergence properties. SIAM J. Optim. 14(1), 173–199 (2003)
Tits, A.L., Absil, P.A., Woessner, W.P.: Constraint reduction for linear programs with many inequality constraints. SIAM J. Optim. 17(1), 119–146 (2006)
Tits, A.L., Zhou, J.L.: A simple, quadratically convergent algorithm for linear and convex quadratic programming. In: Hager, W., Hearn, D., Pardalos, P. (eds.) Large Scale Optimization: State of the Art, pp. 411–427. Kluwer Academic Publishers, Dordrecht (1994)
Toh, K.C., Todd, M.J., Tütüncü, R.H.: SDPT3: A Matlab software package for semidefinite programming, version 1.3. Optim. Methods Softw. 11(1–4), 545–581 (1999)
Tone, K.: An active-set strategy in an interior point method for linear programming. Math. Program. 59(1), 345–360 (1993)
Tütüncü, R.H., Toh, K.C., Todd, M.J.: Solving semidefinite-quadratic-linear programs using SDPT3. Math. Program. 95(2), 189–217 (2003)
Winternitz, L.: Primal–dual interior-point algorithms for linear programming problems with many inequality constraints. Ph.D. thesis, University of Maryland (2010). http://hdl.handle.net/1903/10400. Accessed 27 Feb 2019
Winternitz, L.B., Nicholls, S.O., Tits, A.L., O’Leary, D.P.: A constraint-reduced variant of Mehrotra’s predictor–corrector algorithm. Comput. Optim. Appl. 51(1), 1001–1036 (2012)
Winternitz, L.B., Tits, A.L., Absil, P.A.: Addressing rank degeneracy in constraint-reduced interior-point methods for linear optimization. J. Optim. Theory Appl. 160(1), 127–157 (2014)
Wright, S.J.: Primal–Dual Interior-Point Methods. SIAM, Philadelphia (1997)
Wright, S.J.: Modifying SQP for degenerate problems. SIAM J. Optim. 13(2), 470–497 (2002)
Ye, Y.: A “build-down” scheme for linear programming. Math. Program. 46(1), 61–72 (1990)
Zhang, Y., Zhang, D.: On polynomiality of the Mehrotra-type predictor–corrector interior-point algorithms. Math. Program. 68(1), 303–318 (1995)
This manuscript has been authored, in part, by UT-Battelle, LLC, under Contract No. DE-AC0500OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for the United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
M. P. Laiu: This author’s research was sponsored by the Office of Advanced Scientific Computing Research and performed at the Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725.
Appendices
Appendix
The following results are used in the proofs in Appendices A and B. Here we assume that \(Q\subseteq {\mathbf {m}}\) and W is symmetric, with \(W\succeq H\succeq {\mathbf {0}}\). First, from (10) and (13), the approximate MPC search direction \((\varDelta {\mathbf {x}}, \varDelta \varvec{\lambda }_Q, \varDelta {\mathbf {s}}_Q)\) defined in (16) solves
and equivalently, when \({\mathbf {s}}_Q>{\mathbf {0}}\), from (20), (21) and (15),
Next, with \({\tilde{\varvec{\lambda }}}^+\) and \({{\tilde{\varvec{\lambda }}}}^{\text {a},+}\) given by
from the last equation of (33) and from (21), we have
and hence
so that, when in addition \(\varvec{\lambda }>{\mathbf {0}}\), \((\varDelta {\mathbf {x}}^\text {a})^T(A_Q)^T{{\tilde{\varvec{\lambda }}}}^{\text {a},+} =0\) if and only if \(A_Q\varDelta {\mathbf {x}}^\text {a}={\mathbf {0}}\). Also, (20) yields
Since \(W\succeq H\), it follows from (37) that,
In addition, when \(S_Q\succ {\mathbf {0}}\), \(\varLambda _Q\succ {\mathbf {0}}\) and since \(W\succeq {\mathbf {0}}\), the right-hand side of (38) is strictly negative as long as \(W\varDelta {\mathbf {x}}^\text {a}\) and \(A_Q\varDelta {\mathbf {x}}^\text {a}\) are not both zero. In particular, when \([W~(A_Q)^T]\) has full row rank,
Finally, we state and prove two technical lemmas.
Lemma 3
Given an infinite index set K, \(\{\varDelta {\mathbf {x}}^k\}\rightarrow {\mathbf {0}}\) as \(k\rightarrow \infty \), \(k\in K\) if and only if \(\{\varDelta {\mathbf {x}}^{\text {a},k}\}\rightarrow {\mathbf {0}}\) as \(k\rightarrow \infty \), \(k\in K\).
Proof
We show that \(\Vert \varDelta {\mathbf {x}}^k\Vert \) is sandwiched between constant multiples of \(\Vert \varDelta {\mathbf {x}}^{\text {a},k}\Vert \). We have from the search direction given in (16) that, for all k, \(\Vert \varDelta {\mathbf {x}}^k-\varDelta {\mathbf {x}}^{\text {a},k}\Vert = \Vert \gamma \varDelta {\mathbf {x}}^{\text {c},k}\Vert \le \tau \Vert \varDelta {\mathbf {x}}^{\text {a},k}\Vert \), where \(\tau \in (0,1)\) and the inequality follows from (7). Applying the triangle inequality then leads to \((1-\tau )\Vert \varDelta {\mathbf {x}}^{\text {a},k}\Vert \le \Vert \varDelta {\mathbf {x}}^k\Vert \le (1+\tau )\Vert \varDelta {\mathbf {x}}^{\text {a},k}\Vert \) for all k, proving the claim. \(\square \)
Lemma 4
Suppose Assumption 1 holds. Let \(Q\subset {\mathbf {m}}\), \({\mathcal {A}}\subseteq Q\), \({\mathbf {x}}\in {\mathcal {F}}_P^o\), \({\mathbf {s}}:=A{\mathbf {x}}-{\mathbf {b}}~(>{\mathbf {0}})\), and \(\varvec{\lambda }>{\mathbf {0}}\) enjoy the following property: With \(\varDelta \varvec{\lambda }_Q\), \(\varDelta \varvec{\lambda }^{\text {a}}_Q\), \(\varDelta {\mathbf {s}}\), and \(\varDelta {\mathbf {s}}^{\text {a}}\) produced by Iteration CR-MPC, \(\lambda _i+\varDelta \lambda _i>0\) for all \(i\in {\mathcal {A}}\) and \(s_i+\varDelta s_i>0\) for all \(i\in Q\setminus {\mathcal {A}}\). Then
and
Proof
If \({\bar{\alpha }}_\text {d}\ge 1\), (41) holds trivially, hence suppose \({\bar{\alpha }}_\text {d}< 1\). Then, from the definition of \({\bar{\alpha }}_\text {d}\) in (24), we know that there exists some index \(i_0\in Q\) such that
Since \(\lambda _i+\varDelta \lambda _i>0\) for all \(i\in {\mathcal {A}}\), we have \(i_0\in Q\setminus {\mathcal {A}}\). Now we consider two cases: \(|\varDelta \lambda _{i_0}^\text {a}|\ge |\varDelta \lambda _{i_0}|\) and \(|\varDelta \lambda _{i_0}^\text {a}|<|\varDelta \lambda _{i_0}|\). If \(|\varDelta \lambda _{i_0}^\text {a}|\ge |\varDelta \lambda _{i_0}|\), then, since the second equation in (21) is equivalently written as \(\lambda _i s_i+s_i\varDelta \lambda _i^\text {a}+\lambda _i\varDelta s_i^\text {a}=0\) for all \(i\in Q\) and since \(\lambda _i s_i>0\) for all \(i\in {\mathbf {m}}\), it follows from (43) that
proving (41). To conclude, suppose now that \(|\varDelta \lambda _{i_0}^\text {a}|<|\varDelta \lambda _{i_0}|\). Since (i) \(s_i+\varDelta s_i>0\) for \(i\in Q\setminus {\mathcal {A}}\); (ii) \(\gamma \), \(\sigma \), and \(\mu _{(Q)}\) in (35) are non-negative; and (iii) \(\varDelta \lambda _{i_0}<0\) (from (43)), equations (34) and (35) yield
Applying this inequality to (43) leads to
where the last inequality holds since \(\gamma \le 1\) and \(|\varDelta \lambda _{i_0}^\text {a}|<|\varDelta \lambda _{i_0}|\). A very similar argument, with the roles of \({\mathbf {s}}\) and \(\varvec{\lambda }\) interchanged, proves that (42) also holds. \(\square \)
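The step-size bound \({\bar{\alpha }}_\text {d}\) appearing above is, in essence, the standard interior-point ratio test. A minimal sketch of that test follows (any damping or safeguards present in the paper's actual rule (24) are omitted; the numerical values are illustrative):

```python
import numpy as np

def max_step_to_boundary(v, dv):
    """Largest alpha in (0, 1] with v + alpha*dv >= 0, for v > 0
    componentwise: the standard IPM ratio test (sketch only; safeguards
    in the paper's rule (24) are omitted)."""
    neg = dv < 0
    if not np.any(neg):
        return 1.0                     # no blocking component
    return min(1.0, float(np.min(-v[neg] / dv[neg])))

lam = np.array([1.0, 2.0, 0.5])
dlam = np.array([-2.0, 1.0, -0.25])
alpha = max_step_to_boundary(lam, dlam)   # blocked by the first component
```

Only components with \(\varDelta \lambda _i<0\) can block the step, which is why the proof above focuses on an index \(i_0\) with \(\varDelta \lambda _{i_0}<0\).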
A Proof of Theorem 1 and Corollary 2
Parts of this proof are inspired by [16, 18, 29, 34, 35]. Throughout, we assume that the constraint-selection rule used by the algorithm is such that Condition CSR is satisfied and (except in the proof of Lemma 13) we let \(\varepsilon =0\) and assume that the iteration never stops.
A central feature of Algorithm CR-MPC, which plays a key role in the convergence proofs, is that it forces descent with respect to the primal objective function. The next proposition establishes some related facts.
Proposition 2
Suppose \(\varvec{\lambda }>{\mathbf {0}}\) and \({\mathbf {s}}>{\mathbf {0}}\), and W satisfies \(W\succ {\mathbf {0}}\) and \(W\succeq H\). If \(\varDelta {\mathbf {x}}^\text {a}\ne {\mathbf {0}}\), then the following inequalities hold:
Proof
When \(f({\mathbf {x}}+\alpha \varDelta {\mathbf {x}}^\text {a})\) is linear in \(\alpha \), i.e., when \((\varDelta {\mathbf {x}}^\text {a})^T H\varDelta {\mathbf {x}}^\text {a}=0\), then, in view of (40), both (44) and (45) hold trivially. When, on the other hand, \((\varDelta {\mathbf {x}}^\text {a})^T H\varDelta {\mathbf {x}}^\text {a}>0\), \(f({\mathbf {x}}+\alpha \varDelta {\mathbf {x}}^\text {a})\) is quadratic and strictly convex in \(\alpha \) and is minimized at
where we have used (38), (37), and the fact that \(W\succeq H\), and (44), (45) again follow. Next, note that, since \(\omega >0\),
is quadratic and convex. Now, since \(\gamma _1\) satisfies the constraints in its definition (8), we see that \(\psi (\gamma _1)\le 0\), and since \(\omega \le 1\), it follows from (44) that \(\psi (0)= (\omega -1)(f({\mathbf {x}})-f({\mathbf {x}}+\varDelta {\mathbf {x}}^\text {a}))\le 0\). Since \(\gamma \in [0,\gamma _1]\) (see (7)), it follows that \(\psi (\gamma )\le 0\), i.e., since from (16) \(\varDelta {\mathbf {x}}=\varDelta {\mathbf {x}}^\text {a}+\gamma \varDelta {\mathbf {x}}^\text {c}\),
i.e.,
Now, for all \(\alpha \in [0,1]\), invoking (48), (39), and the fact that \(H\succeq {\mathbf {0}}\), we can write
proving (46). Finally, since \(\omega >0\), (47) is a direct consequence of (46) and (44). \(\square \)
In particular, in view of (23) and (24), \(\{f({\mathbf {x}}^{k})\}\) is monotonically decreasing. Since the iterates are primal-feasible, an immediate consequence of Proposition 2, stated next, is that, under Assumption 1, the primal sequence is bounded.
Lemma 5
Suppose Assumption 1 holds. Then \(\{{\mathbf {x}}^k\}\) is bounded.
We are now ready to prove a key result, relating two successive iterates, that plays a central role in the remainder of the proof of Theorem 1.
Proposition 3
Suppose Assumptions 1 and 2 hold, and either \(\{({\mathbf {x}}^k,\varvec{\lambda }^k)\}\) is bounded away from \({\mathcal {F}}^*\), or Assumption 3 also holds and \(\{{\mathbf {x}}^k\}\) converges to the unique primal solution \({\mathbf {x}}^*\). Let K be an infinite index set such that
Then \(\{\varDelta {\mathbf {x}}^k\}\rightarrow {\mathbf {0}}\) as \(k\rightarrow \infty \), \(k\in K\).
Proof
From Lemma 5, \(\{{\mathbf {x}}^k\}\) is bounded, and hence so is \(\{{\mathbf {s}}^k\}\); by construction, \({\mathbf {s}}^k\) and \(\varvec{\lambda }^k\) have positive components for all k, and \(\{\varvec{\lambda }^k\}\) ((26), (27)) and \(\{W_k\}\) are bounded. Further, for any infinite index set \(K^\prime \) such that (49) holds, (26) and (27) imply that all components of \(\{\varvec{\lambda }^k\}\) are bounded away from zero on \(K^\prime \). Since, in addition, \(Q_k\) can take no more than finitely many different (set) values, it follows that there exist \({\hat{{\mathbf {x}}}}\), \({\hat{\varvec{\lambda }}}>{\mathbf {0}}\), \({\hat{W}}\succeq {\mathbf {0}}\), an index set \({\hat{Q}}\subseteq {\mathbf {m}}\), and some infinite index set \({\hat{K}}\subseteq K^\prime \) such that
Next, under the stated assumptions, \(J({\hat{W}},A_{{\hat{Q}}}, {\hat{{\mathbf {s}}}}_{{\hat{Q}}},{\hat{\varvec{\lambda }}}_{{\hat{Q}}})\) is non-singular. Indeed, if \(\{({\mathbf {x}}^k,\varvec{\lambda }^k)\}\) is bounded away from \({\mathcal {F}}^*\), then \(E({\mathbf {x}}^k,\varvec{\lambda }^k)\) is bounded away from zero and since \(H+R\succ {\mathbf {0}}\), \(W_k=H+\varrho _k R = H + \min \left\{ 1,\frac{E({\mathbf {x}}^k,\varvec{\lambda }^k)}{{\bar{E}}}\right\} R\) is bounded away from singularity and the claim follows from Assumption 2 and Lemma 1. On the other hand, if Assumption 3 also holds and \(\{{\mathbf {x}}^k\}\rightarrow {\mathbf {x}}^*\), then the claim follows from Condition CSR(ii) and Lemma 1. As a consequence of this claim, and by continuity of J, it follows from Newton-KKT systems (10) and (32) that there exist \(\varDelta {\hat{{\mathbf {x}}}}^\text {a}\), \(\varDelta {\hat{{\mathbf {x}}}}\), \({\bar{\varvec{\lambda }}}^{\text {a}}_{{\hat{Q}}}\), \({{\bar{\varvec{\lambda }}}}_{{\hat{Q}}}\) such that
The remainder of the proof proceeds by contradiction. Thus suppose that, for the infinite index set K in the statement of this proposition, \(\{\varDelta {\mathbf {x}}^k\}\not \rightarrow {\mathbf {0}}\) as \(k\rightarrow \infty \), \(k\in K\), i.e., for some \(K^{\prime \prime }\subseteq K\), \(\Vert \varDelta {\mathbf {x}}^k\Vert \) is bounded away from zero on \(K^{\prime \prime }\). Use \(K^{\prime \prime }\) as our \(K^\prime \) above, so that (since \({\hat{K}}\subseteq K^\prime \)) \(\Vert \varDelta {\mathbf {x}}^k\Vert \) is bounded away from zero on \({\hat{K}}\). Then, in view of Lemma 3 (w.l.o.g.),
In addition, we have \({\mathcal {A}}({\hat{{\mathbf {x}}}})\subseteq {\hat{Q}}\), an implication of Condition CSR(i) when \(\{({\mathbf {x}}^k,\varvec{\lambda }^k)\}\) is bounded away from \({\mathcal {F}}^*\) and of Condition CSR(ii) when Assumption 3 holds and \(\{{\mathbf {x}}^k\}\) converges to \({\mathbf {x}}^*\). With these facts in hand, we next show that the sequence of primal step sizes \(\{\alpha _\text {p}^k\}\) is bounded away from zero for \(k\in {\hat{K}}\). To this end, let us define
so that, for all \(i\in {\mathbf {m}}\) and all k, \({{\tilde{\lambda }}}_i^{\prime ,k+1}>0\) if and only if \(\varDelta s_i^k<0\), and the primal portion of (24) can be written as
Clearly, it is now sufficient to show that, for all i, \(\{{\tilde{\lambda }}_i^{\prime ,k+1}\}\) is bounded above on \({\hat{K}}\). On the one hand, this is clearly so for \(i\not \in {\hat{Q}}\) (whence \(i\not \in {\mathcal {A}}({\hat{{\mathbf {x}}}})\)), in view of (58) and (54), since \(\{\varvec{\lambda }^k\}\) is bounded and \(\{s_i^k\}\) is bounded away from zero on \({\hat{K}}\) for \(i\not \in {\mathcal {A}}({\hat{{\mathbf {x}}}})\) (from (50)). On the other hand, in view of (52), subtracting (58) from (35) yields, for all \(k\in {\hat{K}}\),
From (56), \(\{{{\tilde{\varvec{\lambda }}}}^{k+1}_{{\hat{Q}}}\}\) is bounded on \({\hat{K}}\), and clearly the second term in the right-hand side of the above equation is non-positive component-wise. As for the third term, the second equation in (21) gives \((S_{Q_k}^{k})^{-1}\varDelta S_{Q_k}^{\text {a},k} = (\varLambda _{Q_k}^{k})^{-1} {\tilde{\varLambda }}_{Q_k}^{\text {a},k+1}\), so that we have
which is bounded on \({\hat{K}}\) since, from (51), (55), and the definition (34) of \(\{{\tilde{\lambda }}^{\text {a},+}\}\), both \(\{{\tilde{\varLambda }}_{{\hat{Q}}}^{\text {a},k+1}\}\) and \(\{\varDelta \varvec{\lambda }_{{\hat{Q}}}^{\text {a},k}\}\) are bounded, and from (51), \(\{\varvec{\lambda }^k_{{\hat{Q}}}\}\) is bounded away from zero on \({\hat{K}}\). Therefore, \(\{{\tilde{\lambda }}_i^{\prime ,k+1}\}\) is bounded above on \({\hat{K}}\) for \(i\in {\hat{Q}}\) as well, proving that \(\{{\alpha }_\text {p}^k\}\) is bounded away from zero on \({\hat{K}}\), i.e., that there exists \({\underline{\alpha }}>0\) such that \(\alpha _\text {p}^k>{\underline{\alpha }}\), for all \(k\in {\hat{K}}\), as claimed. Without loss of generality, choose \({\underline{\alpha }}\) in (0, 1).
Finally, we show that \(\{f({\mathbf {x}}^k)\}\rightarrow -\infty \) as \(k\rightarrow \infty \) on \({\hat{K}}\), which contradicts boundedness of \(\{{\mathbf {x}}^k\}\) (Lemma 5). For all \(k\in {\hat{K}}\), since \(\varDelta {\mathbf {x}}^{\text {a},k}\ne {\mathbf {0}}\) (by (57)) and \(\alpha _\text {p}^k\in ({\underline{\alpha }}, 1]\), Proposition 2 implies that \(\{f({\mathbf {x}}^k)\}\) is monotonically decreasing and that, for all \(k\in {\hat{K}}\),
Expanding the right-hand side yields
where the sum of the last two terms tends to a strictly negative limit as \(k\rightarrow \infty \), \(k\in {\hat{K}}\). Indeed, in view of (39), the second term is non-positive and (i) if \((\varDelta {\hat{{\mathbf {x}}}}^{\text {a}})^T H\varDelta {\hat{{\mathbf {x}}}}^{\text {a}}>0\), since \({{\underline{\alpha }}}>{{\underline{\alpha }}}^2/2\), from (53) and (57), the third term tends to a negative limit, and (ii) if \((\varDelta {\hat{{\mathbf {x}}}}^{\text {a}})^T H\varDelta {\hat{{\mathbf {x}}}}^{\text {a}}=0\) then the sum of the last two terms tends to \({{\underline{\alpha }}}\nabla f(\hat{{\mathbf {x}}})^T\varDelta {\hat{{\mathbf {x}}}}^{\text {a}}\), which is also strictly negative in view of (40), since we either have \({\hat{W}} \succ {\mathbf {0}}\) (in the case that \(\{({\mathbf {x}}^k,\varvec{\lambda }^k)\}\) is bounded away from \({\mathcal {F}}^*\)) or at least \([{\hat{W}} \,(A_{\hat{Q}})^T]\) has full row rank (in the case that Assumption 3 holds, using the fact that \({\mathcal {A}}({\hat{{\mathbf {x}}}})\subseteq \hat{Q}\)). It follows that, for some \(\delta >0\), \(f({\mathbf {x}}^k+\alpha _\text {p}^k\varDelta {\mathbf {x}}^{\text {a},k})<f({\mathbf {x}}^k)-\delta \) for all \(k\in {\hat{K}}\) large enough. Proposition 2 (Eq. (46)) then gives that \(f({\mathbf {x}}^{k+1}):=f({\mathbf {x}}^k+\alpha _\text {p}^k\varDelta {\mathbf {x}}^k)<f({\mathbf {x}}^k)-\frac{\omega }{2}\delta \) for all \(k\in {\hat{K}}\) large enough, where \(\omega >0\) is an algorithm parameter. Since \(\{f({\mathbf {x}}^k)\}\) is monotonically decreasing, the proof is now complete. \(\square \)
We now conclude the proof of Theorem 1 via a string of eight lemmas, each of which builds on the previous one. First, on any subsequence, if \(\{\varDelta {\mathbf {x}}^{\text {a},k}\}\) tends to zero, then \(\{{\mathbf {x}}^k\}\) approaches stationary points. (Here both \(\{{{\tilde{\varvec{\lambda }}}}^{\text {a},k+1}\}\) and \(\{{{\tilde{\varvec{\lambda }}}}^{k+1}\}\) are as defined in (34).)
Lemma 6
Suppose that Assumption 1 holds and that \(\{{\mathbf {x}}^k\}\) converges to some limit point \({\hat{{\mathbf {x}}}}\) on an infinite index set K. If \(\{\varDelta {\mathbf {x}}^{\text {a},k}\}\) converges to zero on K, then (i) \({\hat{{\mathbf {x}}}}\) is stationary and
If, in addition, Assumption 2 holds, then (ii) \(\{{{\tilde{\varvec{\lambda }}}}^{\text {a},k+1}\}\) and \(\{{{\tilde{\varvec{\lambda }}}}^{k+1}\}\) converge on K to \({\hat{\varvec{\lambda }}}\), the unique multiplier associated with \({\hat{{\mathbf {x}}}}\).
Proof
Suppose \(\{{\mathbf {x}}^k\}\rightarrow {\hat{{\mathbf {x}}}}\) on K and \(\{\varDelta {\mathbf {x}}^{\text {a},k}\}\rightarrow {\mathbf {0}}\) on K. Let \({\mathbf {s}}^k:=A{\mathbf {x}}^k-{\mathbf {b}}(>{\mathbf {0}})\) for all \(k\in K\) and \(\hat{{\mathbf {s}}}:=A{\hat{{\mathbf {x}}}}-{\mathbf {b}}(\ge {\mathbf {0}})\), so that \(\{{\mathbf {s}}^k\}\rightarrow \hat{{\mathbf {s}}}\) on K. As a first step toward proving Claim (i), we show that, for any \(i\not \in {\mathcal {A}}({\hat{{\mathbf {x}}}})\), \(\{{{\tilde{\lambda }}}_i^{\text {a},k+1}\}\rightarrow 0\) on K. For \(i\not \in {\mathcal {A}}({\hat{{\mathbf {x}}}})\), since \(\hat{s}_i>0\), \(\{s_i^k\}\) is bounded away from zero on K. Since it follows from (34) and (36) that, for all k,
and since \(\{\lambda _i^k\}\) is bounded (by construction) and \(\varDelta {\mathbf {s}}^{\text {a},k}=A\varDelta {\mathbf {x}}^{\text {a},k}\) (by (21)), we have \(\{{{\tilde{\lambda }}}_i^{\text {a},k+1}\}\rightarrow 0\) on K. To complete the proof of Claim (i), note that the first equation of (10) (with H replaced by W) yields
Since (i) \(\{{\tilde{\lambda }}^{\text {a},k+1}_i\}\rightarrow 0\) on K for \(i\not \in {\mathcal {A}}({\hat{{\mathbf {x}}}})\), (ii) \(\{W_k\}\) is bounded (since \(H\preceq W_k\preceq H+R\)), (iii) \(\{\varDelta {\mathbf {x}}^{\text {a},k}\}\rightarrow {\mathbf {0}}\) on K, and (iv) by definition (34), \({\tilde{\lambda }}_i^{\text {a},+}=0\) for \(i\in Q^{\text {c}}\), we conclude that (59) holds, hence \(\{\left( A_{{\mathcal {A}}({\hat{{\mathbf {x}}}})}\right) ^T{{\tilde{\varvec{\lambda }}}}^{\text {a},k+1}_{{\mathcal {A}}({\hat{{\mathbf {x}}}})}\}\) converges (since \(\nabla f({\mathbf {x}}^k)\) does) as \(k\rightarrow \infty \), \(k\in K\), to a point in the range of \(\left( A_{{\mathcal {A}}({\hat{{\mathbf {x}}}})}\right) ^T\), say \(\left( A_{{\mathcal {A}}({\hat{{\mathbf {x}}}})}\right) ^T {\hat{\varvec{\lambda }}}_{{\mathcal {A}}({\hat{{\mathbf {x}}}})}\). We get \(\nabla f({\hat{{\mathbf {x}}}}) - \left( A_{{\mathcal {A}}({\hat{{\mathbf {x}}}})}\right) ^T {\hat{\varvec{\lambda }}}_{{\mathcal {A}}({\hat{{\mathbf {x}}}})}={\mathbf {0}}\), proving Claim (i). Finally, Claim (ii) follows from (59), Assumption 2, and the fact that for \(i\not \in {\mathcal {A}}({\hat{{\mathbf {x}}}})\), \(\{{{\tilde{\lambda }}}_i^{\text {a},k+1}\}\rightarrow 0\) as \(k\rightarrow \infty \), \(k\in K\), noting that the same argument applies to \(\{{\tilde{\varvec{\lambda }}}^{k+1}\}\), using a modified version of (59), with \({\tilde{\varvec{\lambda }}}\) replacing \({\tilde{\varvec{\lambda }}}^\text {a}\), obtained by starting from the first equation of (32) instead of that of (10) and using the fact, proved next, that \(\{{\tilde{\lambda }}_i^{k+1}\}\rightarrow 0\) on K for all \(i\not \in {\mathcal {A}}(\hat{{\mathbf {x}}})\). From its definition in (34) and the last equation in (33), we have that, for all k,
Since \(\{{{\tilde{\lambda }}}_i^{\text {a},k+1}\}\) converges (to zero) on K, \(\{\varDelta \lambda _i^{\text {a},k}\}\) is bounded on K. Furthermore, from its definition (7), (8) (see also (16)), \(\{\gamma _k\}\) is bounded and \(|\gamma _k\sigma _k\mu _{(Q_k)}^{(k)}|\le \tau \Vert \varDelta {\mathbf {x}}^{\text {a},k}\Vert \) for all k. Since \(\varDelta {\mathbf {s}}^{\text {a},k}=A\varDelta {\mathbf {x}}^{\text {a},k}\) and \(\varDelta {\mathbf {s}}^k=A\varDelta {\mathbf {x}}^k\), in view of Lemma 3, it follows that, for \(i\not \in {\mathcal {A}}({\hat{{\mathbf {x}}}})\), \(\{{{\tilde{\lambda }}}_i^{k+1}\}\rightarrow 0\) on K. \(\square \)
Lemma 6, combined with Proposition 3 via a contradiction argument, then implies that if \(\{{\mathbf {x}}^k\}\) remains bounded away from \({\mathcal {F}}_P^*\) on a subsequence, then \(\{\varDelta {\mathbf {x}}^k\}\) converges to zero on that subsequence.
Lemma 7
Suppose that Assumptions 1 and 2 hold and that \(\{{\mathbf {x}}^k\}\) is bounded away from \({\mathcal {F}}_P^*\) on some infinite index set K. Then \(\{\varDelta {\mathbf {x}}^k\}\rightarrow {\mathbf {0}}\) as \(k\rightarrow \infty \), \(k\in K\).
Proof
Proceeding by contradiction, let K be an infinite index set such that \(\{{\mathbf {x}}^k\}\) is bounded away from \({\mathcal {F}}_P^*\) on K and \(\{\varDelta {\mathbf {x}}^k\}\not \rightarrow {\mathbf {0}}\) as \(k\rightarrow \infty \), \(k\in K\). Then, in view of Proposition 3 and boundedness of \(\{{\mathbf {x}}^k\}\) (Lemma 5), there exist \({\hat{Q}}\subseteq {\mathbf {m}}\), \({\hat{{\mathbf {x}}}}\not \in {\mathcal {F}}_P^*\), and an infinite index set \({\hat{K}}\subseteq K\) such that \(Q_k = {\hat{Q}}\) for all \(k\in {\hat{K}}\) and
On the other hand, from (25), (16) and (7), (8),
which implies that \(\{{\mathbf {x}}^{k-1}\}\rightarrow {\hat{{\mathbf {x}}}}\) as \(k\rightarrow \infty \), \(k\in {\hat{K}}\). It then follows from Lemma 6 that \({\hat{{\mathbf {x}}}}\) is stationary and that \([{{\tilde{\varvec{\lambda }}}}^{\text {a},k}_{{\mathcal {A}}({\hat{{\mathbf {x}}}})}]_+\) converges to the associated multiplier vector. Hence the multiplier vector is non-negative, so that \({\hat{{\mathbf {x}}}}\in {\mathcal {F}}_P^*\), contradicting \({\hat{{\mathbf {x}}}}\not \in {\mathcal {F}}_P^*\). \(\square \)
A contradiction argument based on Lemmas 6 and 7 then shows that \(\{{\mathbf {x}}^k\}\) approaches the set of stationary points of (P).
Lemma 8
Suppose Assumptions 1 and 2 hold. Then the sequence \(\{{\mathbf {x}}^k\}\) approaches the set of stationary points of (P), i.e., there exists a sequence \(\{\hat{{\mathbf {x}}}^k\}\) of stationary points such that \(\Vert {\mathbf {x}}^k-\hat{{\mathbf {x}}}^k\Vert \) goes to zero as \(k\rightarrow \infty \).
Proof
Proceeding by contradiction, suppose the claim does not hold, i.e., (invoking Lemma 5) suppose \(\{{\mathbf {x}}^k\}\) converges to some non-stationary point \({\hat{{\mathbf {x}}}}\) on some infinite index set K. Then \(\{\varDelta {\mathbf {x}}^{\text {a},k}\}\) does not converge to zero on K (Lemma 6(i)), and neither does \(\{\varDelta {\mathbf {x}}^k\}\) (Lemma 3). Since \({\hat{{\mathbf {x}}}}\) is non-stationary, \(\{{\mathbf {x}}^k\}\) is bounded away from \({\mathcal {F}}_P^*\) on K, so this contradicts Lemma 7. \(\square \)
The next technical result, proved in [29, Lemma 3.6], relies on analogues of Lemmas 5, 7 and 8.
Lemma 9
Suppose Assumptions 1 and 2 hold. Suppose \(\{{\mathbf {x}}^k\}\) is bounded away from \({\mathcal {F}}_P^*\). Let \({\hat{{\mathbf {x}}}}\) and \({\hat{{\mathbf {x}}}}^{\prime }\) be limit points of \(\{{\mathbf {x}}^k\}\) and let \({\hat{\varvec{\lambda }}}\) and \({\hat{\varvec{\lambda }}}^\prime \) be the associated KKT multipliers. Then \({\hat{\varvec{\lambda }}} = {\hat{\varvec{\lambda }}}^\prime \).
Convergence of \(\{{\mathbf {x}}^k\}\) to \({\mathcal {F}}_P^*\) ensues, proving Claim (i) of Theorem 1.
Lemma 10
Suppose Assumptions 1 and 2 hold. Then \(\{{\mathbf {x}}^k\}\) converges to \({\mathcal {F}}_P^*\).
Proof
We proceed by contradiction. Thus suppose \(\{{\mathbf {x}}^k\}\) does not converge to \({\mathcal {F}}_P^*\). Then, since \(\{{\mathbf {x}}^k\}\) is bounded (Lemma 5), it has at least one limit point \(\hat{{\mathbf {x}}}\) that is not in \({\mathcal {F}}_P^*\), and since (by Proposition 2) \(\{f({\mathbf {x}}^k)\}\) is a bounded, monotonically decreasing sequence, \(f(\hat{{\mathbf {x}}})=\inf _{k}f({\mathbf {x}}^k)\). Then, by Lemmas 7 and 3, \(\{\varDelta {\mathbf {x}}^k\}\) and \(\{\varDelta {\mathbf {x}}^{\text {a},k}\}\) converge to zero as \(k\rightarrow \infty \). It follows from Lemmas 6 and 9 that all limit points of \(\{{\mathbf {x}}^k\}\) are stationary, and that both \(\{{\tilde{\varvec{\lambda }}}^{\text {a},k}\}\) and \(\{{\tilde{\varvec{\lambda }}}^k\}\) converge to \({\hat{\varvec{\lambda }}}\), the common KKT multiplier vector associated to all limit points of \(\{{\mathbf {x}}^k\}\). Since \({\hat{{\mathbf {x}}}}\not \in {\mathcal {F}}_P^*\), there exists \(i_0\) such that \({\hat{\lambda }}_{i_0}<0\), so that, for some \(\hat{k}>0\),
which, in view of Step 8 of the algorithm, implies that \(i_0\in Q_k\) for all \(k>{\hat{k}}\). Then (36) gives
where \(s_{i_0}^k>0\), \(\lambda _{i_0}^k>0\) by construction. Thus, in view of (60), \(\varDelta s_{i_0}^{\text {a},k}>0\) for all \(k>{\hat{k}}\). On the other hand, the last equation of (33) gives
where \(\gamma _k\ge 0\), \(\sigma _k\ge 0\), and \(\mu _{(Q_k)}^k\ge 0\) by construction. Further, for \(k>{\hat{k}}\), \(\varDelta \lambda _{i_0}^{\text {a},k}<0\) since \(\lambda _{i_0}^k>0\) and \({\tilde{\lambda }}_{i_0}^{\text {a},k+1}(=\lambda _{i_0}^k+\varDelta \lambda _{i_0}^{\text {a},k})<0\). It follows that all terms in (61) are non-negative and the first term is positive, so that \(\varDelta s_{i_0}^k>0\) for all \(k>{\hat{k}}\). Moreover, for all \(k>{\hat{k}}\), we have \(s_{i_0}^{k+1} = s_{i_0}^{k} + \alpha _\text {p}^k \varDelta s_{i_0}^k> s_{i_0}^{k} > 0\), where \(\alpha _\text {p}^k>0\) since \({\mathbf {s}}^k > {\mathbf {0}}\). Since \(\{{\mathbf {s}}^k\}\) is bounded (Lemma 5), we then conclude that \(\{s_{i_0}^k\}\rightarrow \hat{s}_{i_0}>0\) so that \(\hat{s}_{i_0}\hat{\lambda }_{i_0}<0\), in contradiction with the stationarity of limit points. \(\square \)
Under strict complementarity, the next lemma establishes appropriate convergence of the multipliers, setting the stage for the proof of part (ii) of Theorem 1 in Lemma 12.
Lemma 11
Suppose Assumptions 1–3 hold and let \(({\mathbf {x}}^*,\varvec{\lambda }^*)\) be the unique primal-dual solution. Then, given any infinite index set K such that \(\{\varDelta {\mathbf {x}}^{\text {a},k}\}_{k\in K}\rightarrow {\mathbf {0}}\), it holds that \(\{\varvec{\lambda }^{k+1}\}_{k\in K}\rightarrow \varvec{\xi }^*\), where \(\xi _i^*:=\min \{\lambda ^*_i,\lambda ^{\max }\}\), for all \(i\in {\mathbf {m}}\).
Proof
Lemma 10 guarantees that \(\{{\mathbf {x}}^k\}\rightarrow {\mathbf {x}}^*\). Let K be an infinite index set such that \(\{\varDelta {\mathbf {x}}^{\text {a},k}\}_{k\in K}\rightarrow {\mathbf {0}}\). Then, in view of Lemma 6(ii), \(\{{\tilde{\varvec{\lambda }}}^{\text {a},k+1}\}_{k\in K}\rightarrow \varvec{\lambda }^*\ge {\mathbf {0}}\). Accordingly, \(\{\chi _k\}=\{\Vert \varDelta {\mathbf {x}}^{\text {a},k}\Vert ^\nu +\Vert [{\tilde{\varvec{\lambda }}}^{\text {a},k+1}_{Q_k}]_-\Vert ^\nu \}\rightarrow 0\) as \(k\rightarrow \infty \), \(k\in K\). Hence, in view of (26) and (27), the proof will be complete if we show that \(\{{\breve{\varvec{\lambda }}}^{k+1}\}_{k\in K}\rightarrow \varvec{\lambda }^*\), where
or equivalently, \(\{{\tilde{\varvec{\lambda }}}^{k+1}-{\breve{\varvec{\lambda }}}^{k+1}\}_{k\in K}\rightarrow {\mathbf {0}}\), which we do now.
For every \(Q\subseteq {\mathbf {m}}\), define the index set \(K(Q):=\{k\in K:Q_k=Q\}\), and let \({\mathcal {Q}}:=\{Q\subseteq {\mathbf {m}}:|K(Q)|=\infty \}\). We first show that for all \(Q\in {\mathcal {Q}}\), \(\{{\tilde{\varvec{\lambda }}}^{k+1}_{Q}-{\breve{\varvec{\lambda }}}^{k+1}_Q\}_{k\in K(Q)}\rightarrow {\mathbf {0}}\). For \(Q\in {\mathcal {Q}}\), the definition (34) of \({{\tilde{\varvec{\lambda }}}}^+\) yields
Since boundedness of \(\{\varvec{\lambda }^k_Q\}\) (by construction) and of \(\{{\tilde{\varvec{\lambda }}}^{k+1}_Q\}_{k\in K}\) (=\(\{\varvec{\lambda }^k_Q+\varDelta \varvec{\lambda }^k_{Q}\}_{k\in K}\)) (by Lemma 6(ii)) implies boundedness of \(\{\varDelta \varvec{\lambda }^k_{Q}\}_{k\in K}\), we only need \(\{\alpha _\text {d}^k\}_{k\in K(Q)}\rightarrow 1\) in order to guarantee that \(\Vert {\tilde{\varvec{\lambda }}}^{k+1}_{Q}-{\breve{\varvec{\lambda }}}^{k+1}_Q\Vert \rightarrow 0\) on K(Q). Now, \(\{\varDelta {\mathbf {x}}^{\text {a},k}\}_{k\in K}\rightarrow {\mathbf {0}}\) implies that \(\{\varDelta {\mathbf {s}}^{\text {a},k}\}_{k\in K}\rightarrow {\mathbf {0}}\), and from Lemma 3 that \(\{\varDelta {\mathbf {x}}^k\}_{k\in K}\rightarrow {\mathbf {0}}\), implying that \(\{\varDelta {\mathbf {s}}^k\}_{k\in K}\rightarrow {\mathbf {0}}\); and \(\{{\mathbf {x}}^k\}\rightarrow {\mathbf {x}}^*\) yields \(\{{\mathbf {s}}^k\}\rightarrow {\mathbf {s}}^*:=A{\mathbf {x}}^*-{\mathbf {b}}\), so \(s_i^k+\varDelta s_i^k>0\) for all \(i\in {\mathcal {A}}({\mathbf {x}}^*)^\mathrm{c}\), \(k\in K\) large enough. Moreover, Assumption 3 gives \(\lambda _i^*>0\) for all \(i\in {\mathcal {A}}({\mathbf {x}}^*)\) so that, for sufficiently large \(k\in K\), \({\tilde{\lambda }}_i^{k+1}>0\) for all \(i\in {\mathcal {A}}({\mathbf {x}}^*)\), and Condition CSR(ii) implies that \({\mathcal {A}}({\mathbf {x}}^*)\subseteq Q\), so Lemma 4 applies, with \({\mathcal {A}}:={\mathcal {A}}({\mathbf {x}}^*)\). It follows that \(\{{\bar{\alpha }}_\text {d}^k\}_{k\in K}\rightarrow 1\), since all terms on the right-hand side of (41) converge to one on K. Thus, from the definition of \(\alpha _\text {d}^k\) in (24) and the fact that \(\{\varDelta {\mathbf {x}}^k\}_{k\in K}\rightarrow {\mathbf {0}}\), we have \(\{\alpha _\text {d}^k\}_{k\in K}\rightarrow 1\) indeed, establishing that \(\{{\tilde{\varvec{\lambda }}}^{k+1}_{Q}-{\breve{\varvec{\lambda }}}^{k+1}_Q\}_{k\in K(Q)}\rightarrow {\mathbf {0}}\).
It remains to show that, for all \(Q\in {\mathcal {Q}}\), \(\{{\tilde{\varvec{\lambda }}}^{k+1}_{Q^{\text {c}}}-{\breve{\varvec{\lambda }}}^{k+1}_{Q^{\text {c}}}\}_{k\in K(Q)}\rightarrow {\mathbf {0}}\). To show this, first note that, since \(\{\chi _k\}_{k\in K}\rightarrow 0\) and, as established above, \(\{{\breve{\varvec{\lambda }}}^{k+1}_Q\}_{K(Q)}\rightarrow \varvec{\lambda }_Q^*\), it follows from (26) and (27) that, for all \(Q\in {\mathcal {Q}}\),
Next, from (26), (27), and the definition (34) of \({{\tilde{\varvec{\lambda }}}}^+\), we have, for \(Q\in {\mathcal {Q}}\) and sufficiently large \(k\in K(Q)\),
Clearly, since \({\mathcal {A}}({\mathbf {x}}^*)\subseteq Q\), we have \(s_i^*>0\) for \(i\in Q^{\text {c}}\). Hence, since \(\{\varDelta {\mathbf {s}}^k\}_{k\in K}\rightarrow {\mathbf {0}}\), \(\{s_i^{k+1}\}\) is bounded away from zero on K for \(i\in Q^{\text {c}}\). When Q is empty, the right-hand side of (63) is set to zero [see definition (14) of \(\mu _{(Q)}\)]. When Q is not empty, since \(\xi _i^*=0\) whenever \(\lambda _i^*=0\), (62) and complementary slackness give
and it follows from (63) that \(\{{\tilde{\varvec{\lambda }}}^{k+1}_{Q^{\text {c}}}-{\breve{\varvec{\lambda }}}^{k+1}_{Q^{\text {c}}}\}_{k\in K(Q)}\rightarrow {\mathbf {0}}\), completing the proof. \(\square \)
Claim (ii) of Theorem 1 can now be proved.
Lemma 12
Suppose Assumptions 1–3 hold and let \(({\mathbf {x}}^*,\varvec{\lambda }^*)\) be the unique primal-dual solution. Then \(\{{\tilde{\varvec{\lambda }}}^k\}\rightarrow \varvec{\lambda }^*\) and \(\{\varvec{\lambda }^{k}\}\rightarrow \varvec{\xi }^*\), with \(\xi _i^*:=\min \{\lambda ^*_i,\lambda ^{\max }\}\) for all \(i\in {\mathbf {m}}\).
Proof
Again, Lemma 10 guarantees that \(\{{\mathbf {x}}^k\}\rightarrow {\mathbf {x}}^*\) and \(\{{\mathbf {s}}^k\}\rightarrow {\mathbf {s}}^*:=A{\mathbf {x}}^*-{\mathbf {b}}\). Note that if \(\{\varDelta {\mathbf {x}}^{\text {a},k}\}\rightarrow {\mathbf {0}}\), the claims are immediate consequences of Lemmas 6 and 11. We now prove by contradiction that \(\{\varDelta {\mathbf {x}}^{\text {a},k}\}\rightarrow {\mathbf {0}}\). Thus, suppose that for some infinite index set K, \(\inf _{k\in K}\Vert \varDelta {\mathbf {x}}^{\text {a},k}\Vert >0\). Then, Lemma 3 gives \(\inf _{k\in K}\Vert \varDelta {\mathbf {x}}^k\Vert >0\). It follows from Proposition 3 that, on some infinite index set \(K^\prime \subseteq K\), \(\{\varDelta {\mathbf {x}}^{\text {a},k-1}\}\rightarrow {\mathbf {0}}\) and \(\{[{\tilde{\varvec{\lambda }}}^{\text {a},k}]_-\}\rightarrow {\mathbf {0}}\). Since \(Q_k\) is selected from a finite set and \(\{W_k\}\) is bounded, we can assume without loss of generality that \(Q_k=Q\) on \(K^\prime \) for some \(Q \subseteq {\mathbf {m}}\), and that \(\{W_k\}\rightarrow W^*\succeq H\) on \(K^\prime \). Further, from Lemma 11, \(\{\varvec{\lambda }^k\}_{k\in K^\prime }\rightarrow \varvec{\xi }^*\). Therefore, \(\{J(W_k,A_{Q_k},{\mathbf {s}}_{Q_k},\varvec{\lambda }_{Q_k})\}_{k\in K^\prime } \rightarrow J(W^*,A_{Q},{\mathbf {s}}_Q^*,\varvec{\xi }_Q^*)\), and in view of Assumptions 2 and 3 and Lemma 1, \(J(W^*,A_{Q},{\mathbf {s}}_Q^*,\varvec{\xi }_Q^*)\) is non-singular (since \(({\mathbf {x}}^*,\varvec{\lambda }^*)\) is optimal). It follows from (10), with W substituted for H, that \(\{\varDelta {\mathbf {x}}^{\text {a},k}\}\rightarrow {\mathbf {0}}\) on \(K^{\prime }\), a contradiction, proving that \(\{\varDelta {\mathbf {x}}^{\text {a},k}\}\rightarrow {\mathbf {0}}\). \(\square \)
Claim (iv) of Theorem 1 follows as well.
Lemma 13
Suppose Assumptions 1 and 2 hold and \(\varepsilon >0\). Then Algorithm CR-MPC terminates (in Step 1) after finitely many iterations.
Proof
If \(\{({\mathbf {x}}^k,\varvec{\lambda }^k)\}\) has a limit point in \({\mathcal {F}}^*\), then \(\inf _k\{E_k\}=0\), proving the claim. Thus, suppose that \(\{({\mathbf {x}}^k,\varvec{\lambda }^k)\}\) is bounded away from \({\mathcal {F}}^*\). In view of Lemmas 5 and 10, \(\{{\mathbf {x}}^k\}\) has a limit point \({\mathbf {x}}^*\in {\mathcal {F}}_P^*\). Assumption 2 then implies that there exists a unique KKT multiplier vector \(\varvec{\lambda }^*\ge {\mathbf {0}}\) associated to \({\mathbf {x}}^*\). If \(({\mathbf {x}}^*,\varvec{\lambda }^*)\in {\mathcal {F}}^*\) is a limit point of \(\{({\mathbf {x}}^k, {\tilde{\varvec{\lambda }}}^k)\}\), which also implies that \(\inf _k\{E({\mathbf {x}}^k, {\tilde{\varvec{\lambda }}}^k)\}=0\), then in view of the stopping criterion, the claim again follows. Thus, further suppose that there is an infinite index set K such that \(\{{\mathbf {x}}^k\}_{k\in K}\rightarrow {\mathbf {x}}^*\), but \(\inf _{k\in K}\Vert {\tilde{\varvec{\lambda }}}^k-\varvec{\lambda }^*\Vert >0\). It then follows from Lemma 6(ii) that \(\{\varDelta {\mathbf {x}}^{\text {a},k-1}\}_{k\in K}\not \rightarrow {\mathbf {0}}\), and from Lemma 3 that \(\{\varDelta {\mathbf {x}}^{k-1}\}_{k\in K}\not \rightarrow {\mathbf {0}}\). Proposition 3 and Lemma 3 then imply that \(\{\varDelta {\mathbf {x}}^{\text {a},k-2}\}_{k\in K^\prime }\rightarrow {\mathbf {0}}\) and \(\{\varDelta {\mathbf {x}}^{k-2}\}_{k\in K^\prime }\rightarrow {\mathbf {0}}\) for some infinite index set \(K^\prime \subseteq K\). Next, from Lemmas 5 and 10, we have \(\{{\mathbf {x}}^{k-2}\}_{k\in K^{\prime \prime }}\rightarrow {\mathbf {x}}^{**}\in {\mathcal {F}}_P^*\) for some infinite index set \(K^{\prime \prime }\subseteq K^\prime \), and in view of Lemma 6(ii) \(\{{{\tilde{\varvec{\lambda }}}}^{k-1}\}_{k\in K^{\prime \prime }}\rightarrow \varvec{\lambda }^{**}\), where \(\varvec{\lambda }^{**}\) is the KKT multiplier associated to \({\mathbf {x}}^{**}\). 
Since \(\alpha _\text {p}^{k}\in [0,1]\) for all k, we also have \(\{{\mathbf {x}}^{k-1}\}_{k\in K^{\prime \prime }} =\{{\mathbf {x}}^{k-2}+\alpha _\text {p}^{k-2}\varDelta {\mathbf {x}}^{k-2}\}_{k\in K^{\prime \prime }}\rightarrow {\mathbf {x}}^{**}\), i.e., \(\{({\mathbf {x}}^{k-1},{{\tilde{\varvec{\lambda }}}}^{k-1})\}_{k\in K^{\prime \prime }}\rightarrow ({\mathbf {x}}^{**},\varvec{\lambda }^{**})\in {\mathcal {F}}^*\), completing the proof. \(\square \)
Proof of Theorem 1
Claim (i) was proved in Lemma 10 and Claim (ii) in Lemma 12; Claim (iii) is a direct consequence of Condition CSR(ii); and Claim (iv) was proved in Lemma 13. \(\square \)
Proof of Corollary 2
From Theorem 1, \(\{({\mathbf {x}}^k,\varvec{\lambda }^k)\}\rightarrow ({\mathbf {x}}^*,\varvec{\lambda }^*)\), i.e., \(\{E_k\}\rightarrow 0\). It follows that (i) in view of Proposition 1 and Condition CSR(ii), \(Q_k\supseteq {\mathcal {A}}({\mathbf {x}}^*)\) for all k large enough, and (ii) in view of Rule R, \(\{\delta _k\}\rightarrow 0\), so that \(Q_k\) eventually excludes all indices that are not in \({\mathcal {A}}({\mathbf {x}}^*)\). \(\square \)
B Proof of Theorem 2
Parts of this proof are adapted from [16, 29, 34]. Throughout, we assume that Assumption 3 holds (so that Assumption 1 also holds), that \(\varepsilon =0\) and the iteration never stops, and that \(\lambda ^*_i<\lambda ^{\max }\) for all i.
Newton's method plays the central role in the local analysis. The following lemma is standard and readily proved; see, e.g., [29, Proposition 3.10].
Lemma 14
Let \(\varPhi :{\mathbb {R}}^{n}\rightarrow {\mathbb {R}}^{n}\) be twice continuously differentiable and let \({\mathbf {t}}^*\in {\mathbb {R}}^n\) such that \(\varPhi ({\mathbf {t}}^*)={\mathbf {0}}\). Suppose there exists \(\rho >0\) such that \(\frac{\partial \varPhi }{\partial {\mathbf {t}}}({\mathbf {t}})\) is non-singular for all \({\mathbf {t}}\in B({\mathbf {t}}^*,\rho )\). Define \(\varDelta ^{\mathrm{N}}{\mathbf {t}}\) to be the Newton increment at \({\mathbf {t}}\), i.e., \(\varDelta ^\mathrm{N}{\mathbf {t}}=-\left( \frac{\partial \varPhi }{\partial {\mathbf {t}}}({\mathbf {t}})\right) ^{-1}\varPhi ({\mathbf {t}})\). Then, given any \(c>0\), there exists \(c^*>0\) such that, for all \({\mathbf {t}}\in B({\mathbf {t}}^*,\rho )\), if \({\mathbf {t}}^+\in {\mathbb {R}}^n\) satisfies
then
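Lemma 14 is the standard perturbation result behind q-quadratic convergence of Newton's method. As a minimal illustrative sketch (not part of the paper's algorithm; the system \(\varPhi \) below is a made-up example chosen only for demonstration), the unperturbed Newton increment already exhibits the quadratic error decay that the lemma shows survives sufficiently small perturbations of the step:

```python
import numpy as np

# Hypothetical smooth system Phi(t) = 0 with root t* = (1, 1) and a
# Jacobian that is non-singular near t*, matching the setting of Lemma 14.
def phi(t):
    x, y = t
    return np.array([x**2 + y**2 - 2.0, x - y])

def jac(t):
    x, y = t
    return np.array([[2.0 * x, 2.0 * y],
                     [1.0, -1.0]])

def newton_step(t):
    # Newton increment: Delta^N t = -J(t)^{-1} Phi(t)
    return t - np.linalg.solve(jac(t), phi(t))

t_star = np.array([1.0, 1.0])
t = np.array([1.3, 0.6])
errors = []
for _ in range(5):
    t = newton_step(t)
    errors.append(np.linalg.norm(t - t_star))

# q-quadratic convergence: ||t+ - t*|| <= C ||t - t*||^2 for a modest C,
# i.e., the ratios below stay bounded while the errors collapse.
ratios = [errors[k + 1] / errors[k] ** 2 for k in range(3)]
print(errors)
print(ratios)
```

Each error is roughly the square of the previous one, which is the behavior Lemma 14 guarantees for any step within the stated tolerance of the exact Newton step.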
For convenience, define \({\mathbf {z}}:=({\mathbf {x}}, \varvec{\lambda })\) (as well as \({\mathbf {z}}^*:=({\mathbf {x}}^*,\varvec{\lambda }^*)\), etc.). For \({\mathbf {z}}\in {\mathcal {F}}^o:=\{{\mathbf {z}}:{\mathbf {x}}\in {\mathcal {F}}_P^o,\,\varvec{\lambda }>{\mathbf {0}}\}\), define
The gist of the remainder of this appendix is to apply Lemma 14 to
(Note that \(\varPhi _Q({\mathbf {z}}^*)={\mathbf {0}}\).) Let \({\mathbf {z}}_Q:=({\mathbf {x}}, \varvec{\lambda }_Q)\); then the step taken on the Q components along the search direction generated by Algorithm CR-MPC is analogously given by \({\breve{{\mathbf {z}}}}^+_Q : = ({\mathbf {x}}^+,{\breve{\varvec{\lambda }}}_Q^+)\) with \({\breve{\varvec{\lambda }}}_Q^+:=\varvec{\lambda }_Q + \alpha _\text {d}\varDelta \varvec{\lambda }_Q\). The first major step of the proof is achieved by Proposition 4 below, where the focus is on \({\breve{{\mathbf {z}}}}^+_Q\) rather than on \({\mathbf {z}}^+\). Thus we compare \({\breve{{\mathbf {z}}}}^+_Q\), with \(Q\in {\mathcal {Q}}^*\), to the Q components of the (unregularized) Newton step, i.e., \({\mathbf {z}}_Q+(\varDelta ^\mathrm{N}{\mathbf {z}})_Q\). Define
The difference between the CR-MPC iteration and the Newton iteration can be written as
where \(\varDelta {\mathbf {z}}_Q:=(\varDelta {\mathbf {x}},\varDelta \varvec{\lambda }_Q)\), \(\varDelta {\mathbf {z}}^{\text {a}}_Q:=(\varDelta {\mathbf {x}}^{\text {a}},\varDelta \varvec{\lambda }^\text {a}_Q)\), \(\varDelta {\mathbf {z}}^\text {c}_Q:=(\varDelta {\mathbf {x}}^{\text {c}},\varDelta \varvec{\lambda }^\text {c}_Q)\), and \(\varDelta {\mathbf {z}}^0_Q\) is the (constraint-reduced) affine-scaling direction for the original (unregularized) system (so \(\varDelta ^{\mathrm{N}}{\mathbf {z}}=\varDelta {\mathbf {z}}^0_{\mathbf {m}}\)).
Let
The following readily proved lemma will be of help (for details, see Lemmas B.15 and B.16 in [16]; see also Lemmas 13 and 1 in [28]).
Lemma 15
Let \({\mathbf {s}},\varvec{\lambda }\in {\mathbb {R}}^m\) and \(Q\subseteq {\mathbf {m}}\) be arbitrary and let W be symmetric, with \(W\succeq H\). Then (i) \(J_a(W,A_Q,{\mathbf {s}}_Q,\varvec{\lambda }_Q)\) is non-singular if and only if \(J(W,A_Q,{\mathbf {s}}_Q,\varvec{\lambda }_Q)\) is, and (ii) if \({\mathcal {A}}({\mathbf {x}}^*)\subseteq Q\), then \(J(W,A_Q,{\mathbf {s}}_Q^*,\varvec{\lambda }_Q^*)\) is non-singular (and so is \(J_a(W,A_Q,{\mathbf {s}}_Q^*,\varvec{\lambda }_Q^*)\)).
With \({\mathbf {s}}:=A{\mathbf {x}}-{\mathbf {b}}\), \(J_a(H,A_Q,{\mathbf {s}}_Q,\varvec{\lambda }_Q)\), the system matrix for the (constraint-reduced) original (unregularized) “augmented” system, is the Jacobian of \(\varPhi _Q({\mathbf {z}})\), i.e.,
and its regularized version \(J_a(W({\mathbf {z}}),A_Q,{\mathbf {s}}_Q,\varvec{\lambda }_Q)\) satisfies (among other systems solved by Algorithm CR-MPC)
Next, we verify that \(J_a(W({\mathbf {z}}),A_Q,{\mathbf {s}}_Q,\varvec{\lambda }_Q)\) is non-singular near \({\mathbf {z}}^*\) (so that \(\varDelta {\mathbf {z}}^0_Q\) and \(\varDelta {\mathbf {z}}^{\text {a}}_Q\) in (65) are well defined) and establish other useful local properties. For convenience, we define
and
Lemma 16
Let \(\epsilon ^*:=\min \{1,\min _{i\in {\mathbf {m}}}(\lambda _i^*+s_i^*)\}\). There exist \(\rho ^*>0\) and \(r>0\), such that, for all \({\mathbf {z}}\in {\mathcal {F}}^o\cap B({\mathbf {z}}^*,\rho ^*)\) and all \(Q\in {\mathcal {Q}}^*\), the following hold:
(i) \(\Vert J_a(W({\mathbf {z}}),A_Q,{\mathbf {s}}_Q,\varvec{\lambda }_Q)^{-1}\Vert \le r\);
(ii) \(\max \{\Vert \varDelta {\mathbf {z}}^{\text {a}}_Q\Vert , \Vert \varDelta {\mathbf {z}}_Q\Vert , \Vert \varDelta {\mathbf {s}}^{\text {a}}_Q\Vert , \Vert \varDelta {\mathbf {s}}_Q\Vert \}<\epsilon ^*/4\);
(iii) \(\min \{\lambda _i, {\tilde{\lambda }}_i^{\text {a},+} , {\tilde{\lambda }}_i^+ \}>\epsilon ^*/2\) and \(\max \{ s_i , \tilde{s}_i^{\text {a},+} , \tilde{s}_i^+ \}<\epsilon ^*/2\) for all \(i\in {\mathcal {A}}({\mathbf {x}}^*)\), while \(\max \{ \lambda _i , {\tilde{\lambda }}_i^{\text {a},+} , {\tilde{\lambda }}_i^+ \}<\epsilon ^*/2\) and \(\min \{ s_i , \tilde{s}_i^{\text {a},+} , \tilde{s}_i^+ \}>\epsilon ^*/2\) for all \(i\in {\mathbf {m}}\setminus {\mathcal {A}}({\mathbf {x}}^*)\);
(iv) \({{\tilde{\lambda }}}_i^+ < \lambda ^{\max }\) for all \(i\in {\mathbf {m}}\).
Proof
Claim (i) follows from Lemma 15, continuity of \(J_a(W({\mathbf {z}}),A_Q,{\mathbf {s}}_Q,\varvec{\lambda }_Q)\), and the fact that \(W({\mathbf {z}}^*)=H\). Claims (ii) and (iv) follow from Claim (i), Lemma 15, continuity of the right-hand sides of (10) and (32) (which vanish at the solution), the definition (34) of \({{\tilde{\varvec{\lambda }}}}^+\), and our assumption that \(\lambda ^*_i < \lambda ^{\max }\) for all \(i\in {\mathbf {m}}\). Claim (iii) follows from strict complementary slackness, the definition of \(\epsilon ^*\), and Claim (ii). \(\square \)
In preparation for Proposition 4, Lemmas 17–20 provide bounds on the four terms in the last line of (65). The \(\rho ^*\) used in these lemmas comes from Lemma 16. The proofs of Lemmas 17, 18, and 20 are omitted, as they are very similar to those of Lemmas A.9 and A.10 in the supplementary materials of [34] (where an MPC algorithm for linear optimization problems is considered) and of Lemma B.19 in [16] (also Lemma 16 in [28]).
Lemma 17
There exists a constant \(c_1>0\) such that, for all \({\mathbf {z}}\in {\mathcal {F}}^o\cap B({\mathbf {z}}^*,\rho ^*)\), and for all \(Q\in {\mathcal {Q}}^*\),
Note that an upper bound on the magnitude of the MPC search direction \(\varDelta {\mathbf {z}}_Q\) can be obtained by using Lemmas 17 and 16(ii), viz.
This bound is used in the proofs of Lemma 18 and Proposition 4.
Lemma 18
There exists a constant \(c_2>0\) such that, for all \({\mathbf {z}}\in {\mathcal {F}}^o\cap B({\mathbf {z}}^*,\rho ^*)\), and for all \(Q\in {\mathcal {Q}}^*\),
Lemma 19
There exists a constant \(c_3>0\) such that, for all \({\mathbf {z}}\in {\mathcal {F}}^o\cap B({\mathbf {z}}^*,\rho ^*)\) and all \(Q\in {\mathcal {Q}}^*\),
Proof
We have
so that there exists \(c_{31}>0\) such that, for all \({\mathbf {z}}\in {\mathcal {F}}^o\cap B({\mathbf {z}}^*,\rho ^*)\) and all \(Q\in {\mathcal {Q}}^*\),
where the second inequality follows from Lemma 16(i). Since \(W({\mathbf {z}})-H=\varrho ({\mathbf {z}})R\), \(|\varrho ({\mathbf {z}})|\le c_{32} |E({\mathbf {z}})|\), and \(|E({\mathbf {z}})|\le c_{33}\Vert {\mathbf {z}}-{\mathbf {z}}^*\Vert \), for some \(c_{32}>0\) and \(c_{33}>0\), the proof is complete. \(\square \)
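The mechanism in the proof of Lemma 19 — the regularization \(W({\mathbf {z}})=H+\varrho ({\mathbf {z}})R\) with \(|\varrho ({\mathbf {z}})|\) bounded by a multiple of the optimality error \(|E({\mathbf {z}})|\) — can be pictured with a toy sketch. The specific rule \(\varrho =\min \{1,E/{\bar{E}}\}\) below is an assumption for illustration (consistent with the footnote that \(\varrho =1\) and \(W=H+R\) at the initial iterate), not the paper's exact formula:

```python
import numpy as np

H = np.array([[2.0, 0.0],
              [0.0, 0.0]])   # a positive-semidefinite (possibly singular) Hessian
R = np.eye(2)                # regularization matrix, so H <= W <= H + R
E_bar = 1.0                  # reference value of the error measure (assumption)

def W_of(E):
    # Hypothetical weight rule: rho shrinks to 0 with the error measure E.
    rho = min(1.0, abs(E) / E_bar)
    return H + rho * R

# ||W(z) - H|| = rho * ||R|| is bounded by a constant times |E(z)|,
# which is the estimate invoked at the end of the proof of Lemma 19.
gaps = [np.linalg.norm(W_of(E) - H, 2) for E in (1.0, 0.1, 0.01)]
print(gaps)
```

As \(E\rightarrow 0\) the regularization vanishes and W recovers H, while away from the solution W stays safely positive definite even when H is singular.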
Lemma 20
There exists a constant \(c_4>0\) such that, for all \({\mathbf {z}}\in {\mathcal {F}}^o\cap B({\mathbf {z}}^*,\rho ^*)\), and for all \(Q\in {\mathcal {Q}}^*\),
With Lemmas 17–20 in hand, we return to inequality (65).
Proposition 4
There exists a constant \(c_5>0\) such that, for all \({\mathbf {z}}\in {\mathcal {F}}^o\cap B({\mathbf {z}}^*,\rho ^*)\), and for all \(Q\in {\mathcal {Q}}^*\),
Proof
Let \({\mathbf {z}}\in {\mathcal {F}}^o\cap B({\mathbf {z}}^*,\rho ^*)\) and \(Q\in {\mathcal {Q}}^*\). It follows from (65), Lemmas 17–20, and (66) that
Also, by Lemmas 19 and 20, we have
The claim follows (in view of boundedness of \({\mathcal {F}}^o\cap B({\mathbf {z}}^*,\rho ^*)\)). \(\square \)
With Proposition 4 established, we proceed to the second major step of the proof of Theorem 2: to show that (67) still holds when \({\mathbf {z}}^+\) is substituted for \({\breve{{\mathbf {z}}}}^+_Q\).
Proof of Theorem 2
Again, let \(\rho ^*\) be as given in Lemma 16. Let \({\mathbf {z}}\in {\mathcal {F}}^o\cap B({\mathbf {z}}^*,\rho ^*)\) and \(Q\in {\mathcal {Q}}^*\). Let \(\rho :=\rho ^*\), \({\mathbf {t}}:={\mathbf {z}}\), and \({\mathbf {t}}^*:={\mathbf {z}}^*\). Then the desired q-quadratic convergence is a direct consequence of Lemma 14, provided that the condition (64) is satisfied. Hence, we now show that there exists some constant \(c>0\) such that, for each \(i\in {\mathbf {m}}\),
As per Proposition 4, (69) holds for \(i\in Q\) with \(z_i^+\) replaced by \({\breve{\text {z}}}_i^+\). In particular, (69) holds for the \({\mathbf {x}}^+\) components of \({\mathbf {z}}^+\). It remains to show that (69) holds for the \(\varvec{\lambda }^+\) components of \({\mathbf {z}}^+\). Firstly, for all \(i\in {\mathcal {A}}({\mathbf {x}}^*)\), we show that \(\lambda _i^+={\breve{\lambda }}_i^+\), so that, by Proposition 4, (69) holds for all \(\lambda ^+_i\) with \(i\in {\mathcal {A}}({\mathbf {x}}^*)\). From the fact that \(\varvec{\lambda }>{\mathbf {0}}\) (\({\mathbf {z}}\in {\mathcal {F}}^o\)) and Lemma 16(ii), and since \(\nu \ge 2\), it follows that
so that \(\min \{\chi ,{{\underline{\lambda }}}\}\le \epsilon ^*/2\). Also, from Lemma 16(iii) and the fact that \({\breve{\varvec{\lambda }}}_Q^+\) is a convex combination of \(\varvec{\lambda }_Q\) and \({\tilde{\varvec{\lambda }}}_Q^+\), we have, for all \(i\in {\mathcal {A}}({\mathbf {x}}^*)\),
Hence, from (70), (71), Lemma 16(iv), and (26), we conclude that \(\lambda _i^+={\breve{\lambda }}_i^+\) for all \(i\in {\mathcal {A}}({\mathbf {x}}^*)\). Secondly, we prove that there exists \(d_1>0\) such that
thus establishing (69) for \(\lambda _i^+\) with \(i\in Q\setminus {\mathcal {A}}({\mathbf {x}}^*)\). For \(i\in Q\setminus {\mathcal {A}}({\mathbf {x}}^*)\), we know from (26) that, either \(\lambda _i^+=\min \{\lambda ^{\max }, {\breve{\lambda }}_i^+\}\), or \(\lambda _i^+ = \min \{{\underline{\lambda }},\Vert \varDelta {\mathbf {x}}^\text {a}\Vert ^\nu +\Vert [{\tilde{\varvec{\lambda }}}^{\text {a},+}_Q]_-\Vert ^\nu \}\). In the former case, we have
for some \(d_2>0\), \(d_3>0\). Here the last inequality follows from Proposition 4 and the quadratic rate of the Newton step given in Lemma 14. In the latter case, since \(\varvec{\lambda }>{\mathbf {0}}\), we obtain
for some \(d_4>0\). Here the equality is from the definition of \(\varDelta {\mathbf {z}}^{\text {a}}\) and the last inequality follows from \(\nu \ge 2\), (68), and boundedness of \({\mathcal {F}}^o\cap B({\mathbf {z}}^*,\rho ^*)\). Hence, we have established (72). Thirdly and finally, consider the case that \(i\in Q^{\text {c}}\). Since \({\mathcal {A}}({\mathbf {x}}^*)\subseteq Q\), \(\varvec{\lambda }_{Q^{\text {c}}}^*={\mathbf {0}}\) and it follows from (27) that, either \(\lambda _i^+=\min \{\lambda ^{\max }, {\mu _{(Q)}^+}/{s_i^+}\}\), or \(\lambda _i^+ = \min \{{\underline{\lambda }},\Vert \varDelta {\mathbf {x}}^\text {a}\Vert ^\nu +\Vert [{\tilde{\varvec{\lambda }}}^\text {a}_Q]_-\Vert ^\nu \}\). In the latter case, the bound in (73) follows. In the former case, we have
By definition, \(s_i^+:=s_i+\alpha _\text {p}\varDelta s_i\) is a convex combination of \(s_i\) and \(\tilde{s}_i^+\). Thus, Lemma 16(iii) gives that \(s_i^+\ge \min \{s_i,\tilde{s}_i^+\}>\epsilon ^*/2\). Then using the definition of \(\mu _{(Q)}^+\) (see Step 10 of Algorithm CR-MPC) leads to
Since \({\mathbf {z}}\in B({\mathbf {z}}^*,\rho ^*)\), \(\varvec{\lambda }_{{\mathcal {A}}({\mathbf {x}}^*)}^+\) and \({\mathbf {s}}_{Q\setminus {\mathcal {A}}({\mathbf {x}}^*)}^+\) are bounded by Lemma 16(ii). Also, by definition, \({\mathbf {s}}_{{\mathcal {A}}({\mathbf {x}}^*)}^*={\mathbf {0}}\). Thus there exist \(d_5>0\) and \(d_6>0\) such that
Having already established that the second term is bounded by the right-hand side of (72), we are left to prove that the first term is as well. By definition,
Applying Proposition 4 and Lemma 14, we get
for some \(d_7>0\), \(d_8>0\). Hence, we have established (69) for all \(i\in {\mathbf {m}}\), proving the q-quadratic convergence rate. \(\square \)
Laiu, M.P., Tits, A.L. A constraint-reduced MPC algorithm for convex quadratic programming, with a modified active set identification scheme. Comput Optim Appl 72, 727–768 (2019). https://doi.org/10.1007/s10589-019-00058-0