Eigenvalue-based algorithm and analysis for nonconvex QCQP with one constraint
Abstract
A nonconvex quadratically constrained quadratic programming (QCQP) problem with one constraint is usually solved via a dual SDP problem, or by Moré’s algorithm based on iteratively solving linear systems. In this work we introduce an algorithm for QCQP that requires finding just one eigenpair of a generalized eigenvalue problem, and involves no outer iterations other than the (usually black-box) iterations for computing the eigenpair. Numerical experiments illustrate the efficiency and accuracy of our algorithm. We also analyze the QCQP solution extensively, including difficult cases, show that the canonical form of a matrix pair gives a complete classification of the QCQP in terms of boundedness and attainability, and explain how to obtain a global solution whenever it exists.
Keywords
QCQP, Generalized eigenvalue problem, Canonical form for symmetric matrix pair
Mathematics Subject Classification
49M37, 65K05, 90C20, 90C30
1 Introduction
The main contribution of this paper is the development of an efficient algorithm for QCQP (2) that are strictly feasible and for which (A, B) is a definite pair with \(A+\lambda B\succ 0\) for some \(\lambda \ge 0\) (we call such QCQP definite feasible); we argue that this is a generic condition for QCQP to be bounded. The running time is \(O(n^3)\) when the matrices A, B are dense, and it can be significantly faster if the matrices are sparse. The algorithm requires (i) finding a \(\hat{\lambda }\ge 0\) such that \(A+\hat{\lambda }B\) is positive definite, and (ii) computing an extremal eigenpair of a \((2n+1)\times (2n+1)\) generalized eigenvalue problem. We emphasize that the algorithm requires just one eigenvalue problem. The algorithm is easy to implement given a routine for computing an extremal (largest or smallest) eigenpair, for which high-quality software based on shift-invert Arnoldi, such as ARPACK [2, 22], is publicly available. We present experiments that illustrate the efficiency and accuracy of our algorithm compared with the SDP-based approach. Our algorithm is based on the framework established in [1, 11, 17] of formulating the KKT conditions as an eigenvalue problem.
In addition, this paper contributes to the theoretical understanding of QCQP by treating those that are not definite feasible. Specifically, it is a nontrivial problem to decide whether a given QCQP is bounded or not, and if bounded, whether the infimum is attainable. We present a classification of QCQP in terms of feasibility/boundedness/attainability, based on the canonical form of the symmetric pair (A, B) under congruence. We shall see that the canonical form provides rich information on the properties of the associated QCQP. We thus establish a process that (in exact arithmetic) solves QCQP completely, in the sense that feasibility/boundedness/attainability is checked and the optimal objective value and a global solution are computed if they exist.
Broadly speaking, this paper is a contribution in the direction of “global optimization via eigenvalues”. To our knowledge the earliest reference is Gander, Golub and von Matt [11] for TRS. This algorithm was revisited and further developed recently in [1] to illustrate its high efficiency, which was extended in [34] to deal with an additional linear constraint. This paper is largely an outgrowth of [1], extending the scope from TRS to QCQP, relaxing the convexity assumption in the constraint, and fully analyzing the degenerate cases. We also note [33], which solves QCQP with an additional ball constraint (generalized CDT problem; GCDT) via a two-parameter eigenvalue problem. It is in principle possible to impose a ball constraint with sufficiently large radius to convert QCQP (2) to GCDT, then use the algorithm in [33]. However, this would be very inefficient, requiring \(O(n^6)\) operations: our algorithm here needs at most \(O(n^3)\) operations, and can be faster when sparsity structure is present.
This paper is organized as follows. In Sect. 2 we review (mostly existing) results on the optimality and boundedness of QCQP (2). Section 3 is the heart of this paper, where we derive our eigenvalue-based algorithm for definite feasible QCQP. We present numerical experiments in Sect. 4, and analyze QCQP that are not definite feasible in Sect. 5.
Notation. We denote by \({\mathcal {R}}(X)\) the range of a matrix X, and by \({\mathcal {N}}(X)\) the null space. \(X\succ (\succeq ) 0\) indicates X is a positive (semi)definite matrix. \(I_n\) is the \(n\times n\) identity, and \(O_n,O_{m\times n}\) are zero matrices of size \(n\times n\) and \(m\times n\). We simply write I, O if the dimensions are clear from the context. The Moore-Penrose pseudoinverse of a matrix A is denoted by \(A^\dagger \). \(x_*\) denotes a QCQP solution with associated Lagrange multiplier \(\lambda _*\).
2 Preliminaries: optimality and boundedness of QCQP
This section collects results on QCQP that are needed for our analysis and algorithm.
2.1 QCQP with no interior feasible point
QCQP (2) has no strictly feasible point when Slater’s condition is violated. Note that strict feasibility can be checked by solving an unconstrained quadratic minimization problem \({{\mathrm{minimize}}}_x g(x)\). This subsection focuses on the case \(\min _x g(x)=0\).
2.2 Boundedness and attainability
We start with a necessary and sufficient condition for boundedness of a strictly feasible QCQP.
Lemma 1
Proof
This is essentially a corollary of strong duality between QCQP (2) and SDP (3), which is bounded below if and only if there exist \(\lambda \ge 0\) and \(\gamma \) satisfying the first constraint in (3), which is equivalent to (5). \(\square \)
Lemma 2
A reasonable output of a numerical algorithm for such QCQP is the infimum objective value 0 with the warning that it is unattainable.
2.3 Optimality conditions
When QCQP (2) satisfies Slater’s condition and has a global solution, a set of necessary and sufficient conditions is given by Moré [24]:
Theorem 1
The first three conditions in (8) represent the KKT conditions, and there can be many KKT points \((\lambda ,x)\) satisfying these three, reflecting the nonconvexity of the problem. The final condition \(A+\lambda _* B\succeq 0\) specifies which of the KKT points is the solution.
2.4 Definite feasible QCQP: strictly feasible and definite
By Lemma 1, for a strictly feasible QCQP to be bounded we necessarily need \(A+\hat{\lambda }B\succeq 0\) for some \(\hat{\lambda }\ge 0\). If we further have \(A+\hat{\lambda }B\succ 0\), then the QCQP is clearly bounded. As we argue in Sect. 5, such cases form a “generic” class of QCQP (2) that are bounded and have a global solution. We therefore give a name for such QCQP.
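The boundedness claim can be made explicit with a short derivation. It is a sketch under the assumption (not restated in this section) that the objective and constraint of QCQP (2) are \(f(x)=x^\top Ax+2a^\top x\) and \(g(x)=x^\top Bx+2b^\top x+\beta \le 0\). For any feasible x and any \(\hat{\lambda }\ge 0\) with \(A+\hat{\lambda }B\succ 0\), we have \(\hat{\lambda }g(x)\le 0\), so
$$\begin{aligned} f(x)&\ge f(x)+\hat{\lambda }g(x)=x^\top (A+\hat{\lambda }B)x+2(a+\hat{\lambda }b)^\top x+\hat{\lambda }\beta \\&\ge -(a+\hat{\lambda }b)^\top (A+\hat{\lambda }B)^{-1}(a+\hat{\lambda }b)+\hat{\lambda }\beta , \end{aligned}$$
where the last step minimizes the strictly convex quadratic over all of \({\mathbb {R}}^n\). The right-hand side is a finite lower bound that does not depend on x.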
Definition 1
 1.
It is strictly feasible: there exists \(x\in {\mathbb {R}}^n\) such that \(g(x)<0\), and
 2.
(A, B) is definite with nonnegative shift: there exists \(\hat{\lambda }\ge 0\) such that \(A+\hat{\lambda }B\succ 0\).
We shall treat such QCQP in detail and derive an efficient algorithm in Sect. 3. To begin with, for definite feasible QCQP there always exists a global solution \(x_*\).
Theorem 2
(Moré [24]) For a definite feasible QCQP (2), there exist \(x_*\in {\mathbb {R}}^n\), \(\lambda _*\ge 0\) such that the conditions (8) hold.
In the special case of TRS we have \(B\succ 0\), so by taking \(\hat{\lambda }\) sufficiently large we have \(A+\hat{\lambda }B\succ 0\), and since Slater’s condition is trivially satisfied, it follows that TRS is a definite feasible QCQP. Similarly, if \(A\succ 0\), taking \(\hat{\lambda }=0\) shows the pencil is definite, so such QCQP is definite feasible as long as it is strictly feasible. Indeed a number of studies focus on such cases [8, 9].
2.4.1 Checking definite feasibility
Let us now discuss how to determine whether a given QCQP (2) is definite feasible.
Generally, the set of values of \(\lambda \) for which \(A+\lambda B\succ 0\), if nonempty, is an open interval \(\tilde{D}=(\tilde{\lambda }_1,\tilde{\lambda }_2)\), allowing \(\tilde{\lambda }_1=-\infty \) and \(\tilde{\lambda }_2=\infty \), and the set of \(\lambda \) for which \(A+\lambda B\succeq 0\) is its closure [24] if \(\tilde{D}\) is nonempty. \(\tilde{\lambda }_1,\tilde{\lambda }_2\) are eigenvalues of the pencil \(A+\lambda B\) unless they are \(\pm \infty \).
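As a minimal illustration of this interval structure, suppose the pair has already been brought to diagonal form, \(A=\mathrm{diag}(d_i)\) and \(B=\mathrm{diag}(e_i)\). This is an assumption made only for the sketch, and the function name and the lists `d`, `e` are hypothetical; for general dense pairs one would use a dedicated definiteness test instead. In the diagonal case each index contributes one inequality \(d_i+\lambda e_i>0\), and the endpoints are the pencil eigenvalues \(-d_i/e_i\):

```python
import math

def definite_interval(d, e):
    """Open interval (lo, hi) of lambda with d[i] + lambda*e[i] > 0 for all i.

    Returns None when no such lambda exists. Assumes A = diag(d), B = diag(e).
    """
    lo, hi = -math.inf, math.inf
    for di, ei in zip(d, e):
        if ei > 0:
            lo = max(lo, -di / ei)   # need lambda > -d_i/e_i
        elif ei < 0:
            hi = min(hi, -di / ei)   # need lambda < -d_i/e_i
        elif di <= 0:
            return None              # e_i = 0 and d_i <= 0: never positive
    return (lo, hi) if lo < hi else None

# Indefinite A and B whose pencil is nevertheless definite for 1 < lambda < 2
interval = definite_interval([1.0, -1.0, 2.0], [1.0, 1.0, -1.0])
# TRS-like pair (B = I): the interval is unbounded on the right
trs_interval = definite_interval([-3.0, 1.0], [1.0, 1.0])
```

Intersecting the returned interval with \(\lambda \ge 0\) then tells whether a nonnegative shift \(\hat{\lambda }\) exists.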
3 Eigenvaluebased algorithm for definite feasible QCQP
We now develop an eigenvalue algorithm for definite feasible QCQP. In this section we assume that a value of \(\hat{\lambda }\ge 0\) such that \(A+\hat{\lambda }B\succ 0\) is known, through a process such as those described in Sect. 2.4.1.
By Theorems 1 and 2, a definite feasible QCQP can be solved by solving (8) for \(\lambda _*\) and \(x_*\). We develop an algorithm that first finds the optimal Lagrange multiplier \(\lambda _*\) by an eigenvalue problem, then computes \(x_*\).
3.1 Preparations
First, let \(D\) be the interval \(\{\lambda \ge 0\mid A+\lambda B\succ 0\}\). We denote the left end of \(D\) by \(\lambda _1\), and the right end by \(\lambda _2\). Note that \(\lambda _2=\tilde{\lambda }_2\), but due to the requirement \(\lambda \ge 0\), the left end of \(D\) may not be the same as \(\tilde{\lambda }_1\) in Sect. 2.4.1. We have either \(D=(\lambda _1,\lambda _2)\) if \(\lambda _1>0\), or \(D=[\lambda _1,\lambda _2)\), which happens if \(A\succ 0\) and hence \(\lambda _1=0\).
Proposition 1
(Moré [24]) \(\gamma (\lambda )\) is monotonically nonincreasing on \(\lambda \in \tilde{D}=(\tilde{\lambda }_1,\tilde{\lambda }_2)\supseteq D\). Moreover, excluding the case where \(x(\lambda )\) is a constant, \(\gamma (\lambda )\) is monotonically strictly decreasing on \(\lambda \in \tilde{D}\).
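A worked special case may help make the monotonicity concrete. Assume, purely for illustration, the TRS-type data \(B=I\), \(b=0\), \(\beta =-\Delta ^2\) (the symbol \(\Delta \) is introduced here and is not notation from this section), together with the conventions \(x(\lambda )=-(A+\lambda B)^{-1}(a+\lambda b)\) and \(\gamma (\lambda )=g(x(\lambda ))\). Then, for every \(\lambda \) with \(A+\lambda I\succ 0\),
$$\begin{aligned} \gamma (\lambda )=a^\top (A+\lambda I)^{-2}a-\Delta ^2,\qquad \gamma '(\lambda )=-2a^\top (A+\lambda I)^{-3}a\le 0 , \end{aligned}$$
with equality only if \(a=0\), in which case \(x(\lambda )\equiv 0\) is constant. This matches the two statements of the proposition in this special case.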
3.2 Classification of definite feasible QCQP
 (a)
\(\gamma (\lambda )\) takes both nonnegative and nonpositive values.
 (b)
\(\gamma (\lambda )>0\) everywhere.
 (c)
\(\gamma (\lambda )<0\) everywhere, and \(\lambda _1>0\).
 (d)
\(\gamma (\lambda )<0\) everywhere, and \(\lambda _1=0\).
First for (a), since \(\gamma \) is continuous on \(D\), by the intermediate value theorem there exists \(\lambda \in D\) such that \(\gamma (\lambda )=0\), and for this \(\lambda \) we have \(\lambda _*=\lambda \).
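Case (a) can be illustrated numerically on a diagonal instance. The sketch below is not the algorithm of this paper, which obtains \(\lambda _*\) from an eigenvalue problem. It is a plain bisection on \(\gamma \), shown only to make the sign change concrete. It assumes \(A=\mathrm{diag}(d_i)\), \(B=\mathrm{diag}(e_i)\), the convention \((A+\lambda B)x(\lambda )=-(a+\lambda b)\), and \(g(x)=x^\top Bx+2b^\top x+\beta \); all function names are hypothetical.

```python
def x_of_lambda(lam, d, e, a, b):
    # Componentwise solution of (A + lam*B) x = -(a + lam*b) for diagonal A, B
    return [-(ai + lam * bi) / (di + lam * ei)
            for di, ei, ai, bi in zip(d, e, a, b)]

def gamma(lam, d, e, a, b, beta):
    # gamma(lam) = g(x(lam)) with g(x) = x^T B x + 2 b^T x + beta
    x = x_of_lambda(lam, d, e, a, b)
    return sum(ei * xi * xi + 2.0 * bi * xi
               for ei, bi, xi in zip(e, b, x)) + beta

def bisect_gamma(lo, hi, d, e, a, b, beta, tol=1e-12):
    # Assumes [lo, hi] lies inside D and gamma(lo) >= 0 >= gamma(hi);
    # gamma is nonincreasing on D, so bisection brackets a root.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gamma(mid, d, e, a, b, beta) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Unit-ball instance: minimize x1^2 + 2*x2^2 + 4*x1 subject to x1^2 + x2^2 <= 1
d, e = [1.0, 2.0], [1.0, 1.0]
a, b, beta = [2.0, 0.0], [0.0, 0.0], -1.0
lam_star = bisect_gamma(0.0, 3.0, d, e, a, b, beta)
x_star = x_of_lambda(lam_star, d, e, a, b)
```

For this instance \(\gamma (\lambda )=4/(1+\lambda )^2-1\), so the root is \(\lambda _*=1\) with \(x_*=(-1,0)^\top \), which lies on the boundary of the ball.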
To deal with cases (b), (c) we use the following result.
Proposition 2
 1.
Suppose case (b) holds with \(\lambda _2<\infty \). Then \(x(\lambda )\) converges as \(\lambda \nearrow \lambda _2\) and there exists x with \(g(x)=0\) and \((A+\lambda _2 B)x=-(a+\lambda _2 b)\).
 2.
Suppose case (c) holds with \(\lambda _1>0\). Then \(x(\lambda )\) converges as \(\lambda \searrow \lambda _1\), and there exists x with \(g(x)=0\) and \((A+\lambda _1 B)x=-(a+\lambda _1 b)\).
Proof
Suppose case (b) holds with \(\lambda _2<\infty \). Since \(\lambda _2=-\frac{d_{i_2}}{e_{i_2}}<\infty \) we have \(e_{i_2}\ne 0\) and further \(e_{i_2}<0\) from (10), which holds even if \(i_2\) contains multiple elements. Note also that \(x(\lambda _2)_i\) are bounded constants for all \(i\notin i_2\). Hence by (12), \(\gamma (\lambda )\) is a quadratic function of \(x(\lambda )_{i_2}\) with negative leading coefficient, and so for the assumption \(\gamma (\lambda _2)>0\) to hold, \(x(\lambda _2)_{i_2}\) cannot blow up to \(\infty \). Hence by (11) we must have \(a_{i_2}+\lambda _{2} b_{i_2}=0\), and thus \(x(\lambda _2)\) converges to the vector (11) with the \(i_2\)th element \(x(\lambda _2)_{i_2} = -\frac{b_{i_2}}{e_{i_2}}\). Now, any vector x equal to \(x(\lambda _2)\) with the \(i_2\)th element replaced with an arbitrary number satisfies \((A+\lambda _2 B)x=-(a+\lambda _2 b)\); we shall choose this \(i_2\)th element of x (which we denote by y) so that \(g(x_y)=0\), where we made the y-dependence of x explicit. Then \(g(x_y)=0\) is a quadratic equation in y with negative leading coefficient \(e_{i_2}\). Together with the assumption \(g(x_{-b_{i_2}/e_{i_2}})>0\), there are two real solutions in y to \(g(x_y)=0\). With either root, the vector \(x:=x_y\) satisfies \(g(x)=0\) and \((A+\lambda _2 B)x=-(a+\lambda _2 b)\).
The case (c) with \(\lambda _1>0\) is similar: we need \(a_{i_1}+\lambda _{1} b_{i_1}=0\), and \(x(\lambda _1)\) converges to the vector (11) with the \(i_1\)th element \(x(\lambda _1)_{i_1} = -\frac{b_{i_1}}{e_{i_1}}\). Define the vector \(x_y\) to be equal to \(x(\lambda _1)\) except for the \(i_1\)th element y, which is set so that \(g(x_y)=0\). Then \(x:=x_y\) satisfies the two equations. \(\square \)
We note that it is possible to prove Proposition 2 as a straightforward corollary of [9, Lemma 2].
By Proposition 2, in case (b) we have \(\lambda _*=\lambda _2\) when \(\lambda _2<\infty \). Similarly, in case (c) we have \(\lambda _*=\lambda _1\). In Proposition 2 we assumed \(\lambda _2<\infty \), but indeed Slater’s condition assures that \(\lambda _2=\infty \) and \(\gamma (\lambda _2)>0\) cannot happen.
Proposition 3
Suppose that \(\lambda _2=\infty \) for a definite feasible QCQP. Then \(\lim _{\lambda \rightarrow \infty }\gamma (\lambda )<0\).
Proof
An alternative way to understand Proposition 3 is to note that \(\lambda _2=\infty \), including \(B\succ 0\), indicates the QCQP (2) is essentially a TRS (after an affine change of variables), for which a solution for \(\gamma (\lambda _*)=0\) is known to exist [1]. One can actually show \(\lim _{\lambda \rightarrow \infty }\gamma (\lambda )=\min _xg(x)\) (which is \(<0\) by assumption), based on [28, Lemma 2.3].
Finally, consider case (d). Since \(\gamma (\lambda )\le 0\) as \(\lambda \rightarrow +0\), letting \(x_0\) be the limit (which exists [9]) we have \(g(x_0)\le 0\) and \(Ax_0=-a\), so taking \(\lambda _*=0\), \(x_*=x_0\) we see that the conditions (8) are satisfied.
3.3 Computing the Lagrange multiplier \(\lambda _*\)
Our approach, building upon [1, 11] for the TRS, is to express \(\gamma (\lambda )=0\) as a generalized eigenvalue problem.
Theorem 3
Proof
When \(A+\lambda _* B\) is singular, we have \(a+\lambda _* b=-(A+\lambda _*B)x_*\in {\mathcal {R}}(A+\lambda _* B)\), hence the bottom n rows of \(M_0+\lambda _* M_1\) have rank \((n-1)\) or less, hence \(\det (M_0+\lambda _* M_1)=0\). \(\square \)
The above proof also shows that any KKT multiplier \(\lambda \) satisfying the first three equations in (8) is an eigenvalue of \(M_0+\lambda M_1\). Note from (18) and the fact \(A+\hat{\lambda }B\succ 0\) that we either have \(\det (M_0+\hat{\lambda }M_1)\ne 0\), indicating \(M_0+\lambda M_1\) is a regular matrix pencil (thus having exactly \(2n+1\) eigenvalues), or that \(\det (M_0+\hat{\lambda }M_1)= 0\), in which case \((\hat{\lambda },x(\hat{\lambda }))\) satisfies all the conditions in (8), thus is a solution for QCQP (2). It thus follows that \(\lambda _*\) can be found (or at least a finite set containing it) via the eigenvalue problem \(\det (M_0+\lambda M_1)= 0\). Note from Proposition 1 that \(M_0+\lambda M_1\) is regular unless \(\gamma (\lambda )\) is identically zero, in which case every value of \(\lambda \) for which \(A+\lambda B\succ 0\) is an optimal Lagrange multiplier. Since this case is easy, in what follows we assume \(M_0+\lambda M_1\) is regular.
We consider separate cases depending on the sign of \(\gamma (\hat{\lambda })\), and we next show that in both cases it suffices to compute one extremal eigenpair of (19).
Lemma 3
 1.
When \(\gamma (\hat{\lambda })>0\), the smallest real eigenvalue \(\lambda =\lambda _*\) of (15) larger than \(\hat{\lambda }\) satisfies \(\hat{\lambda }<\lambda _*\le \lambda _2\).
 2.
Similarly, when \(\gamma (\hat{\lambda })<0\), the largest real eigenvalue \(\lambda =\lambda _*\) of (15) smaller than \(\hat{\lambda }\) satisfies \(\lambda _1\le \lambda _*<\hat{\lambda }\), except in case (d).
 3.
When \(\gamma (\hat{\lambda })<0\) and case (d) happens, the largest real eigenvalue \(\lambda =\lambda _*\) of (15) smaller than \(\hat{\lambda }\) satisfies \(\lambda _*\le \lambda _1=0\), or there is no \(\lambda _*<\hat{\lambda }\) satisfying \(\gamma (\lambda _*)=0.\)
Proof
Except in case (d), \(\gamma (\lambda )\) is monotonically strictly decreasing on \(\lambda \in \tilde{D}\) by Proposition 1. In this case, by (13) and Theorem 3, \(\lambda =\lambda _*>0\) satisfies (17). Checking (13) and the sign of \(\gamma (\hat{\lambda })\), we complete the proof for all cases but (d).
In case (d), there is no nonnegative \(\lambda <\lambda _2\) that satisfies \(\gamma (\lambda )=0\). Two possibilities are (i) \(\lambda _*\le 0\) exists such that \(\gamma (\lambda _*)=0\), or (ii) no such \(\lambda _*\) exists. This completes the proof.
We note that in Lemma 3, eigenvalues \(\lambda =\pm \infty \) of (15) are not allowed.
3.3.1 When \(\gamma (\hat{\lambda })=0\)
In this case \(\gamma (\hat{\lambda })=g(x(\hat{\lambda }))=0\), so \((\lambda _*,x_*)=(\hat{\lambda },x(\hat{\lambda }))\) satisfies (8). Hence in this case we are done; there is no need to solve the generalized eigenvalue problem.
3.3.2 When \(\gamma (\hat{\lambda })>0\)
Theorem 4
Suppose \(\gamma (\hat{\lambda })>0\). Then for the largest finite real eigenvalue \(\xi '\) of (19) it holds that \(\xi '>0\), and the optimal Lagrange multiplier satisfying (8) is \(\lambda _*=\hat{\lambda }+\xi '^{-1}\).
Proof
Using the same \(\lambda _*\) as in Lemma 3, \(\xi _*=(\lambda _*-\hat{\lambda })^{-1}\) is an eigenvalue of (19). By Lemma 3, \(\xi '\ge \xi _*=(\lambda _*-\hat{\lambda })^{-1}>0.\)
If \(\xi '> \xi _*\), \(\lambda =\hat{\lambda }+\xi '^{-1}\) becomes the smallest real eigenvalue of (15) larger than \(\hat{\lambda }\), which contradicts Lemma 3. Therefore \(\xi '=\xi _*,\) and \(\lambda _*=\hat{\lambda }+\xi '^{-1}\). \(\square \)
Theorem 4 shows that when \(\gamma (\hat{\lambda })>0\), the optimal Lagrange multiplier \(\lambda _*\) can be obtained by computing the largest real eigenvalue of (19). One practical difficulty here is that with an algorithm such as shift-and-invert Arnoldi, it can be much harder to compute the largest real eigenvalue than the eigenvalue with largest real part (which can be complex). We shall now show that in fact these are the same for (19), that is, its rightmost eigenvalue is real. A similar statement was made in [1] for the special case of TRS; here we extend the result to definite feasible QCQP.
Theorem 5
Let \(\gamma (\hat{\lambda })>0\). Then the rightmost finite eigenvalue of the pencil (19) is real.
Proof
It suffices to prove that for every \(\xi =s+t{\text {i}}\) with \(s\ge \xi '\) and \(t\ne 0\), we have \(\det (M_1+\xi \hat{M})\ne 0\), or equivalently (by (18)), that \(\det (A+(\hat{\lambda }+\xi ^{-1})B)\ne 0\) and \(\gamma (\hat{\lambda }+\xi ^{-1})\ne 0\).
First consider values of \(\lambda \) such that \(\det (A+\lambda B)=0\). These are the eigenvalues \(\lambda =-\frac{d_i}{e_i}\) of \(A+\lambda B\), which are all real. In particular, when \(\lambda \) is nonreal we have \(\det (A+\lambda B)\ne 0\), and so \(\det (A+(\hat{\lambda }+\xi ^{-1})B)\ne 0\).
The upshot is that to obtain the optimal Lagrange multiplier \(\lambda _*\) when \(\gamma (\hat{\lambda })>0\), it suffices to compute the rightmost eigenpair of (19).
3.3.3 When \(\gamma (\hat{\lambda })<0\)
This case can be treated in essentially the same way as above, but special treatment is necessary for the case (d).
Theorem 6
Suppose \(\gamma (\hat{\lambda })<0\). Let \(\xi '\) be the leftmost real finite eigenvalue of (19). If \(\hat{\lambda }=0\) or \(-\hat{\lambda }^{-1}\le \xi '\le 0\), the case corresponds to (d), and we have \(\lambda _*=0\). If \(\hat{\lambda }>0\) and \(\xi '<-\hat{\lambda }^{-1}\), then \(\lambda _*=\hat{\lambda }+\xi '^{-1}\).
Proof
If \(\hat{\lambda }=0\) then \(0\le \lambda _1\le \hat{\lambda }=0\), which happens only in case (d). If \(-\hat{\lambda }^{-1}\le \xi '\le 0\) then \(\hat{\lambda }+\xi '^{-1}\le 0\) or \(\xi '=0\) holds. These conditions imply that the largest eigenvalue of (15) smaller than \(\hat{\lambda }\) is nonpositive, or (15) has no nonpositive eigenvalue. By Lemma 3, these occur only in case (d).
Now suppose \(\hat{\lambda }>0\) and \(\xi '<-\hat{\lambda }^{-1}\). These conditions imply \(\hat{\lambda }+\xi '^{-1}>0\) and \(\det (M_0+(\hat{\lambda }+\xi '^{-1})M_1)=0\), therefore the case (d) does not occur. By \(\hat{\lambda }+\xi '^{-1}\le \hat{\lambda }\) and \(\det (M_0+(\hat{\lambda }+\xi '^{-1})M_1)=0\) we have \(\hat{\lambda }+\xi '^{-1}\le \lambda _*\). If \(\hat{\lambda }+\xi '^{-1}<\lambda _*\), \(\lambda =\hat{\lambda }+\xi '^{-1}\) becomes the largest real eigenvalue of (15) smaller than \(\hat{\lambda }\), which contradicts Lemma 3. Therefore \(\lambda _*=\hat{\lambda }+\xi '^{-1}\). \(\square \)
The following is an analogue of Theorem 5.
Theorem 7
Suppose that \(\gamma (\hat{\lambda })<0\), and that for the leftmost real finite eigenvalue \(\xi '\) of (19), \(\xi =s+t{\mathrm {i}}\) satisfies \(s\le \xi '\), \(t\ne 0\) and \(\xi \) is an eigenvalue of (19). Then this corresponds to case (d), and \(\text{ Re }(\hat{\lambda }+\xi ^{-1})\le 0\).
Proof
Now suppose we are in case (d). Since \(D\) is bounded below by 0, if \(0<\lambda \le \hat{\lambda }\) then \(\lambda \in D\). Also, since \(p\le \hat{\lambda }\) always holds, if \(p>0\) then by \(0<p\le \hat{\lambda }\) we have \(p\in D\), so as in the proof of Theorem 5 we see that \(\xi \) is not a solution for (19), again a contradiction. Thus we conclude that \(p={\text {Re}}(\hat{\lambda }+\xi ^{-1})\le 0\).
In case (d) with \(\hat{\lambda }=0\), using the fact that \(\xi =0\) is always a solution of (19), we obtain \(\text{ Re }(\xi )\le \xi '\le 0\) and \(\text{ Re }(\hat{\lambda }+\xi ^{-1})=\text{ Re }(\xi ^{-1})\le 0\). \(\square \)

if \(\hat{\lambda }+\xi _*^{-1}>0\), take \(\lambda _*=\hat{\lambda }+\xi _*^{-1}\)

if either \(\hat{\lambda }=0\), \(\text{ Re }(\hat{\lambda }+\xi _*^{-1})\le 0\), or if \(\xi _*=0\), take \(\lambda _*=0\).
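The mapping from a computed extremal eigenvalue of (19) back to the multiplier can be sketched as a small function. This is only a hedged illustration of the rules stated in Theorems 4 to 7 and Sect. 3.3.1, assuming the relation \(\lambda =\hat{\lambda }+\xi ^{-1}\) between the two pencils; the function name and argument names are hypothetical, and the eigenvalue `xi` is assumed to come from some external eigensolver (rightmost when \(\gamma (\hat{\lambda })>0\), leftmost when \(\gamma (\hat{\lambda })<0\)).

```python
def multiplier_from_xi(lam_hat, gamma_hat, xi=None):
    """Map an extremal eigenvalue xi of the pencil (19) to lambda_*.

    lam_hat   : the shift with A + lam_hat*B positive definite
    gamma_hat : gamma(lam_hat); only its sign is used
    xi        : rightmost eigenvalue if gamma_hat > 0,
                leftmost eigenvalue if gamma_hat < 0 (may be complex)
    """
    if gamma_hat == 0:
        return lam_hat                      # no eigenvalue problem needed
    xi = complex(xi)
    if gamma_hat > 0:
        # The rightmost eigenvalue is real and positive in this case
        if xi.real <= 0:
            raise ValueError("expected a positive rightmost eigenvalue")
        return lam_hat + 1.0 / xi.real
    # gamma_hat < 0: detect case (d) first
    if lam_hat == 0 or xi == 0:
        return 0.0
    lam = lam_hat + 1.0 / xi                # complex arithmetic
    if lam.real <= 0:
        return 0.0                          # case (d)
    return lam.real
```

For example, with `lam_hat = 2.0` a rightmost eigenvalue `xi = 0.5` gives \(\lambda _*=4\), while a leftmost eigenvalue `xi = -0.25` (which is not below \(-\hat{\lambda }^{-1}=-0.5\)) signals case (d) and gives \(\lambda _*=0\).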
3.3.4 Pseudocode for computing \(\lambda _*\)
As discussed before, we can compute the rightmost (or leftmost) eigenpair of a generalized eigenvalue problem using the Arnoldi method, which is much more efficient than computing all the eigenvalues, especially when the matrices have structure such as symmetry and/or sparsity. In Matlab the eigs command with the flag ’lr’ (’sr’) computes such an eigenpair.
3.4 Obtaining the solution \(x_*\)
Having computed the optimal Lagrange multiplier \(\lambda _*\), we now turn to finding the solution \(x_*\). We shall show that generically the eigenvector z obtained in Algorithm 3.1 contains the desired information on \(x_*\).
First, if the output of Algorithm 3.1 is \(\lambda _*=0\), then the QCQP solution is simply \(-A^{-1}a\), the solution of a linear system (see Sect. 3.4.2 for the case \(\det (A)=0\)).
For nonzero \(\lambda _*\), we can generically obtain the solution by computing \(x_*=-(A+\lambda _* B)^{-1}(a+\lambda _* b)\), but below we show that solving such a linear system is usually unnecessary.
3.4.1 When \(A+\lambda _* B\) is nonsingular
If \(\lambda _*>0\) and \(\det (A+\lambda _* B)\ne 0\), then we can obtain \(x_*\) via the eigenvector associated with \(\lambda _*\) (which is obtained by the Arnoldi method). Suppose \(z=[\theta \ y_1^\top \ y_2^\top ]^\top \) is the computed eigenvector where \(\theta \in {\mathbb {R}},y_1,y_2\in {\mathbb {R}}^n\).
Next suppose that \(\theta =0\). By (21), if \(y_1\ne 0\) then \(y_1\in {\mathcal {N}}(A+\lambda _* B)\), and \(A+\lambda _* B\) is singular. When \(y_1=0\), by (20) we have \(y_2\in {\mathcal {N}}(A+\lambda _* B)\) so again \(A+\lambda _* B\) is singular. Thus when \(A+\lambda _* B\) is nonsingular we necessarily have \(\theta \ne 0\), and thus we can obtain \(x_*\) directly from the eigenvector z.
3.4.2 When \(A+\lambda _* B\) is singular
In fact, the case where \(A+\lambda _* B\) is singular corresponds to the well-known “hard case” for the special case of TRS. For TRS, dealing with such hard cases is discussed in [25, 30]. In this work we discuss dealing with the hard case for the general QCQP by forming and solving a nonsingular linear system that has the same solution as (22). The development here parallels that in [1], which is in turn based on [10].
The following theorem will be the basis for the construction of \(x_*\).
Theorem 8
 1.
\(\tilde{A}\succ 0\), in particular, \(\tilde{A}\) is nonsingular (hence \(w_*\) above exists uniquely),
 2.
\((A+\lambda _* B)w_*=-(a+\lambda _* b)\),
 3.
\((Bw_*+b)^\top v=0\) for every \(v\in {\mathcal {N}}(A+\lambda _* B)\).
To prove the theorem we prepare a lemma, which we will use repeatedly.
Lemma 4
For a definite feasible QCQP (2), if \(x\in {\mathcal {N}}(A+\lambda _* B)\) and \(x^\top Bx=0\) then \(x=0\).
Proof
We are now ready to prove Theorem 8.
Proof
 1. By \(A+\lambda _* B\succeq 0\) and \(\sum _{i=1}^{j}Bv_iv_i^\top B\succeq 0\), we trivially have \(\tilde{A}\succeq 0\). For any \(x\in {\mathbb {R}}^n\), there exist unique \(x_0\in {\mathcal {N}}(A+\lambda _* B)\) and \(x_1\in {\mathcal {R}}(A+\lambda _* B)\) such that \(x=x_0+x_1\). Let x be a vector such that \(x^\top \tilde{A}x=0\). We show that \(x=0\). We have \( x^\top \tilde{A}x=x_1^\top (A+\lambda _* B)x_1+\alpha \sum _{i=1}^{j}(v_i^\top Bx)^2=0, \) hence
$$\begin{aligned} x_1^\top (A+\lambda _* B)x_1=0,\quad \text{ and } \quad v_i^\top Bx=0\ (i=1,\ldots ,j) . \end{aligned}$$(24)
Since \((A+\lambda _* B)\succeq 0\) we have \((A+\lambda _* B)x_1=0\), and hence \(x_1\in {\mathcal {N}}(A+\lambda _* B)\). Together with the assumption \(x_1\in {\mathcal {R}}(A+\lambda _* B)\) we obtain \(x_1=0\). Therefore, \(x=x_0\) can be written as \(x=\sum _{i=1}^{j}c_iv_i\), for some constants \(c_1,\ldots ,c_j\), so together with (24) we obtain \(x^\top Bx=\sum _{i=1}^{j}c_iv_i^\top Bx=0\). Combining this with \(x^\top (A+\lambda _* B)x=0\) and Lemma 4 we obtain \(x=0\). Therefore \(\tilde{A}\succ 0\).
 2. We shall first prove that
$$\begin{aligned} u_i:=\tilde{A}^{-1}Bv_i\in {\mathcal {N}}(A+\lambda _* B),\quad i=1,\ldots ,j. \end{aligned}$$(25)
From \(\tilde{A}u_i=Bv_i,\) we have
$$\begin{aligned} (A+\lambda _* B)u_i&=Bv_i-\alpha \sum _{k=1}^{j}\left( v_k^\top Bu_i\right) Bv_k\\&=B\Big (v_i-\alpha \sum _{k=1}^{j}\left( v_k^\top Bu_i\right) v_k \Big ) =:Bv_i', \end{aligned}$$
where we defined \(v'_i:=v_i-\alpha \sum _{k=1}^{j}(v_k^\top Bu_i)v_k\). Since \(v'_i\in {\mathcal {N}}(A+\lambda _* B)\) we have \((v'_i)^\top Bv'_i=(v'_i)^\top (A+\lambda _*B)u_i=0\). Therefore by Lemma 4 we have \(v'_i=0\), so \((A+\lambda _* B)u_i=0\), hence \(u_i\in {\mathcal {N}}(A+\lambda _* B)\), establishing (25). From this it follows that \((A+\lambda _* B)\tilde{A}^{-1}Bv_i=0\), and
$$\begin{aligned} (A+\lambda _* B)w_*+(a+\lambda _* b)&=-(A+\lambda _* B)\tilde{A}^{-1}\tilde{a}+(a+\lambda _* b) \\&=-(A+\lambda _* B)\tilde{A}^{-1}\left( a+\lambda _* b+\alpha \sum _{i=1}^{j}Bv_iv_i^\top b\right) \\&\qquad +(a+\lambda _* b)\\&=-(A+\lambda _* B-\tilde{A})\tilde{A}^{-1}(a+\lambda _* b)\\&\quad -\alpha \sum _{i=1}^{j}((A+\lambda _* B)\tilde{A}^{-1}Bv_i)v_i^\top b \\&=\alpha \sum _{i=1}^{j}Bv_i\left( v_i^\top B\tilde{A}^{-1}(a+\lambda _* b)\right) , \end{aligned}$$
where for the last equality we used \((A+\lambda _* B)\tilde{A}^{-1}Bv_i=0\) and (23). Now from (8) we see that there exists \(x_*\) such that \(a+\lambda _* b=-(A+\lambda _* B)x_*\), so \(v_i^\top B\tilde{A}^{-1}(a+\lambda _* b)=-u_i^\top (A+\lambda _* B)x_*=0\) for \(i=1,\ldots ,j\), and \((A+\lambda _* B)w_*=-(a+\lambda _* b)\).
 3. For any \(v\in {\mathcal {N}}(A+\lambda _* B)\), we have
$$\begin{aligned} (Bw_*+b)^\top v&=(-B\tilde{A}^{-1}\tilde{a}+b)^\top v=\Big (-B\tilde{A}^{-1}\Big (a+\lambda _* b+\alpha \sum _{i=1}^{j}Bv_iv_i^\top b\Big )+b\Big )^\top v\nonumber \\&=-b^\top \Big (B\tilde{A}^{-1}\alpha \sum _{i=1}^{j}Bv_iv_i^\top \Big )^\top v+b^\top v\nonumber \\&=-b^\top \Big (\alpha \sum _{i=1}^{j}v_iv_i^\top B\tilde{A}^{-1}B\Big ) v+b^\top v =:-b^\top Lv+b^\top v, \end{aligned}$$(26)
where we define \(L:=\alpha \sum _{i=1}^{j}v_iv_i^\top B\tilde{A}^{-1}B\). Next, suppose that \(Lx=0\) and \(x\in {\mathcal {N}}(A+\lambda _*B)\). We show that \(x=0.\) To this end, note that
$$\begin{aligned} Lx=\alpha \sum _{i=1}^{j}\left( v_i^\top B\tilde{A}^{-1}Bx\right) v_i=0, \end{aligned}$$
so \(v_i^\top B\tilde{A}^{-1}Bx=0\) for \(i=1,\ldots ,j\). Now since x can be written as a linear combination of \(v_1,\ldots ,v_j\), it follows that \(x^\top B\tilde{A}^{-1}Bx=0\), and by \(\tilde{A}^{-1}\succ 0\) we have \(Bx=0\). Hence \(x^\top Bx=0\), and again by Lemma 4 we conclude that \(x=0\). Moreover,
$$\begin{aligned} L&=\alpha \sum _{i=1}^{j}v_iv_i^\top B\tilde{A}^{-1}B =\alpha \sum _{i=1}^{j}v_iv_i^\top B\tilde{A}^{-1}\tilde{A}\tilde{A}^{-1}B\\&=\alpha \sum _{i=1}^{j}v_iv_i^\top B\tilde{A}^{-1}\left( A+\lambda _* B+\alpha \sum _{l=1}^{j}Bv_lv_l^\top B\right) \tilde{A}^{-1}B \\&=\alpha \sum _{i=1}^{j}v_iv_i^\top B\tilde{A}^{-1}\bigg (\alpha \sum _{l=1}^{j}Bv_lv_l^\top B\bigg )\tilde{A}^{-1}B \quad \text{(by } \text{(25)) }\\&=\alpha \sum _{l=1}^{j}\bigg (\alpha \sum _{i=1}^{j}v_iv_i^\top B\tilde{A}^{-1}B\bigg )v_lv_l^\top B\tilde{A}^{-1}B\\&=\alpha \sum _{l=1}^{j}Lv_lv_l^\top B\tilde{A}^{-1}B=L^2, \end{aligned}$$
that is, L is an idempotent matrix: indeed, it turns out that L does not depend on \(\alpha \). Therefore, for every \(v\in {\mathcal {N}}(A+\lambda _* B)\), Lv is a linear combination of \(v_1,\ldots ,v_j\), so \((v-Lv)\in {\mathcal {N}}(A+\lambda _* B)\), and from \(L(v-Lv)=Lv-L^2v=0\) it follows from the above argument, taking \(x\leftarrow v-Lv\), that \(v-Lv=0\). Hence by (26) we conclude that
$$\begin{aligned} (Bw_*+b)^\top v&=b^\top (v-Lv)=0, \end{aligned}$$
as required. \(\square \)
Similarly, in case (c), we have \(v^\top Bv=\frac{1}{\hat{\lambda }-\lambda _*}v^\top (A+\hat{\lambda }B)v>0\) and \(0=g(x_*)>g(w_*)\) holds. Thus the quadratic equation \( g(w_*+tv)=v^\top Bvt^2+g(w_*)=0 \) in t has real solutions \(t=\pm \sqrt{-g(w_*)/(v^\top Bv)}\). Letting t be one of these solutions and taking \(x_*=w_*+tv\), we see that \((\lambda _*,x_*)\) satisfies (8).
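The final step of the hard case can be sketched in a few lines. The sketch assumes \(g(x)=x^\top Bx+2b^\top x+\beta \), that \(w_*\) and a null vector v of \(A+\lambda _*B\) are already available, and that \((Bw_*+b)^\top v=0\) as guaranteed by Theorem 8, so that the cross term in \(g(w_*+tv)\) vanishes. The helper names are hypothetical.

```python
import math

def quad(M, x, y):
    # Bilinear form x^T M y for a dense matrix stored as a list of rows
    return sum(x[i] * M[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def g(x, B, b, beta):
    # Constraint function g(x) = x^T B x + 2 b^T x + beta
    return quad(B, x, x) + 2.0 * sum(bi * xi for bi, xi in zip(b, x)) + beta

def complete_hard_case(w, v, B, b, beta):
    """Return x = w + t*v with g(x) = 0, assuming (B w + b)^T v = 0."""
    ratio = -g(w, B, b, beta) / quad(B, v, v)
    if ratio < 0:
        raise ValueError("no real t: g(w) and v^T B v have the same sign")
    t = math.sqrt(ratio)                 # the root -t works equally well
    return [wi + t * vi for wi, vi in zip(w, v)]

# TRS-type hard case on the unit ball: A = diag(-1, 1), a = (0, 1),
# lambda_* = 1, so A + lambda_* B = diag(0, 2), w = (0, -1/2), v = (1, 0).
B = [[1.0, 0.0], [0.0, 1.0]]
b, beta = [0.0, 0.0], -1.0
x_star = complete_hard_case([0.0, -0.5], [1.0, 0.0], B, b, beta)
```

In this instance \(g(w_*)=-3/4\) and \(v^\top Bv=1\), so \(t=\sqrt{3}/2\) and the returned point lies on the unit circle.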
Next when \(\lambda _*=0\) and \(A=A+\lambda _* B\) is singular, we similarly have \(g(w_*)\le 0\). However, recalling (8), when \(\lambda _*=0\) we do not need \(g(x_*)=0\), so we can directly take \(x_*=w_*\).
3.5 Complexity
When no structure is present and A, B are dense, the dominant cost in Algorithm 3.2 lies in finding an eigenpair and the solution of a linear system; these are both \(O(n^3)\). Computing \({\mathcal {N}}(A+\lambda _*B)\) can be done by an SVD, and finding \(\gamma (\hat{\lambda })\) is mostly solving a linear system, and the other steps are all \(O(n^2)\). Hence the overall complexity of Algorithm 3.2 is \(O(n^3)\).
In comparison, the SDP-based approaches require at least \(O(n^3)\) in each iteration of the interior-point method [3] with a rather large constant, so we see that Algorithm 3.2 can be much more efficient.
Moreover, the dominant step of finding an extremal eigenpair can easily take advantage of the sparsity structure of A, B if present, resulting in running time much faster than \(O(n^3)\). This fact is illustrated in our experiments.
4 Numerical experiments
To illustrate the performance (speed and accuracy) of Algorithm 3.2 for solving QCQP (2), here we present Matlab experiments comparing with the SDP-based algorithm. Specifically, we compare Algorithm 3.2 with SDP solvers based on the interior-point method: SeDuMi [36], and SDPT3 [39], which we invoke via CVX [14]. We used the default values for parameters such as the stopping criterion. However, since the core of the algorithm in [24] and ours are both essentially the same as those for the TRS (excluding finding \(\hat{\lambda }\), which they both require), we expect that our code would outperform [24] in speed and accuracy just as in TRS.
All experiments were carried out in MATLAB version R2013a on a machine with an Intel Xeon E5-2680 processor with 64GB RAM.
4.1 Setup
We generate a “random” definite feasible QCQP with indefinite A, B as follows. First form a random positive definite \(K\succ 0\), formed as \(X^{\top }X+I\) where X is a random \(n\times n\) matrix, obtained by MATLAB’s function randn(n). Since the problem becomes ill-conditioned if K is close to singular, we chose K to have eigenvalues at least 1. We then set \(\hat{\lambda }\) to be a random positive number.
We then took a random symmetric matrix B obtained by Y = randn(n); B = Y+Y’, and define A via \(K=A+\hat{\lambda }B\). We took a and b to be random vectors.
To form a problem with a known exact solution (so that the accuracy of the computed solution x can be evaluated), we take \(\lambda _\mathrm{opt}:=\hat{\lambda }+\epsilon \) where \(\epsilon \approx 10^{-10}\) and compute \(x_\mathrm{opt}=-(A+\lambda _\mathrm{opt}B)^{-1}(a+\lambda _\mathrm{opt}b)\), then set \(\beta \) to satisfy \(g(x_\mathrm{opt})=0\), so that \((\lambda _\mathrm{opt},x_\mathrm{opt})\) satisfies (1); hence \(x_\mathrm{opt}\) is the QCQP solution, i.e., \((\lambda _\mathrm{opt},x_\mathrm{opt}) = (\lambda _*,x_*)\).
Below we report the average speed and accuracy from 50 randomly generated instances for each matrix size n.
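The construction above can be sketched in a few lines. The following is an illustration in plain Python rather than the MATLAB code used in the experiments; it assumes \(f(x)=x^\top Ax+2a^\top x\) and \(g(x)=x^\top Bx+2b^\top x+\beta \), and the helper names (`solve`, `matvec`, `dot`, `make_instance`) and the sampling range for \(\hat{\lambda }\) are hypothetical choices made only for this sketch.

```python
import random

def solve(M, rhs):
    """Solve M y = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [M[i][:] + [rhs[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            m = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= m * aug[c][k]
    y = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * y[k] for k in range(i + 1, n))
        y[i] = (aug[i][n] - s) / aug[i][i]
    return y

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def make_instance(n, eps=1e-10, seed=0):
    """Random definite feasible QCQP with a known solution (illustrative only)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
    Y = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
    # K = X^T X + I is positive definite with eigenvalues at least 1.
    K = [[sum(X[k][i] * X[k][j] for k in range(n)) + (i == j)
          for j in range(n)] for i in range(n)]
    lam_hat = rng.uniform(0.1, 1.0)  # assumption: any positive number would do
    B = [[Y[i][j] + Y[j][i] for j in range(n)] for i in range(n)]
    # Define A through K = A + lam_hat * B.
    A = [[K[i][j] - lam_hat * B[i][j] for j in range(n)] for i in range(n)]
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    lam_opt = lam_hat + eps
    K_opt = [[A[i][j] + lam_opt * B[i][j] for j in range(n)] for i in range(n)]
    # Stationarity: (A + lam_opt B) x_opt = -(a + lam_opt b).
    x_opt = solve(K_opt, [-(ai + lam_opt * bi) for ai, bi in zip(a, b)])
    # Choose beta so that g(x_opt) = x^T B x + 2 b^T x + beta = 0.
    beta = -(dot(x_opt, matvec(B, x_opt)) + 2 * dot(b, x_opt))
    return {"A": A, "B": B, "a": a, "b": b, "beta": beta,
            "lam_hat": lam_hat, "lam_opt": lam_opt, "x_opt": x_opt}
```

Calling `make_instance(5)` returns the problem data together with the planted pair \((\lambda _\mathrm{opt},x_\mathrm{opt})\), against which a computed solution can be compared.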
4.1.1 Computing \(\hat{\lambda }\)
In practice \(\hat{\lambda }\) is usually unknown in advance, and in that case our algorithm starts by computing \(\hat{\lambda }\). To this end we used the algorithm in [6, 15] to find \(\hat{\lambda }\), as discussed in Sect. 2.4.1, to obtain the interval \(D\). Although any value in \(D\) is allowed to be \(\hat{\lambda }\), we chose \(\hat{ \lambda }\) as the middle point of \(D\) to avoid ill-conditioning of \(A+\hat{\lambda }B\).
In the figures we show the performance of our algorithm in two cases: (i) when \(\hat{\lambda }\) is known a priori, shown as “Eig”, and (ii) when \(\hat{\lambda }\) needs to be computed, shown as “Eigcheck”. In other words, the runtime of Eigcheck is the sum of the runtime of Eig and the time for finding \(\hat{\lambda }\).
4.1.2 Newton refinement process
4.2 Results
We see that when \(\hat{\lambda }\) is known (Eig), our algorithm is faster than SeDuMi and SDPT3 by orders of magnitude. Even if we include the time for computing \(\hat{\lambda }\), our algorithm (Eigcheck) is still faster than SeDuMi and SDPT3.
Figure 6 illustrates that our algorithm found solutions and objective values nearest to optimal, hence more accurate.
Sparse matrices. Another strength of our method is that it can directly take advantage of the matrix sparsity structure. Specifically, for the computation of an extremal eigenpair, which is the dominant part of our algorithm, efficient eigensolvers for large sparse matrices are widely available [2, 22], and implemented for example in MATLAB’s eigs command.
In Figs. 7, 8 and 9 we verify that when the matrices A, B are highly sparse, our method runs faster than \(O(n^3)\); here it scaled like \(O(n^2)\) for the tridiagonal case, and also for the random sparse case when the number of nonzeros per row is fixed. The accuracy of the solution and objective values was also consistently good, as illustrated in Fig. 8 for the tridiagonal case; the other examples gave similar results.
In Figure 10, we observe that our algorithm computed the optimal value reliably even in the ill-conditioned case, unlike the SDP-based algorithms. Moré’s algorithm gave even better accuracy here: recall that this algorithm is an extension of the classical Moré-Sorensen algorithm for TRS [35], which is iterative in nature (solving a linear system in each iteration), and not matrix-free.
5 QCQP that are not definite feasible
Thus far we have focused on the definite feasible QCQP and derived an eigenvalue-based algorithm that is fast and accurate. We now develop an analysis that accounts for “nongeneric” QCQP that are not necessarily definite feasible (since the discussion in Sect. 2.1 still holds, we still focus on the strictly feasible case). The key tool for our analysis is the canonical form of a symmetric pair under congruence, which we review next.
5.1 The canonical form of (A, B) under congruence
For a pair of symmetric matrices (A, B) the canonical form under congruence [21, 37, 38], shown below, is the simplest form taken by \(W^\top (A +\lambda B )W\) where W is nonsingular. We define the \(\oplus \) operator as \(A_1\oplus A_2:=\left[ {\begin{matrix} A_1 &{} O \\ O &{} A_2 \end{matrix}} \right] \).
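As a small illustration of the \(\oplus \) operator just defined, the sketch below builds the block-diagonal matrix from square blocks stored as lists of rows. The function name `direct_sum` is a hypothetical helper, not part of any library.

```python
def direct_sum(*blocks):
    """Return the block-diagonal matrix A1 (+) A2 (+) ... of square blocks."""
    n = sum(len(blk) for blk in blocks)
    out = [[0] * n for _ in range(n)]
    offset = 0
    for blk in blocks:
        # Copy each block onto the diagonal; everything else stays zero.
        for i, row in enumerate(blk):
            for j, val in enumerate(row):
                out[offset + i][offset + j] = val
        offset += len(blk)
    return out
```

For example, `direct_sum([[1]], [[0, 1], [1, 0]])` gives the \(3\times 3\) matrix with rows `[1, 0, 0]`, `[0, 0, 1]`, `[0, 1, 0]`.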
Theorem 9
 1.
The blocks in the right-hand side of (27) correspond to a singular part; any matrix pair that possesses these blocks is singular, that is, \(\det (A+\lambda B)=0\) for every \(\lambda \).
 2.
The blocks (28) correspond to real finite (right term) and infinite (left term) eigenvalues. The right terms are the “natural” extensions of the Jordan block in standard eigenvalue problems. \(k_j,l_j\) are the sizes of the Jordan blocks.
 3.
The blocks (29) correspond to nonreal eigenvalues, which must appear in conjugate pairs. Again, \(m_j\) is the Jordan block size.
5.2 Implication of canonical form for QCQP boundedness
Now we turn to the implications of Theorem 9 for QCQP, first focusing on the condition for QCQP to be bounded.
In Sect. 2.1 we dealt with the case where the feasible region has no interior point, and the analysis made no assumption on definite feasibility. Hence, here we assume Slater’s condition, which allows us to invoke Lemma 1 and Theorem 1. The first observation is that the necessary condition \(A+\lambda B\succeq 0\) in (5) for QCQP to be bounded restricts the admissible canonical forms of \(A+\lambda B\) from the general form in Theorem 9 to the following.
Theorem 10
Proof
For \(A+\lambda B\succeq 0\) to hold, each block in Theorem 9 needs to be positive semidefinite. We examine each of the blocks of the form (27), (28) and (29).
Similarly, the second term in (27) cannot exist since its (1, 1), \((1,2\varepsilon _j)\), \((2\varepsilon _j,2\varepsilon _j)\) elements are respectively 0,1 and 0.
Given A, B satisfying (30), we next examine the values of \(\lambda \) for which \(A+\lambda B\succeq 0\).
Proposition 4
A proof is a straightforward examination of each term in (30).
The above results imply that for a bounded QCQP, the pencil \(A+\lambda B\) cannot have nonreal eigenvalues, and its Jordan blocks must be of size at most two. Moreover, when (30) contains \(q_2\ge 1\) blocks \(J(\lambda ;\theta )\) of size two (the QCQP in (6) is one such example with \(q_2=1\)), the corresponding eigenvalue \(\theta \) must be the same for all \(q_2\) blocks. In that case \(A+\lambda B\succeq 0\) can hold for just one value, namely \(\lambda =\theta \), corresponding to the block \(J(\theta ;\theta )=\big [ {\begin{matrix} 1&{}0 \\ 0&{}0 \end{matrix}}\big ]\), and this value \(\lambda =\theta \) must also satisfy \(\bigoplus _{j=1}^{q_1}\eta _j[\lambda +\alpha _j]\succeq 0\) for \(A+\lambda B\succeq 0\) to hold.
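To make the role of the \(1\times 1\) blocks \(\eta _j[\lambda +\alpha _j]\) concrete, the following sketch computes the set of \(\lambda \) on which all of them are nonnegative. It covers only the \(1\times 1\) blocks; a size-two block \(J(\lambda ;\theta )\), if present, would further restrict the set to the single point \(\lambda =\theta \). The function name `psd_interval` is hypothetical.

```python
import math

def psd_interval(eta, alpha):
    """Interval (lo, hi) of lam with eta_j * (lam + alpha_j) >= 0 for all j.

    Returns None if no such lam exists.
    """
    lo, hi = -math.inf, math.inf
    for e, al in zip(eta, alpha):
        if e == 1:        # lam + alpha_j >= 0  <=>  lam >= -alpha_j
            lo = max(lo, -al)
        elif e == -1:     # -(lam + alpha_j) >= 0  <=>  lam <= -alpha_j
            hi = min(hi, -al)
        else:
            raise ValueError("sign characteristic must be +1 or -1")
    return (lo, hi) if lo <= hi else None
```

With `eta = [1, -1]` and `alpha = [1.0, -3.0]` the blocks are \([\lambda +1]\) and \(-[\lambda -3]\), so the admissible set is \([-1,3]\); when every \(\eta _j=1\) the upper end is \(+\infty \), as in the TRS.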
5.2.1 Characterizing bounded QCQP via the canonical form
Here we show that the conditions (32) can be written explicitly using the canonical form of symmetric pencils by congruence. Essentially this specifies the types of A, B for which the QCQP is solvable.

if no block \(J(\lambda ;\theta )\) is present, then by Proposition 4 there exists an interval \([-\alpha _{j},-\alpha _{j+1}]\) on which \(A+\lambda B\succeq 0\), and the requirement on \(\eta _i\) is \(\eta _i=1\) for \(i\le j\) and \(\eta _i=-1\) for \(i\ge j+1\). (Note that \(\alpha _j=\alpha _{j+1}\) is allowed, in which case the interval \([-\alpha _{j},-\alpha _{j+1}]\) becomes a point. We also allow “\(-\alpha _{j+1}=\infty \)”, which is when \(\eta _i=1\) for all i; this includes TRS).

if a block \(J(\lambda ;\theta )\) is present then the \(\theta \) values must all be the same, and \(\lambda =\theta \) is the only value for which \(A+\lambda B\succeq 0\). The requirement on \(\eta _j\) is \(\eta _j=-1\) if \(-\alpha _j>\theta \), and \(\eta _j=1\) if \(-\alpha _j<\theta \). For the real and semisimple eigenvalues \(-\alpha _j=\theta \), the corresponding sign characteristic \(\eta _j\) is allowed to be either 1 or \(-1\).
 \(A+\lambda B\succeq 0\) on an interval \([\lambda _j,\lambda _{j+1}]\) with \(\lambda _j<\lambda _{j+1}\). In this case we have
$$\begin{aligned} \left( O_{u\times u}\oplus I_{r\times r}\oplus \bigoplus _{j=1}^{q_1}\eta _j[\lambda +\alpha _j]\right) W^{-1}x=-W^{\top } (a+\lambda b). \end{aligned}$$
This has a solution for any value of \(\lambda \in (\lambda _j,\lambda _{j+1})\) if and only if \(W^{\top } (a+\lambda b)\) is of the form
$$\begin{aligned} W^{\top } (a+\lambda b) = \begin{bmatrix} 0_{u\times 1}\\ * \end{bmatrix}, \end{aligned}$$(34)
where \(*\in {\mathbb {R}}^{n-u}\) can take any value. Crucial here is the zero pattern of the vector \(W^{\top } (a+\lambda b)\); whether such a vector exists with \(\lambda \in (\lambda _j,\lambda _{j+1})\) can be verified easily once \(W^{\top }a,W^{\top }b\) are available.
 \(A+\lambda B\succeq 0\) only at a point \(\hat{\lambda }\). In this case (33) reduces to
$$\begin{aligned}&\left( O_{u\times u}\oplus I_{r\times r}\oplus \small \begin{bmatrix} \eta _1(\hat{\lambda }+\alpha _1)&\\ {}&\ddots&\\&\eta _{q_1}(\hat{\lambda }+\alpha _{q_1}) \end{bmatrix} \normalsize \oplus \bigoplus _{j=1}^{q_2} \begin{bmatrix} 1&\quad 0\\0&\quad 0 \end{bmatrix} \right) W^{-1}x\\&\quad =-W^{\top } (a+\hat{\lambda }b). \end{aligned}$$
Note that \(q_2=0\) is allowed, and otherwise \(\hat{\lambda }=\theta \). Clearly, this has a solution if and only if
$$\begin{aligned} W^\top (a+\hat{\lambda }b) = \begin{bmatrix} O_{u\times 1}\\ *\\ *_J \end{bmatrix} \end{aligned}$$(35)
where \(*\in {\mathbb {R}}^{n-u-2q_2}\) can take any value (except for elements corresponding to \((\hat{\lambda }+\alpha _i)=0\) if such elements are present), and \(*_J\in {\mathbb {R}}^{2q_2}\) has zeros in coordinates of even indices: \(*_J = [\star ,0,\star ,0,\ldots ,0,\star ,0]\) where each \(\star \) denotes an arbitrary scalar.
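Both zero-pattern conditions, (34) and (35), say the same thing: in the canonical coordinates the coefficient matrix is diagonal, so the system is solvable exactly when the right-hand side vanishes wherever the diagonal does. A minimal sketch of that check follows, assuming \(W^\top (a+\lambda b)\) has already been computed. The helper names `canonical_diagonal` and `is_solvable` and the tolerance are hypothetical, and for \(q_2>0\) the sketch presumes \(\lambda =\theta \).

```python
def canonical_diagonal(u, r, eta, alpha, q2, lam):
    """Diagonal of O_u (+) I_r (+) diag(eta_j*(lam+alpha_j)) (+) q2 copies of [[1,0],[0,0]]."""
    d = [0.0] * u + [1.0] * r
    d += [e * (lam + al) for e, al in zip(eta, alpha)]
    d += [1.0, 0.0] * q2
    return d

def is_solvable(diag, rhs, tol=1e-12):
    """diag(d) y = rhs has a solution iff rhs is zero wherever d is zero."""
    return all(abs(r) <= tol for d, r in zip(diag, rhs) if abs(d) <= tol)
```

The sign of the right-hand side does not matter for this test, since only its zero pattern is inspected.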
Theorem 11
Note that the conditions in the theorem are straightforward to verify provided that the congruence transformation W for the canonical form is available. We note that in [19, Thm. 6] a necessary condition is given for QCQP to be bounded; Theorem 11 gives the necessary and sufficient conditions.
To compute W, in [19] an algorithm is presented assuming B is nonsingular and all the eigenvalues are real and the Jordan blocks are of size at most two with the same eigenvalue. For the general case, one can proceed by upper triangularizing the matrix pencil using the QZ algorithm (or the GUPTRI algorithm [7, 20] to deal with singular pencils), and then solving generalized Sylvester equations [12, Sec. 7.7] to block diagonalize the matrices, detect the Jordan block sizes and find the corresponding transformations for each block. Unfortunately, currently no numerically stable algorithm appears to be available for computing the canonical form of a general symmetric pair.
5.3 Complete solution for QCQP
We now discuss how to solve a QCQP that is not necessarily definite feasible. We describe the process in a way that avoids computing the canonical form whenever possible.
5.3.1 Removing common null space
For QCQP that are not definite feasible, attempting to compute \(\lambda _*\) as in Sect. 3, we face the difficulty that the \(O_{u\times u}\) block (if it exists) forces \(\det (M_0+\lambda M_1)=0\) for every value of \(\lambda \), so the pencil is singular and hence we cannot compute \(\lambda _*\) via the generalized eigenvalue problem. Here we discuss how to remove such \(O_{u\times u}\) blocks.
First suppose that \(c\ne 0\) but \(d=0\). Then (36) is clearly unbounded, as we can take \(z=\alpha c\) with \(\alpha \rightarrow \infty \).
We thus focus on QCQP without an \(O_{u\times u}\) block in what follows.
5.3.2 Solution process for nongeneric QCQP
 1.
When \({\mathcal {N}}(A+\hat{\lambda }B)=0\). This means \(A+\hat{\lambda }B\) is nonsingular and so \(A+\hat{\lambda }B\succ 0\), so it belongs to the definite feasible case, for which Algorithm 3.2 suffices.
 2.
When \(V^\top BV\succ 0\) or \(V^\top BV\prec 0\). By (39), \(V^\top BV\succ 0\) is equivalent to \({\mathcal {J}}_2=\emptyset \) and \(\eta _j=1\) for all \(j\in {\mathcal {J}}_1\). Similarly, \(V^\top BV\prec 0\) is equivalent to \({\mathcal {J}}_2=\emptyset \) and \(\eta _j=-1\) for all \(j\in {\mathcal {J}}_1\). Thus slightly perturbing \(\hat{\lambda }\) in the positive direction \(\hat{\lambda }\leftarrow \hat{\lambda }+\epsilon \) (when \(V^\top BV\succ 0\)) or the negative direction \(\hat{\lambda }\leftarrow \hat{\lambda }-\epsilon \) (when \(V^\top BV\prec 0\)) for a positive \(\epsilon \), we obtain \(W^\top (A+(\hat{\lambda }\pm \epsilon )B)W = I_{r\times r}\oplus \bigoplus _{j=1,j\notin {\mathcal {J}}_1}^{q_1}\eta _j[\hat{\lambda }+\alpha _j\pm \epsilon ]\oplus \bigoplus _{j=1,j\in {\mathcal {J}}_1}^{q_1}[\epsilon ]\succ 0\), as long as \(\epsilon >0\) is taken sufficiently small. Thus by updating \(\hat{\lambda }\) to the perturbed value, we have \({\mathcal {N}}(A+\hat{\lambda }B)=0\).
 3.
When \(V^\top B V\) is indefinite with both positive and negative eigenvalues. Among the j such that \(\hat{\lambda }+\alpha _j=0\), the signs \(\eta _j\) take both values \(+1\) and \(-1\). This implies \(\hat{\lambda }=\lambda _*\), which is the only value \(\lambda \) for which \(A+\lambda B\succeq 0\). Moreover, we can take \(v_1,v_2\in {\mathcal {N}}(A+\hat{\lambda }B)\) such that \(v_1^\top Bv_1>0,v_2^\top Bv_2<0\), so we solve \((A+\hat{\lambda }B)\hat{x}=-(a+\hat{\lambda }b)\) for \(\hat{x}\) (by (34) the QCQP is unbounded if no such \(\hat{x}\) exists) and then find \(t\in {\mathbb {R}}\) such that \(g(\hat{x}+tv_i)=0\); we choose \(i\in \{1,2\}\) depending on the sign of \(g(\hat{x})\): \(i=1\) if \(g(\hat{x})<0\), and \(i=2\) otherwise. Then \(x_*=\hat{x}+tv_i\) is the solution.
 4.
When \(V^\top BV\ne O\) has a zero eigenvalue, and we have \(V^\top BV\succeq 0\) or \(V^\top BV\preceq 0\). For definiteness suppose that \(V^\top BV\succeq 0\); the other case is analogous. Since a zero eigenvalue is present, this is a case where the \(J(\lambda ;\theta )\) block exists. The goal is to find x such that \(g(x)=0\) and \((A+\hat{\lambda }B)x=-(a+\hat{\lambda }b)\). We first find a vector \(w_*\) such that
$$\begin{aligned} (A+\hat{\lambda }B)w_*=-(a+\hat{\lambda }b), \end{aligned}$$(40)
while if no such \(w_*\) exists then the QCQP is unbounded by Theorem 11. Otherwise the QCQP is bounded, and we proceed to solve the unconstrained quadratic optimization problem
$$\begin{aligned} \mathop {{{\mathrm{minimize}}}}\limits _u\quad g(w_*+Vu). \end{aligned}$$(41)
If the optimal objective value is 0 or below (including \(-\infty \)), there must exist \(u_0\) such that \(g(w_*+Vu_0)\le 0\). We then use a vector v such that \(v^\top Bv>0,~v\in {\mathcal {N}}(A+\hat{\lambda }B)\) and adjust a scalar t so that \(g(w_*+Vu_0+tv)= 0\). Then we obtain a global solution \(w_*+Vu_0+tv\). Next consider the case where the optimal value of (41) is larger than 0. In this case there is no x such that \(g(x)=0\) and \((A+\hat{\lambda }B)x=-(a+\hat{\lambda }b)\). Since we are dealing with the bounded case, this means we are in the unattainable case; there exists a scalar \(\mu \) such that for any \(\varepsilon >0\), there exists a feasible point x with \(f(x)=\mu +\varepsilon \). A similar statement is made in [16, Thm. 7]. Since there is no solution in this case (ii), a reasonable goal would be to provide just \(\mu \), which is the optimal objective value for
$$\begin{aligned}&\underset{\mu ,\lambda \in {\mathbb {R}}}{\mathrm{maximize}}\quad \mu \\&\mathrm{subject\ to}\quad \lambda \ge 0,\quad M(\lambda ,\mu )=\left[ \begin{array}{cc} \lambda \beta -\mu &{}\quad (a+\lambda b)^{\top } \\ a+\lambda b &{}\quad A+\lambda B \end{array} \right] \succeq 0. \end{aligned}$$
Since \(\lambda \) is fixed to \(\lambda =\hat{\lambda }\), by the definition of \(w_*\) we see that it suffices to find the largest \(\mu \) for which
$$\begin{aligned} \left[ \begin{array}{cc} \hat{\lambda }\beta -\mu &{}\quad -((A+\hat{\lambda }B)w_*)^\top \\ -(A+\hat{\lambda }B)w_* &{}\quad A+\hat{\lambda } B \end{array} \right] \succeq 0. \end{aligned}$$
We can rewrite this as
$$\begin{aligned}&\left[ \begin{array}{cc} 1 &{}\quad w_*^\top \\ 0 &{}\quad A+\hat{\lambda }B \end{array} \right] \left[ \begin{array}{cc} \hat{\lambda }\beta -\mu &{}\quad -((A+\hat{\lambda }B)w_*)^\top \\ -(A+\hat{\lambda }B)w_* &{}\quad A+\hat{\lambda } B \end{array} \right] \left[ \begin{array}{cc} 1 &{}\quad 0\\ w_* &{}\quad A+\hat{\lambda }B \end{array} \right] \\&\quad = \left[ \begin{array}{cc} \hat{\lambda }\beta -\mu - w_*^\top (A+\hat{\lambda }B)w_* &{}\quad 0\\ 0 &{}\quad (A+\hat{\lambda }B)^3 \end{array} \right] \succeq 0, \end{aligned}$$
so it follows that the desired value of \(\mu \) is
$$\begin{aligned} \mu =\hat{\lambda }\beta - w_*^\top (A+\hat{\lambda }B)w_*. \end{aligned}$$(42)
 5.
When \(V^\top BV=O.\) By (39), this is the case where \(q_1=0\) and \(q_2>0.\) We proceed as above until (40). The goal is to find u such that \(g(w_*+Vu)=0\). In this case,
$$\begin{aligned} g(w_*+Vu)&= g(w_*)+2(Bw_*+b)^\top Vu + u^\top (V^\top BV)u\\&=g(w_*)+2(Bw_*+b)^\top Vu, \end{aligned}$$
so \(g(w_*+Vu)\) is constant if and only if \((Bw_*+b)^\top V=0.\) If \((Bw_*+b)^\top V\ne 0\), there exists \(u_0\) such that \(g(w_*+Vu_0)=0\), which means that the global solution is \(x_*=w_*+Vu_0\). Otherwise, when \((Bw_*+b)^\top V=0\), we are unable to find u such that \(g(w_*+Vu)=0\) unless \(g(w_*)=0\). This means we are in the unattainable case, and the optimal value is as in (42).
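Cases 3 to 5 above all finish by moving from a point \(\hat{x}\) along a null-space direction v until the constraint becomes active. Assuming \(g(x)=x^\top Bx+2b^\top x+\beta \), the restriction \(g(\hat{x}+tv)=g(\hat{x})+2t(B\hat{x}+b)^\top v+t^2v^\top Bv\) is a scalar quadratic in t, so the step can be sketched as below. The function name `boundary_step` and its argument names are hypothetical, and the three scalars are assumed to be computed beforehand.

```python
import math

def boundary_step(g0, lin, curv):
    """Return t with curv*t**2 + 2*lin*t + g0 == 0, or None if no real t exists.

    g0   = g(x_hat)
    lin  = (B x_hat + b)^T v
    curv = v^T B v
    """
    if curv == 0.0:
        # g is affine along v (as in case 5, where V^T B V = O).
        if lin == 0.0:
            return 0.0 if g0 == 0.0 else None
        return -g0 / (2.0 * lin)
    disc = lin * lin - curv * g0
    if disc < 0.0:
        return None
    # Either root lies on the boundary g = 0; return one of them.
    return (-lin + math.sqrt(disc)) / curv
```

When \(g(\hat{x})<0\) and \(v^\top Bv>0\), or \(g(\hat{x})>0\) and \(v^\top Bv<0\), the discriminant is positive, which matches the rule for choosing between \(v_1\) and \(v_2\) in case 3.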
If \(\hat{\lambda }=0\), we either have \(\lambda _*=\hat{\lambda }=0\) or \(\lambda _*>0\); the latter case (in which the QCQP is definite feasible) occurs if and only if \(V^\top B V\succ 0\), because if \(V^\top B V\succeq 0\) has a zero eigenvalue, then by (38) a zero eigenvalue of \(V^\top B V\) implies the existence of \(J(\lambda ;\theta )\), which means \(D\) is a point. If \(V^\top B V\) is not positive definite, we must have \(\lambda _*=\hat{\lambda }=0\). We then compute \(w_*\) such that (40) holds, and solve (41), or more precisely a feasibility problem of finding u such that \(g(w_*+Vu)\le 0\). In fact, any \(w_*+Vu\) such that \(g(w_*+Vu)\le 0\) satisfies (8) and is therefore a global solution; recall from the complementarity condition in (8) that when \(\lambda _*=0\) it is not necessary to satisfy \(g(x_*)=0\). Such u trivially exists if \(V^\top BV\) has a negative eigenvalue. If \(V^\top BV\succeq 0\) and \(\det (V^\top BV)=0\) (i.e., \(J(\lambda ;0)\) exists) then it could be that \(\min _u g(w_*+Vu)>0\); then by Lemma 2 this corresponds to the unattainable case, with infimum value \(\mu =-w_*^\top Aw_*\) as in (42).
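As a consistency check on the value in (42), one can evaluate the Lagrangian at any point \(x=w_*+Vu\), where the columns of V span \({\mathcal {N}}(A+\hat{\lambda }B)\). This is a short worked computation, assuming \(f(x)=x^\top Ax+2a^\top x\) and \(g(x)=x^\top Bx+2b^\top x+\beta \), which are the forms implied by \(M(\lambda ,\mu )\) and by the expansion of \(g(w_*+Vu)\). Using \((A+\hat{\lambda }B)V=O\) and \((A+\hat{\lambda }B)w_*=-(a+\hat{\lambda }b)\),
$$\begin{aligned} f(x)+\hat{\lambda }g(x)&= x^\top (A+\hat{\lambda }B)x+2(a+\hat{\lambda }b)^\top x+\hat{\lambda }\beta \\&= w_*^\top (A+\hat{\lambda }B)w_*-2w_*^\top (A+\hat{\lambda }B)(w_*+Vu)+\hat{\lambda }\beta \\&= \hat{\lambda }\beta -w_*^\top (A+\hat{\lambda }B)w_*. \end{aligned}$$
Hence \(f(x)\) equals this value whenever \(g(x)=0\), and for \(\hat{\lambda }=0\) it reduces to \(f(x)=-w_*^\top Aw_*\) for every such x, regardless of the value of \(g(x)\).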
 1.
For any bounded QCQP, it returns the optimal (or infimum) objective value, along with its corresponding solution x if it is attainable.
 2.
If the QCQP is unbounded, it reports unboundedness.
 3.
If the QCQP is infeasible, it reports infeasibility.
6 Conclusion and discussion
We introduced an algorithm for QCQP with one constraint, which for generic (i.e., definite feasible QCQP for which \(\hat{\lambda }\) is known) QCQP requires computing just one eigenpair of a generalized eigenvalue problem. The algorithm is both faster and more accurate than the SDP-based approach, and can directly take advantage of the matrix sparsity structure if present.
For QCQP that are not definite feasible, for which SDP-based methods also face difficulty, we have classified the possible canonical forms under congruence of the pair (A, B), and described an algorithm (though more expensive than Algorithm 3.2) that completely solves the QCQP.
We close with remarks on future directions. First, a recent paper [34] describes an eigenvalue-based algorithm for TRS with an additional linear constraint, and a natural direction is to examine such an extension for QCQP. Second, since our algorithm essentially also solves the SDP (3), it is worth examining the class of SDP problems that can be solved similarly by an eigenvalue problem. Also of interest would be to deal with Riemannian optimization, such as minimization of \(\text{ trace }(X^{\top }AX+C^{\top }X)\) over \(X\in {\mathbb {R}}^{n\times k}\) subject to the orthogonality constraint \(X^\top X=I_k\).
Acknowledgements
We thank Satoru Iwata and Akiko Takeda for comments on an early draft, and Françoise Tisseur for a fruitful discussion on detecting definite matrix pairs and sharing with us the MATLAB code for [15]. We gratefully acknowledge the referees for their constructive comments and suggestions.
References
 1. Adachi, S., Iwata, S., Nakatsukasa, Y., Takeda, A.: Solving the trust-region subproblem by a generalized eigenvalue problem. SIAM J. Optim. 27(1), 269–291 (2017)
 2. Bai, Z., Demmel, J., Dongarra, J., Ruhe, A., van der Vorst, H.: Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. SIAM, Philadelphia (2000)
 3. Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. SIAM, Philadelphia (2001)
 4. Boyd, S.P., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
 5. Conn, A.R., Gould, N.I.M., Toint, P.L.: Trust Region Methods. SIAM, Philadelphia (2000)
 6. Crawford, C.R., Moon, Y.S.: Finding a positive definite linear combination of two Hermitian matrices. Linear Algebra Appl. 51, 37–48 (1983)
 7. Demmel, J., Kågström, B.: The generalized Schur decomposition of an arbitrary pencil \(A-\lambda B\): robust software with error bounds and applications. Part I: theory and algorithms. ACM Trans. Math. Soft. 19(2), 160–174 (1993)
 8. Fehmers, G.C., Kamp, L.P.J., Sluijter, F.W.: An algorithm for quadratic optimization with one quadratic constraint and bounds on the variables. Inverse Probl. 14(4), 893 (1998)
 9. Feng, J.M., Lin, G.X., Sheu, R.L., Xia, Y.: Duality and solutions for quadratic programming over single nonhomogeneous quadratic constraint. J. Glob. Optim. 54(2), 275–293 (2012)
 10. Fortin, C., Wolkowicz, H.: The trust region subproblem and semidefinite programming. Optim. Methods Softw. 19(1), 41–67 (2004)
 11. Gander, W., Golub, G.H., von Matt, U.: A constrained eigenvalue problem. Linear Algebra Appl. 114, 815–839 (1989)
 12. Golub, G.H., Van Loan, C.F.: Matrix Computations, 4th edn. Johns Hopkins University Press, Baltimore (2012)
 13. Gould, N.I.M., Lucidi, S., Roma, M., Toint, P.L.: Solving the trust-region subproblem using the Lanczos method. SIAM J. Optim. 9(2), 504–525 (1999)
 14. Grant, M., Boyd, S.: CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx (2014)
 15. Guo, C.H., Higham, N.J., Tisseur, F.: An improved arc algorithm for detecting definite Hermitian pairs. SIAM J. Matrix Anal. Appl. 31(3), 1131–1151 (2009)
 16. Hsia, Y., Lin, G.X., Sheu, R.L.: A revisit to quadratic programming with one inequality quadratic constraint via matrix pencil. Pac. J. Optim. 10(3), 461–481 (2014)
 17. Iwata, S., Nakatsukasa, Y., Takeda, A.: Global optimization methods for extended Fisher discriminant analysis. In: Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, pp. 411–419 (2014)
 18. Jegelka, S.: Private communication (2015)
 19. Jiang, R., Li, D., Wu, B.: SOCP reformulation for the generalized trust region subproblem via a canonical form of two symmetric matrices. Math. Prog. (2017). https://doi.org/10.1007/s10107-017-1145-4
 20. Johansson, S., Johansson, P.: StratiGraph and MCS Toolbox Homepage. Department of Computing Science, Umeå University, Sweden. http://www.cs.umu.se/english/research/groups/matrixcomputations/stratigraph (2016)
 21. Lancaster, P., Rodman, L.: Canonical forms for Hermitian matrix pairs under strict equivalence and congruence. SIAM Rev. 47(3), 407–443 (2005)
 22. Lehoucq, R.B., Sorensen, D.C., Yang, C.: ARPACK Users’ Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods, vol. 6. SIAM, Philadelphia (1998)
 23. Mackey, D.S., Mackey, N., Mehl, C., Mehrmann, V.: Möbius transformations of matrix polynomials. Linear Algebra Appl. 470, 120–184 (2015)
 24. Moré, J.J.: Generalizations of the trust region problem. Optim. Methods Softw. 2(3–4), 189–209 (1993)
 25. Moré, J.J., Sorensen, D.C.: Computing a trust region step. SIAM J. Sci. Stat. Comput. 4(3), 553–572 (1983)
 26. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (1999)
 27. Pólik, I., Terlaky, T.: A survey of the S-lemma. SIAM Rev. 49(3), 371–418 (2007)
 28. Pong, T.K., Wolkowicz, H.: The generalized trust region subproblem. Comput. Optim. Appl. 58(2), 273–322 (2014)
 29. Rendl, F., Wolkowicz, H.: A semidefinite framework for trust region subproblems with applications to large scale minimization. Math. Program. 77(1), 273–299 (1997)
 30. Rojas, M., Santos, S.A., Sorensen, D.C.: A new matrix-free algorithm for the large-scale trust-region subproblem. SIAM J. Optim. 11(3), 611–646 (2001)
 31. Rojas, M., Santos, S.A., Sorensen, D.C.: Algorithm 873: LSTRS: MATLAB software for large-scale trust-region subproblems and regularization. ACM Trans. Math. Soft. 34(2), 11:1–11:28 (2008)
 32. Sahni, S.: Computationally related problems. SIAM J. Comput. 3(4), 262–279 (1974)
 33. Sakaue, S., Nakatsukasa, Y., Takeda, A., Iwata, S.: Solving generalized CDT problems via two-parameter eigenvalues. SIAM J. Optim. 26(3), 1669–1694 (2016)
 34. Salahi, M., Taati, A., Wolkowicz, H.: Local nonglobal minima for solving large-scale extended trust-region subproblems. Comput. Optim. Appl. 66, 223–244 (2016)
 35. Sorensen, D.C.: Minimization of a large-scale quadratic function subject to a spherical constraint. SIAM J. Optim. 7(1), 141–161 (1997)
 36. Sturm, J.F.: Using SeDuMi 1.02, a Matlab toolbox for optimization over symmetric cones. Optim. Methods Softw. 11(1–4), 625–653 (1999)
 37. Thompson, R.C.: The characteristic polynomial of a principal subpencil of a Hermitian matrix pencil. Linear Algebra Appl. 14(2), 135–177 (1976)
 38. Thompson, R.C.: Pencils of complex and real symmetric and skew matrices. Linear Algebra Appl. 147, 323–371 (1991)
 39. Toh, K.C., Todd, M.J., Tütüncü, R.H.: SDPT3 Matlab software package for semidefinite programming, version 1.3. Optim. Methods Softw. 11(1–4), 545–581 (1999)
 40. Vavasis, S.A.: Quadratic programming is in NP. Inf. Process. Lett. 36(2), 73–77 (1990)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.