
A fast algorithm for globally solving Tikhonov regularized total least squares problem

  • Yong Xia
  • Longfei Wang
  • Meijia Yang

Abstract

The total least squares problem with general Tikhonov regularization can be reformulated as a one-dimensional parametric minimization problem (PM), where each parametric function evaluation corresponds to solving an n-dimensional trust region subproblem. Under a mild assumption, the parametric function is differentiable, and an efficient bisection method has been proposed in the literature for solving (PM). In the first part of this paper, we show that the bisection algorithm can be greatly improved by reducing the initially estimated interval covering the optimal parameter. We also observe that the bisection method cannot guarantee finding the globally optimal solution, since the nonconvex (PM) could have a local non-global minimizer. The main contribution of this paper is an efficient branch-and-bound algorithm for globally solving (PM), based on a new underestimation of the parametric function over any given interval that uses only the parametric function evaluations at the two endpoints. We show that the new algorithm (Algorithm BB-TLD) returns a global \(\epsilon \)-approximation solution in a computational effort of at most \(O(n^3/\sqrt{\epsilon })\) under the same assumption as in the bisection method. Numerical results demonstrate that our new global optimization algorithm performs much faster than the improved version of the bisection heuristic algorithm.

Keywords

Total least squares · Tikhonov regularization · Trust region subproblem · Fractional program · Lower bound · Branch and bound

Mathematics Subject Classification

65F20 · 90C26 · 90C32 · 90C20

1 Introduction

In order to handle the overdetermined linear equations \(Ax\approx b\) with a noisy data matrix \(A\in \mathbb {R}^{m\times n}\) and a noisy observation vector \(b\in \mathbb {R}^m\), the total least squares (TLS) approach was first proposed in [9] by solving the following optimization problem:
$$\begin{aligned} \min _{E\in \mathbb {R}^{m\times n},r\in \mathbb {R}^{m},x\in \mathbb {R}^{n}} \left\{ \Vert E\Vert _{\mathrm{F}}^2+\Vert r\Vert ^2:~ (A+E)x=b+r\right\} , \end{aligned}$$
(1)
where \(\Vert \cdot \Vert _{\mathrm{F}}\) and \(\Vert \cdot \Vert \) denote the Frobenius norm and the Euclidean norm, respectively, and E and r are the perturbations. For more details, we refer to [10, 23, 24] and references therein. Let \((E^*,r^*,x^*)\) be an optimal solution to the above minimization problem (1). It can be verified that \(E^*\) and \(r^*\) have closed-form expressions in terms of \(x^*\), since problem (1) is a linear-equality constrained convex quadratic program with respect to E and r. Therefore, by eliminating E and r from (1), we obtain the following equivalent quadratic fractional program:
$$\begin{aligned} \min _{x\in \mathbb {R}^n}\frac{\Vert Ax-b\Vert ^2}{\Vert x\Vert ^2+1}, \end{aligned}$$
(2)
which can be easily solved by finding the smallest singular value and the corresponding right singular vector of the augmented matrix [A b] if it has full column rank, see [9, 24].
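For concreteness, the classical recipe just described can be coded in a few lines. The following is a minimal numpy sketch (the function name tls_solution and the tolerance 1e-12 are our own illustrative choices, not taken from [9, 24]):

    import numpy as np

    def tls_solution(A, b):
        # Classical TLS solution via the right singular vector of [A b]
        # associated with its smallest singular value.
        M = np.hstack([A, b.reshape(-1, 1)])
        _, _, Vt = np.linalg.svd(M)
        v = Vt[-1]                  # right singular vector for the smallest singular value
        if abs(v[-1]) < 1e-12:      # degenerate case: x cannot be recovered from v
            raise ValueError("degenerate TLS problem")
        return -v[:-1] / v[-1]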
For the ill-conditioned (TLS) problem, Tikhonov regularization [22] is an efficient way to stabilize the solution by appending a quadratic penalty to the objective function:
$$\begin{aligned} \min _{E\in \mathbb {R}^{m\times n},r\in \mathbb {R}^{m},x\in \mathbb {R}^{n}} \left\{ \Vert E\Vert _{\mathrm{F}}^2+\Vert r\Vert ^2+\rho \Vert Lx\Vert ^2:~ (A+E)x=b+r \right\} , \end{aligned}$$
(3)
where \(\rho > 0\) is the penalty parameter, and \(L\in \mathbb {R}^{k\times n}\) \((k\le n)\) is an appropriately chosen regularization matrix of full row rank. It is worth noting that the model (3) also works for the underdetermined linear system \(Ax=b\). Similar to (1), (2), by eliminating the variables E and r, we can recast (3) as the following optimization problem with respect to x [1, 14]:
$$\begin{aligned} (\mathrm{P})~~\min _{x\in \mathbb {R}^n}\frac{\Vert Ax-b\Vert ^2}{\Vert x\Vert ^2+1}+\rho \Vert Lx\Vert ^2. \end{aligned}$$
The objective function in (P) is non-convex and has local non-global minimizers. Consequently, it is difficult to solve (P) to global optimality.
Let \(F\in \mathbb {R}^{n\times (n-k)}\) be a matrix whose columns form an orthogonal basis of the null space of L. Throughout this paper, we make the following assumption, which was first presented in [1]:
$$\begin{aligned} \mathrm{either}~k=n~\mathrm{or}~\lambda _{\min }\left[ \begin{array}{c@{\quad }c}F^TA^TAF&{} F^TA^Tb\\ b^TAF&{} \Vert b\Vert ^2\end{array}\right] <\lambda _{\min }\left( F^TA^TAF\right) , \end{aligned}$$
(4)
where \(\lambda _{\min }(\cdot )\) is the minimal eigenvalue of \((\cdot )\). As shown in [1], it is a sufficient condition under which the minimum of (P) is attained. The assumption (4) is also essential in an extended version of (P), see [3].
It is not difficult to verify that (P) can be equivalently rewritten as the following one-dimensional parametric optimization problem [1]:
$$\begin{aligned} (\mathrm{PM})~~\min _{\alpha \ge 1} \left\{ \mathcal {G}(\alpha ):=\min _{\Vert x\Vert ^2=\alpha -1}~\left\{ \frac{\Vert Ax-b\Vert ^2}{\alpha }+\rho \Vert Lx\Vert ^2 \right\} \right\} , \end{aligned}$$
(5)
where evaluating the function value \(\mathcal {G}(\alpha )\) corresponds to solving an equality version of the trust-region subproblem (TRS) [4, 8, 16]. It is shown in [1] that \(\mathcal {G}(\alpha )\) is continuous. Under a mild condition, it is also differentiable. Then, a bisection method is suggested in [1] to solve the equation \(\mathcal {G}'(\alpha )=0\) based on solving a sequence of (TRS); this method is denoted by Algorithm TRTLSG. It converges to the global minimizer if the function \(\mathcal {G}(\alpha )\) is unimodal, which is true when \(L=I\). For recent progress on this special identical-regularization case, we refer to [26]. Since there are examples [1] showing that \(\mathcal {G}(\alpha )\) is not always unimodal, Algorithm TRTLSG remains a heuristic algorithm: it does not guarantee convergence to the global minimizer of (P).

Let \(x^*\) be a globally optimal solution to (P). Then \(\alpha ^*=\Vert x^*\Vert ^2+1\) is an optimal solution of \((\mathrm{PM})\). Algorithm TRTLSG starts from an initial interval covering \(\alpha ^*\), denoted by \([\alpha _{\min },\alpha _{\max }]\). As in [1], \(\alpha _{\min }\) is trivially set to \(1+\epsilon _1\), where \(\epsilon _1>0\) is a tolerance parameter, and \(\alpha _{\max }\) is given in closed form based on a tedious derivation of an upper bound of \(\Vert x^*\Vert \) under Assumption (4). Notice that the computational cost of Algorithm TRTLSG is proportional to \(\log (\alpha _{\max }-\alpha _{\min })\), where \(\alpha _{\max }-\alpha _{\min }\) is the length of the initial interval. Thus, in the first part of this paper, we improve the lower and upper estimates of \(\alpha ^*\). More precisely, we first establish a new closed-form upper bound of \(\alpha ^*\), which greatly improves the quality of the current estimate at the same computational cost. Second, a new lower bound of \(\alpha ^*\) is derived in place of the trivial setting \(\alpha _{\min }=1+\epsilon _1\). With the new settings of \(\alpha _{\min }\) and \(\alpha _{\max }\), the efficiency of Algorithm TRTLSG is greatly improved, at least on all our test instances.

The main contribution of this paper is a two-layer dual approach for underestimating \(\mathcal {G}(\alpha )\) over any given interval, without additional computational cost beyond evaluating \(\mathcal {G}(\alpha )\) at the two endpoints of the interval. With this high-quality underestimation, we develop an efficient branch-and-bound algorithm to solve the one-dimensional parametric reformulation \((\mathrm{PM})\) (5). Our new algorithm is guaranteed to find a global \(\epsilon \)-approximation solution of \((\mathrm{PM})\) in at most \(O(1/\epsilon )\) iterations, and the computational effort in each iteration is \(O(n^3\log (1/\epsilon ))\). Under the additional assumption that makes \(\mathcal {G}(\alpha )\) differentiable, the number of iterations can be further reduced to \(O(1/\sqrt{\epsilon })\). Numerical results demonstrate that, in most cases, our new global optimization algorithm is much faster than the improved version of the heuristic Algorithm TRTLSG.

The remainder of the paper is organized as follows. In Sect. 2, we present some preliminaries and the bisection heuristic Algorithm TRTLSG. In Sect. 3, we establish new lower and upper bounds on the norm of any optimal solution of (P), with which the computational cost of Algorithm TRTLSG decreases greatly. In Sect. 4, we propose a new underestimation and then use it to develop an efficient branch-and-bound algorithm. The worst-case computational complexity is also analyzed. Numerical comparisons between the two algorithms are reported in Sect. 5. Concluding remarks are made in Sect. 6.

Throughout the paper, the notation “:=” denotes “define”. \(v(\cdot )\) denotes the optimal objective value of the problem \((\cdot )\). I is the identity matrix. The notation \(A\succ (\succeq )0\) means that A is positive (semi-)definite. The inner product of two matrices A and B is tr(\(AB^T\)). \(\mathrm{Range}(A)=\{Ax: x\in \mathbb {R}^n\}\) is the range space of A. The one-dimensional intervals \(\{x: a< x < b\}\) and \(\{x: a \le x \le b\}\) are denoted by (a, b) and [a, b], respectively. \(\lceil (\cdot )\rceil \) is the smallest integer larger than or equal to \((\cdot )\).

2 The bisection algorithm

In this section, we present the bisection algorithm, denoted by Algorithm TRTLSG in [1]. To begin with, we first list some preliminary results on (P) and \(\mathcal {G}(\alpha )\) defined in (5).

Theorem 1

([1]) Under Assumption (4) and \(k<n\), we have
$$\begin{aligned} v(\mathrm{P})\le \lambda _{\min }\left[ \begin{array}{c@{\quad }c}F^TA^TAF&{} F^TA^Tb\\ b^TAF&{} \Vert b\Vert ^2\end{array}\right] \end{aligned}$$
(6)
and the minimum of (P) is attained.

Theorem 2

([1]) Let \(x^*\) be an optimal solution of (P). If \(k=n\), we have
$$\begin{aligned} \Vert x^*\Vert ^2 \le \frac{\Vert b\Vert ^2}{\rho \cdot \lambda _{\min }\left( LL^T\right) }. \end{aligned}$$
Otherwise, if \(k<n\), under Assumption (4), it holds that
$$\begin{aligned} \Vert x^*\Vert ^2 \le \max \bigg \{ 1, \frac{\Vert b\Vert ^2+ \left( \lambda _{\max }(A^TA)+\Vert A^{T}b\Vert \right) (\delta +2\sqrt{\delta })+l_1(1+\delta )}{l_1-l_2} \bigg \}^2+\delta , \end{aligned}$$
(7)
where \(\lambda _{\max }(\cdot )\) is the maximal eigenvalue of \((\cdot )\), and
$$\begin{aligned} l_1= & {} \lambda _{\min }\left( F^TA^TAF \right) , \end{aligned}$$
(8)
$$\begin{aligned} l_2= & {} \lambda _{\min }\left[ \begin{array}{c@{\quad }c}F^TA^TAF&{} F^TA^Tb\\ b^TAF&{} \Vert b\Vert ^2\end{array}\right] ,\nonumber \\ \delta= & {} \frac{l_2}{\rho \cdot \lambda _{\min }\left( LL^T\right) }. \end{aligned}$$
(9)
Define
$$\begin{aligned} Q_{\alpha }:=\frac{1}{\alpha }A^{T}A+\rho L^{T}L,~~f_{\alpha }:=\frac{1}{\alpha }A^{T}b. \end{aligned}$$
(10)
We reformulate \(\mathcal {G}(\alpha )\) (5) as
$$\begin{aligned} \mathcal {G}(\alpha ) = \min _{\Vert x\Vert ^2=\alpha -1} \left\{ x^{T}Q_{\alpha }x-2f_{\alpha }^{T}x+\frac{\Vert b\Vert ^2}{\alpha } \right\} , \end{aligned}$$
(11)
which is an equality version of the trust region subproblem (TRS) [4, 8, 16]. Though it is a non-convex optimization problem, there is a necessary and sufficient condition characterizing any globally optimal solution of (TRS) (11). This means that (TRS) enjoys hidden convexity.

Theorem 3

([6, 8, 16]) For any \(\alpha >1\), \(x(\alpha )\) is an optimal solution of (11) if and only if there exists \(\lambda (\alpha )\in \mathbb {R}\) such that
$$\begin{aligned}&(Q_{\alpha }-\lambda (\alpha )I)x(\alpha )=f_{\alpha }, \end{aligned}$$
(12)
$$\begin{aligned}&\Vert x(\alpha )\Vert ^2= \alpha -1, \end{aligned}$$
(13)
$$\begin{aligned}&Q_{\alpha }-\lambda (\alpha )I \succeq 0. \end{aligned}$$
(14)

Corollary 1

For any \(\alpha >1\), suppose
$$\begin{aligned} f_{\alpha } \notin \mathrm{Null}(Q_{\alpha }-\lambda _{\min }(Q_{\alpha })I)^\bot , \end{aligned}$$
(15)
then the KKT conditions (12)–(14) have a unique solution (\(x(\alpha )\), \(\lambda (\alpha )\)).

Theorem 3 underlies many algorithms for solving (TRS), see, for example, [4, 6, 16, 18, 19, 21]. In this paper, for the tested medium-scale problems, we apply the solution approach based on the complete spectral decomposition [7].
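To make this concrete, the following numpy/scipy sketch evaluates \(\mathcal {G}(\alpha )\) by a full spectral decomposition of \(Q_{\alpha }\) and a root search for the multiplier \(\lambda (\alpha )\) in the KKT system (12)–(14). It covers only the easy case guaranteed by assumption (15), so that the secular equation has a root below \(\lambda _{\min }(Q_{\alpha })\); the function name eval_G_trs, the bracketing strategy and the numerical tolerances are our own illustrative choices.

    import numpy as np
    from scipy.optimize import brentq

    def eval_G_trs(alpha, A, b, L, rho):
        # Evaluate G(alpha) of (5)/(11) for alpha > 1 via spectral decomposition (easy case only).
        Q = A.T @ A / alpha + rho * (L.T @ L)
        f = A.T @ b / alpha
        d, V = np.linalg.eigh(Q)            # Q = V diag(d) V^T, eigenvalues ascending
        c = V.T @ f
        r2 = alpha - 1.0                    # required squared norm of x

        def phi(lam):                       # ||(Q - lam I)^{-1} f||^2 - (alpha - 1)
            return np.sum((c / (d - lam)) ** 2) - r2

        gap = 1e-8                          # bracket the root of phi strictly below d[0]
        hi = d[0] - gap
        while phi(hi) <= 0:                 # terminates under assumption (15)
            gap *= 0.1
            hi = d[0] - gap
        lo = d[0] - max(1.0, np.linalg.norm(f) / np.sqrt(r2)) - 1e-8
        while phi(lo) >= 0:
            lo -= 2.0 * (d[0] - lo)
        lam = brentq(phi, lo, hi)           # lambda(alpha), with Q - lam I positive definite
        x = V @ (c / (d - lam))             # x(alpha)
        G = x @ Q @ x - 2.0 * f @ x + (b @ b) / alpha
        return G, x, lam

The triple \((\mathcal {G}(\alpha ),x(\alpha ),\lambda (\alpha ))\) returned here is exactly the endpoint information needed by the underestimation developed in Sect. 4.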

Theorem 4

([1]) \(\mathcal {G}(\alpha )\) is continuous over \([1,+\infty )\).

Theorem 5

([1]) Suppose that assumption (15) holds for all \(\alpha >1\). Then \(\mathcal {G}(\alpha )\) is differentiable to any order. Moreover, the first derivative is given by
$$\begin{aligned} \mathcal {G}'(\alpha )=\lambda (\alpha )-\frac{\Vert Ax(\alpha )-b\Vert ^2}{\alpha ^2}, \end{aligned}$$
where (\(x(\alpha )\), \(\lambda (\alpha )\)) is the unique solution of the KKT conditions (12)–(14).
Based on Theorems 2, 3 and 5, applying the simple bisection method to solve \(\mathcal {G}'(\alpha )=0\) yields Algorithm TRTLSG proposed in [1].
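For illustration, a stripped-down version of such a bisection loop can be sketched as follows; it is not the authors' exact Algorithm TRTLSG (which in [1] also handles the tolerances \(\epsilon _1,\epsilon _2\) and the initial interval more carefully), it reuses eval_G_trs from the sketch above, and it assumes that \(\mathcal {G}'\) changes sign from negative to positive on \([\alpha _{\min },\alpha _{\max }]\), as in the unimodal case.

    import numpy as np

    def trtlsg_bisection(A, b, L, rho, alpha_min, alpha_max, tol=1e-6):
        # Bisection on G'(alpha) = lambda(alpha) - ||A x(alpha) - b||^2 / alpha^2 (Theorem 5).
        def dG(alpha):
            _, x, lam = eval_G_trs(alpha, A, b, L, rho)
            return lam - np.linalg.norm(A @ x - b) ** 2 / alpha ** 2

        lo, hi = alpha_min, alpha_max
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if dG(mid) > 0:                 # stationary point lies to the left of mid
                hi = mid
            else:                           # stationary point lies to the right of mid
                lo = mid
        return 0.5 * (lo + hi)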

If the function \(\mathcal {G}(\alpha )\) is unimodal, Algorithm TRTLSG converges to the global minimizer of (P). This is proved to be the case when \(L=I\) [1]. In general, it is not true: a counterexample (with \(m = n =4\), \(k = 3\)) is given in [1] to show that \(\mathcal {G}(\alpha )\) is not always unimodal. Thus, Algorithm TRTLSG could return a local non-global minimizer of (P).

3 Bounds on the norm of any globally optimal solution

In this section, we establish new lower and upper bounds on the norm of any globally optimal solution of (P). They help to greatly improve the efficiency of Algorithm TRTLSG.

3.1 A new lower bound

To the best of our knowledge, there is no nontrivial lower bound on the norm of any globally optimal solution of (P) except for the trivial setting \(1+\epsilon _1\) in [1]. In this subsection, in order to derive such a new lower bound, we first need a technical lemma.

Lemma 1

Under Assumption (4), for any \(\mu >0\), we have
$$\begin{aligned} A^TA+\mu L^TL\succ 0. \end{aligned}$$
(16)

Proof

It is sufficient to consider the nontrivial case \(k<n\), as the other case \(k=n\) implies that \(L^TL\succ 0\) and hence (16) holds true. Then, according to Assumption (4), we have
$$\begin{aligned} \lambda _{\min }\left( F^TA^TAF\right) >\lambda _{\min }\left[ \begin{array}{c@{\quad }c}F^TA^TAF&{} F^TA^Tb\\ b^TAF&{} \Vert b\Vert ^2\end{array}\right] =\lambda _{\min }\left( [AF~b]^T [AF~b]\right) \ge 0. \end{aligned}$$
It follows that \(F^TA^TAF\succ 0\), that is, \(y^TF^TA^TAFy>0\) for all \(y\ne 0\). Since \(\{Fy:y\in \mathbb {R}^{n-k},y\ne 0\}=\{x\ne 0:Lx=0\}\), we have \(x^TA^TAx>0\) for all \(Lx=0\) and \(x\ne 0\). This implies that \(\mathrm{Null}(L)\cap \mathrm{Null}(A)= \{0\}\), which means that for any \(x \ne 0\), either \(x^TA^TAx>0\) or \(x^TL^TLx>0\). Thus, \(x^T(A^TA+\mu L^TL)x>0\) for all \(x \ne 0\), implying that \(A^TA+\mu L^TL \succ 0\). The proof is complete. \(\square \)

Theorem 6

Suppose \(A^Tb\ne 0\). Let \(x^*\) be an optimal solution of (P). Define
$$\begin{aligned} \kappa _1= \left\{ \begin{array}{l@{\quad }l} \Vert b\Vert ^2-b^TA(A^{T}A+ \rho L^{T}L)^{-1}A^{T}b, &{}\mathrm{if}~k=n,\\ \min \left\{ l_2,\Vert b\Vert ^2-b^TA(A^{T}A+ \rho L^{T}L)^{-1}A^{T}b\right\} , &{}\mathrm{if}~k<n,\end{array}\right. \end{aligned}$$
where \(l_2\) is defined in (9). Let \(\kappa _2=\lambda _{\min }(A^{T}A+ \rho L^{T}L)-\kappa _1\). Then, \(\kappa _1<\Vert b\Vert ^2\) and
$$\begin{aligned} \Vert x^*\Vert \ge \left\{ \begin{array}{l@{\quad }l}\frac{\Vert b\Vert ^2-\kappa _1}{2\Vert A^{T}b\Vert }, &{}\mathrm{if}~ \kappa _2=0,\\ \frac{\Vert A^{T}b\Vert -\sqrt{\Vert A^{T}b\Vert ^2-\kappa _2 (\Vert b\Vert ^2-\kappa _1)}}{\kappa _2}, &{}\mathrm{otherwise}.\end{array}\right. \end{aligned}$$
(17)

Proof

Since
$$\begin{aligned} \dfrac{\Vert Ax-b\Vert ^2}{\Vert x\Vert ^2+1}+\rho \Vert Lx\Vert ^2 \le J(x):= \Vert Ax-b\Vert ^2 + \rho \Vert Lx\Vert ^2, \end{aligned}$$
we have
$$\begin{aligned} v(\mathrm{P})\le \min _{x\in \mathbb {R}^n} J(x). \end{aligned}$$
By Lemma 1, J(x) has a unique minimizer \(\bar{x}=(A^{T}A+ \rho L^{T}L)^{-1}A^{T}b\). Since \(A^{T}b \ne 0\), we have \(\bar{x}\ne 0\) and thus it holds that
$$\begin{aligned} J(\bar{x})=\Vert b\Vert ^2-b^TA(A^{T}A+ \rho L^{T}L)^{-1}A^{T}b<J(0)=\Vert b\Vert ^2. \end{aligned}$$
We obtain that \(\kappa _1<\Vert b\Vert ^2\).
According to Theorem 1 and the definitions of \(\kappa _1\), \(l_2\) and \(x^*\), we have
$$\begin{aligned} \kappa _1\ge v(\mathrm{P})\ge & {} \frac{\Vert Ax^*-b\Vert ^2 + \rho \Vert Lx^*\Vert ^2}{\Vert x^*\Vert ^2+1}\\\ge & {} \frac{\lambda _{\min }(A^{T}A+ \rho L^{T}L)\Vert x^*\Vert ^2-2b^TAx^*+\Vert b\Vert ^2}{\Vert x^*\Vert ^2+1}\\\ge & {} \frac{\lambda _{\min }(A^{T}A+ \rho L^{T}L)\Vert x^*\Vert ^2-2\Vert A^{T}b\Vert \Vert x^*\Vert +\Vert b\Vert ^2}{\Vert x^*\Vert ^2+1}, \end{aligned}$$
where the last inequality follows from the Cauchy–Schwarz inequality. Therefore, we obtain
$$\begin{aligned} \left( \lambda _{\min }(A^{T}A+ \rho L^{T}L)-\kappa _1\right) \Vert x^*\Vert ^2 - 2\Vert A^{T}b\Vert \Vert x^*\Vert + \Vert b\Vert ^2-\kappa _1 \le 0. \end{aligned}$$
(18)
Solving the quadratic inequality (18) with respect to \(\Vert x^*\Vert \) gives the lower bound (17) on \(\Vert x^*\Vert \). The proof is complete. \(\square \)
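For illustration, the bound of Theorem 6 is cheap to compute directly from the problem data; the quantity reported as \(\alpha _{\min }\) in Table 1 below is presumably the square of (17) plus one, since \(\alpha ^*=\Vert x^*\Vert ^2+1\). The following numpy sketch (the function name and the handling of the two cases through an optional argument F are our own choices) assumes \(A^Tb\ne 0\):

    import numpy as np

    def alpha_min_lower_bound(A, b, L, rho, F=None):
        # Lower bound (17) on ||x*||, returned as alpha_min = bound^2 + 1.
        # F: orthogonal basis of Null(L) for the case k < n; F=None means k = n.
        M = A.T @ A + rho * (L.T @ L)
        Atb = A.T @ b
        j_bar = b @ b - Atb @ np.linalg.solve(M, Atb)        # J(x_bar) in the proof of Theorem 6
        if F is None:
            kappa1 = j_bar
        else:
            AF = A @ F
            big = np.block([[AF.T @ AF, (AF.T @ b).reshape(-1, 1)],
                            [(b @ AF).reshape(1, -1), np.array([[b @ b]])]])
            kappa1 = min(np.linalg.eigvalsh(big)[0], j_bar)  # min(l_2, J(x_bar))
        kappa2 = np.linalg.eigvalsh(M)[0] - kappa1
        nAtb = np.linalg.norm(Atb)
        if abs(kappa2) < 1e-14:
            lb = (b @ b - kappa1) / (2.0 * nAtb)
        else:
            lb = (nAtb - np.sqrt(nAtb ** 2 - kappa2 * (b @ b - kappa1))) / kappa2
        return lb ** 2 + 1.0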
Suppose \(A^{T}b = 0\). Then (P) reduces to
$$\begin{aligned} \min _{x\in \mathbb {R}^n}\frac{x^TA^TAx+\Vert b\Vert ^2}{\Vert x\Vert ^2+1}+\rho \Vert Lx\Vert ^2. \end{aligned}$$
(19)
If \(b=0\), since the objective function of (19) is nonnegative, \(x^*=0\) is an optimal solution of (P) and hence \(\alpha ^*=1\). In this case, the initial setting \(\alpha _{\min }= 1+\epsilon _1\) in Algorithm TRTLSG [1] overestimates \(\alpha ^*\).

Finally, we assume \(A^{T}b = 0\) and \(b\ne 0\). The relation between the initial setting of \(\alpha _{\min }\) and the quality of the approximation minimizer of \(\mathcal {G}(\alpha )\) over \(\{1\}\cup [\alpha _{\min },\alpha _{\max }]\) is established as follows.

Proposition 1

Suppose \(A^{T}b = 0\) and \(b\ne 0\). Then, for any \(\epsilon \in [0,\Vert b\Vert ^2)\),
$$\begin{aligned} \min \left\{ \mathcal {G}(1),~\min _{ \alpha \ge \frac{\Vert b\Vert ^2}{\Vert b\Vert ^2- \epsilon }} \mathcal {G}(\alpha )\right\} \le v(\mathrm{P})+\epsilon . \end{aligned}$$

Proof

According to the definition (5), we have
$$\begin{aligned} \min _{1\le \alpha \le \frac{\Vert b\Vert ^2}{\Vert b\Vert ^2- \epsilon }} \mathcal {G}(\alpha )\ge & {} \min _x \frac{x^TA^TAx+\Vert b\Vert ^2}{ \Vert b\Vert ^2/(\Vert b\Vert ^2- \epsilon )}+\rho \Vert Lx\Vert ^2 \\\ge & {} \frac{\Vert b\Vert ^2}{\Vert b\Vert ^2/(\Vert b\Vert ^2- \epsilon )} = \Vert b\Vert ^2- \epsilon . \end{aligned}$$
Since \(\mathcal {G}(1)=\Vert b\Vert ^2\), we have
$$\begin{aligned} v(\mathrm{P})= & {} \min \left\{ \min _{1\le \alpha \le \frac{\Vert b\Vert ^2}{\Vert b\Vert ^2- \epsilon }} \mathcal {G}(\alpha ),~\min _{ \alpha \ge \frac{\Vert b\Vert ^2}{\Vert b\Vert ^2- \epsilon }} \mathcal {G}(\alpha )\right\} \\\ge & {} \min \left\{ \mathcal {G}(1)-\epsilon ,~\min _{ \alpha \ge \frac{\Vert b\Vert ^2}{\Vert b\Vert ^2- \epsilon }} \mathcal {G}(\alpha )\right\} \ge \min \left\{ \mathcal {G}(1),~\min _{ \alpha \ge \frac{\Vert b\Vert ^2}{\Vert b\Vert ^2- \epsilon }} \mathcal {G}(\alpha )\right\} -\epsilon . \end{aligned}$$
\(\square \)

3.2 New upper bounds

In this subsection, we propose two improved upper bounds on the norm of any optimal solution of (P), one of which has the same computational cost as the upper bound given in Theorem 2.

Let \(x^*\) be any globally optimal solution of (P). Consider the nontrivial case \(k<n\). Though the derivation of the upper bound (7) given in Theorem 2 is rather tedious, it essentially rests on the two inequalities
$$\begin{aligned}&\Vert Ax^*-b\Vert ^2\le l_2(\Vert x^*\Vert ^2+1), \end{aligned}$$
(20)
$$\begin{aligned}&\rho \Vert Lx^*\Vert ^2 \le l_2, \end{aligned}$$
(21)
which follow from (6) in Theorem 1. Thus, a tighter upper bound is given by
$$\begin{aligned} \max _{(20),(21)}~\Vert x^*\Vert ^2. \end{aligned}$$
(22)
This is an inhomogeneous quadratically constrained quadratic program and is still hard to solve. We therefore relax (22) to its Lagrangian dual problem, which can be rewritten as the following semidefinite program (SDP):
$$\begin{aligned} (\mathrm{SDP})~~&\min&~ t \\&\mathrm{s.t.}&\mu _1 B_2+\mu _2 B_3-B_1\succeq 0,\\&\mu _1 \ge 0, ~\mu _2 \ge 0, \end{aligned}$$
where
$$\begin{aligned} B_1=\left( \begin{matrix}I&{}\quad 0\\ 0&{}\quad -t \end{matrix}\right) ,~~ B_2=\left( \begin{matrix} A^{T}A-l_2 I&{}\quad -A^{T}b \\ -b^{T}A &{}\quad b^{T}b- l_2 \end{matrix}\right) ,~~ B_3=\left( \begin{matrix} \rho L^{T}L &{}\quad 0 \\ 0 &{}\quad -l_2 \end{matrix}\right) . \end{aligned}$$
v(SDP) gives a new upper bound of \(\Vert x^*\Vert ^2\). If strong duality holds for (22), then the new bound v(SDP) is never weaker than (7). However, computing v(SDP) is much more time-consuming than evaluating (7).
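Solving (SDP) only requires an off-the-shelf conic solver. As an illustration (our own tooling choice; the experiments in this paper are run in MATLAB), the problem can be written in cvxpy as follows, with \(\mu _1,\mu _2\) stored in a vector mu and \(-B_1\) split into its constant part and the part multiplying t:

    import numpy as np
    import cvxpy as cp

    def sdp_upper_bound(A, b, L, rho, l2):
        # v(SDP): upper bound on ||x*||^2 obtained from the Lagrangian dual of (22).
        n = A.shape[1]
        Atb = (A.T @ b).reshape(-1, 1)
        B2 = np.block([[A.T @ A - l2 * np.eye(n), -Atb],
                       [-Atb.T, np.array([[b @ b - l2]])]])
        B3 = np.block([[rho * (L.T @ L), np.zeros((n, 1))],
                       [np.zeros((1, n)), np.array([[-l2]])]])
        J = np.eye(n + 1); J[-1, -1] = 0.0            # the identity block of B_1
        E = np.zeros((n + 1, n + 1)); E[-1, -1] = 1.0
        t = cp.Variable()
        mu = cp.Variable(2, nonneg=True)
        S = cp.Variable((n + 1, n + 1), symmetric=True)
        cons = [S >> 0,                               # mu_1 B_2 + mu_2 B_3 - B_1 is PSD
                S == mu[0] * B2 + mu[1] * B3 - J + t * E]
        cp.Problem(cp.Minimize(t), cons).solve()      # needs an SDP-capable solver, e.g. SCS
        return t.value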

In the following, we propose a new upper bound of \(\Vert x^*\Vert ^2\) with the same computational effort as (7). The basic idea is to work directly with the original inequality (6) rather than (20)–(21).

Theorem 7

Let \(\beta =2\lambda _{\max }(A^{T}A)\), \(\gamma =2 \Vert A^{T}b\Vert \), \(\zeta = \rho \lambda _{\min }(L L^{T})\). We have
$$\begin{aligned} \Vert x^*\Vert ^2\le & {} -\frac{1}{2} +\frac{l_2}{2\zeta } +\frac{\sqrt{ (\zeta -l_2)^2+\beta ^2+4\zeta l_2+\frac{\gamma ^2}{l_1-l_2}\zeta } }{2\zeta } \nonumber \\&+ \left( \frac{\gamma +\sqrt{\gamma ^2 +(l_1-l_2)(4l_2+\frac{\beta ^2}{\zeta } +\frac{(\zeta -l_2)^2}{\zeta }) } }{2(l_1-l_2)} \right) ^2. \end{aligned}$$
(23)

Proof

\(x^*\) has the following decomposition
$$\begin{aligned} x^*=L^Tw+Fv, \end{aligned}$$
(24)
where \(w\in \mathbb {R}^k\) and \(v\in \mathbb {R}^{n-k}\). Substituting (24) into (6) yields
$$\begin{aligned} \Vert AL^Tw+AFv-b\Vert ^2+\rho \Vert LL^Tw\Vert ^2(t_1+t_2+1)\le l_2(t_1+t_2+1), \end{aligned}$$
(25)
where \(t_1=\Vert L^Tw\Vert ^2\) and \(t_2=\Vert v\Vert ^2\). Since
$$\begin{aligned} \Vert AL^Tw+AFv-b\Vert ^2= & {} \Vert AFv\Vert ^2+2v^TF^TA^T(AL^Tw-b)+\Vert AL^Tw-b\Vert ^2\\\ge & {} l_1\Vert v\Vert ^2+2v^TF^TA^T(AL^Tw-b)\\\ge & {} l_1\Vert v\Vert ^2-2\Vert v\Vert \cdot \Vert A^TAL^Tw-A^Tb\Vert \\\ge & {} l_1\Vert v\Vert ^2-2\Vert v\Vert \left( \lambda _{\max }(A^TA)\Vert L^Tw\Vert +\Vert A^Tb\Vert \right) \\= & {} l_1t_2-\sqrt{t_2}\left( \beta \sqrt{t_1}+\gamma \right) , \end{aligned}$$
where the second inequality follows from the Cauchy–Schwarz inequality, and
$$\begin{aligned} \rho \Vert LL^Tw\Vert ^2= & {} \rho w^T(LL^T)^{\frac{1}{2}}LL^T(LL^T)^{\frac{1}{2}}w \\\ge & {} \rho \lambda _{\min }(LL^T) w^T(LL^T)^{\frac{1}{2}}(LL^T)^{\frac{1}{2}}w\\= & {} \zeta w^TLL^Tw\\= & {} \zeta t_1, \end{aligned}$$
it follows from the inequality (25) that
$$\begin{aligned} l_1t_2-\sqrt{t_2}\left( \beta \sqrt{t_1}+\gamma \right) +\zeta t_1(t_1+t_2+1)\le l_2(t_1+t_2+1). \end{aligned}$$
Or equivalently, we have
$$\begin{aligned} \left[ \zeta t_1^2+(\zeta -l_2) t_1\right] +\left[ \zeta t_1t_2-\beta \sqrt{t_1t_2}\right] +\left[ (l_1-l_2)t_2-\gamma \sqrt{t_2}\right] -l_2\le 0. \end{aligned}$$
(26)
Notice that
$$\begin{aligned}&\zeta t_1^2+(\zeta -l_2) t_1\ge -\frac{(\zeta -l_2)^2}{4\zeta }, \end{aligned}$$
(27)
$$\begin{aligned}&\zeta t_1t_2-\beta \sqrt{t_1t_2}\ge -\frac{\beta ^2}{4\zeta }, \end{aligned}$$
(28)
$$\begin{aligned}&(l_1-l_2)t_2-\gamma \sqrt{t_2}\ge -\frac{\gamma ^2}{4(l_1-l_2)}. \end{aligned}$$
(29)
Substituting (28), (29) into (26), we obtain
$$\begin{aligned} t_1\le -\frac{1}{2} +\frac{l_2}{2\zeta } +\frac{\sqrt{ (\zeta -l_2)^2+\beta ^2+4\zeta l_2+\frac{\gamma ^2}{l_1-l_2}\zeta } }{2\zeta }. \end{aligned}$$
Similarly, substituting (27), (28) into (26), we have
$$\begin{aligned} \sqrt{t_2}\le \frac{\gamma +\sqrt{\gamma ^2 +(l_1-l_2)(4l_2+\frac{\beta ^2}{\zeta } +\frac{(\zeta -l_2)^2}{\zeta }) } }{2(l_1-l_2)}. \end{aligned}$$
The proof is complete as \(\Vert x^*\Vert ^2=t_1+t_2\). \(\square \)
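For illustration, the bound (23) can be evaluated with a few eigenvalue computations; the quantity reported as \(\alpha _{\max }\) for (23) in Table 1 below is presumably the right-hand side of (23) plus one. The following numpy sketch covers the nontrivial case \(k<n\) under Assumption (4) and uses our own function name:

    import numpy as np

    def alpha_max_upper_bound(A, b, L, rho, F):
        # Upper bound (23) on ||x*||^2 (case k < n), returned as alpha_max = bound + 1.
        AF = A @ F
        l1 = np.linalg.eigvalsh(AF.T @ AF)[0]
        big = np.block([[AF.T @ AF, (AF.T @ b).reshape(-1, 1)],
                        [(b @ AF).reshape(1, -1), np.array([[b @ b]])]])
        l2 = np.linalg.eigvalsh(big)[0]
        beta = 2.0 * np.linalg.eigvalsh(A.T @ A)[-1]          # 2 * lambda_max(A^T A)
        gamma = 2.0 * np.linalg.norm(A.T @ b)
        zeta = rho * np.linalg.eigvalsh(L @ L.T)[0]           # rho * lambda_min(L L^T) > 0
        t1 = (-0.5 + l2 / (2.0 * zeta)
              + np.sqrt((zeta - l2) ** 2 + beta ** 2 + 4.0 * zeta * l2
                        + gamma ** 2 / (l1 - l2) * zeta) / (2.0 * zeta))
        sqrt_t2 = (gamma + np.sqrt(gamma ** 2 + (l1 - l2) * (4.0 * l2 + beta ** 2 / zeta
                   + (zeta - l2) ** 2 / zeta))) / (2.0 * (l1 - l2))
        return t1 + sqrt_t2 ** 2 + 1.0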
In order to compare the existing upper bound (7) with the new bounds v(SDP) and (23), we perform numerical experiments using the noise-free data of the first example presented in Sect. 5. The dimension n varies from 20 to 3000 and the regularization parameter \(\rho \) is simply fixed at 0.5. The computational environment is described in Sect. 5. We report the numerical results in Table 1. It can be seen that v(SDP) gives the tightest upper bound at the highest computational cost. For each test instance, the new upper bound (23) is much tighter than the existing upper bound (7) within the same computational time. For the instance of dimension 1000, Algorithm TRTLSG saves \(\log _2\left( \frac{1.97\times 10^{12}}{4.79\times 10^6}\right) \approx 15\) iterations if the new upper bound (23) is used in place of (7). From Columns 2–3 of Table 1, it is observed that the new lower bound (17), computed at a low cost, is much tighter than the trivial bound \(1+\epsilon _1=1.1\). For the instance of dimension 1000, replacing the trivial bound \(1+\epsilon _1\) with the new lower bound (17) helps Algorithm TRTLSG save \(\log _2\left( \frac{1.64\times 10^{2}}{1.1}\right) \approx 7\) iterations.
Table 1
Computational time (in s) and the quality of the new lower bound (17), the upper bound (7) given in [1], the new upper bounds (23) and v(SDP), where \(aeb=a\times 10^b\)

  n     | New b.d. (17)    | b.d. (7) in [1]   | New b.d. (23)    | v(SDP)
        | Time   α_min     | Time   α_max      | Time   α_max     | Time     α_max
  20    | 0.00   4.28      | 0.00   3.02e4     | 0.00   2.28e3    | 1.39     4.76e1
  50    | 0.00   9.18      | 0.00   1.35e6     | 0.00   1.32e4    | 0.47     1.60e2
  100   | 0.00   1.73e1    | 0.00   3.08e7     | 0.00   5.08e4    | 0.63     4.27e2
  200   | 0.00   3.37e1    | 0.00   7.98e8     | 0.02   1.98e5    | 1.30     1.20e3
  500   | 0.03   8.27e1    | 0.02   6.62e10    | 0.03   1.21e6    | 6.97     5.17e3
  1000  | 0.09   1.64e2    | 0.09   1.97e12    | 0.09   4.79e6    | 25.75    1.66e4
  1200  | 0.14   1.97e2    | 0.13   4.83e12    | 0.14   6.88e6    | 59.03    2.28e4
  1500  | 0.22   2.46e2    | 0.20   1.45e13    | 0.20   1.07e7    | 102.81   3.37e4
  1800  | 0.31   2.95e2    | 0.33   3.56e13    | 0.33   1.54e7    | 149.72   4.66e4
  2000  | 0.39   3.28e2    | 0.41   6.00e13    | 0.41   1.90e7    | 199.24   5.63e4
  2500  | 0.90   4.10e2    | 0.98   1.81e14    | 1.02   2.96e7    | 307.04   8.42e4
  3000  | 2.42   4.92e2    | 2.44   4.46e14    | 2.36   4.26e7    | 470.16   1.17e5

4 Branch-and-bound algorithm based on a two-layer dual approach

In this section, we first present a two-layer dual approach for underestimating \(\mathcal {G}(\alpha )\) (5) and then use it to develop an efficient branch-and-bound algorithm (Algorithm BB-TLD). The worst-case computational complexity is also analyzed.

4.1 A two-layer dual underestimation approach

The efficiency of solving (P) via (5) relies on an easy-to-compute and high-quality lower bound of \(\mathcal {G}(\alpha )\) over any given interval \([\alpha _i,\alpha _{i+1}]\). The difficulty is that there seems to be no closed-form expression of \(\mathcal {G}(\alpha )\). In this subsection, we present a new approach for underestimating \(\mathcal {G}(\alpha )\).

For the sake of simplicity, let \(p(x):=\Vert Ax-b\Vert ^2\), \(g(x):=\Vert x\Vert ^2+1\), and \(h(x):=\rho \Vert Lx\Vert ^2\). Our goal is to find a lower bound of the problem:
$$\begin{aligned} \min _{\alpha \in [\alpha _i,\alpha _{i+1}]} \left\{ \mathcal {G}(\alpha )=\min _{g(x)=\alpha } \frac{p(x)}{\alpha }+h(x)\right\} \end{aligned}$$
(30)
using only the solutions of evaluating \(\mathcal {G}(\alpha )\) at the two endpoints \(\alpha _i\) and \(\alpha _{i+1}\), i.e., \((x(\alpha _i),\lambda (\alpha _i))\) and \((x(\alpha _{i+1}),\lambda (\alpha _{i+1}))\), which are obtained by solving (12)–(14) with the setting \(\alpha =\alpha _i\) and \(\alpha =\alpha _{i+1}\), respectively.
For any \(\alpha \in [\alpha _i,\alpha _{i+1}]\), the problem of evaluating \(\mathcal {G}(\alpha )\) is an equality version of (TRS) and enjoys strong Lagrangian duality [25]. Then, it follows that
$$\begin{aligned} \mathcal {G}(\alpha )= & {} \max _{\lambda \in \mathbb {R}}\min _{x\in \mathbb {R}^n} \frac{p(x)}{\alpha }+h(x)-\lambda (g(x)-\alpha ) \end{aligned}$$
(31)
$$\begin{aligned}= & {} \max _{\lambda \in \mathbb {R}} \frac{p(x(\lambda ,\alpha ))}{\alpha }+h(x(\lambda ,\alpha ))-\lambda g(x(\lambda ,\alpha ))+\alpha \lambda , \end{aligned}$$
(32)
where \(x(\lambda ,\alpha )\) is an optimal solution of the inner minimization of (31).
Let \((x(\alpha ),\lambda (\alpha ))\) be the solution of the KKT system (12)–(14), i.e., \(x(\alpha )\) is an optimal solution to the minimization problem of evaluating \(\mathcal {G}(\alpha )\) and \(\lambda (\alpha )\) is the Lagrangian multiplier corresponding to the sphere constraint \(g(x)-\alpha =0\). Then, it follows from (30) that
$$\begin{aligned} \mathcal {G}(\alpha )= & {} \frac{p(x(\alpha ))}{\alpha }+h(x(\alpha ))\nonumber \\= & {} \frac{p(x(\alpha ))}{\alpha }+h(x(\alpha ))-\lambda (\alpha )(g(x(\alpha ))-\alpha ) \end{aligned}$$
(33)
$$\begin{aligned}= & {} \min _{x\in \mathbb {R}^n} \frac{p(x)}{\alpha }+h(x)-\lambda (\alpha )(g(x)-\alpha ), \end{aligned}$$
(34)
where (33) holds since \(x(\alpha )\) is a feasible solution so that \(g(x(\alpha ))-\alpha =0\), (34) follows from the fact that \(\lambda (\alpha )\) is an optimal solution to the outer optimization problem of (31).
Setting \(\alpha =\alpha _i\) and \(\alpha =\alpha _{i+1}\) in (34) and evaluating its objective at \(x(\lambda ,\alpha )\), we obtain, respectively,
$$\begin{aligned}&\frac{p(x(\lambda ,\alpha ))}{\alpha _i}+h(x(\lambda ,\alpha )) - \lambda (\alpha _i) g(x(\lambda ,\alpha ))+\alpha _i \lambda (\alpha _i) \ge \mathcal {G}(\alpha _i), \end{aligned}$$
and
$$\begin{aligned}&\frac{p(x(\lambda ,\alpha ))}{\alpha _{i+1}}+h(x(\lambda ,\alpha )) - \lambda (\alpha _{i+1}) g(x(\lambda ,\alpha ))+\alpha _{i+1} \lambda (\alpha _{i+1}) \ge \mathcal {G}(\alpha _{i+1}), \end{aligned}$$
which suggest how to estimate the unknowns \(p(x(\lambda ,\alpha ))\), \(h(x(\lambda ,\alpha ))\) and \(g(x(\lambda ,\alpha ))\) in (32). This leads to the following underestimation of \(\mathcal {G}(\alpha )\) over \(\alpha \in [\alpha _i,\alpha _{i+1}]\):
$$\begin{aligned} \underline{\mathcal {G}}(\alpha ):=\max _{\lambda \in \mathbb {R}}&\min \limits _{y_p,y_h,y_g}&\frac{y_p}{\alpha }+y_h-\lambda y_g+\alpha \lambda \end{aligned}$$
(35)
$$\begin{aligned}&\mathrm{s.t.}&\frac{y_p}{\alpha _i}+y_h - \lambda (\alpha _i) y_g+\alpha _i \lambda (\alpha _i) \ge \mathcal {G}(\alpha _i), \end{aligned}$$
(36)
$$\begin{aligned}&\frac{y_p}{\alpha _{i+1}}+y_h- \lambda (\alpha _{i+1}) y_g+\alpha _{i+1} \lambda (\alpha _{i+1}) \ge \mathcal {G}(\alpha _{i+1}). \end{aligned}$$
(37)
The inner optimization problem of (35)–(37) in terms of \((y_p,y_h,y_g)\) is a linear program and hence it is equivalent to its dual maximization problem. Thus, we can rewrite the underestimation (35)–(37) as a double maximization problem:
$$\begin{aligned} \max _{\lambda \in \mathbb {R}}&\max \limits _{\mu _1,\mu _2}&\mu _1(\mathcal {G}(\alpha _i)-\alpha _i \lambda (\alpha _i))+\mu _2(\mathcal {G}(\alpha _{i+1})-\alpha _{i+1} \lambda (\alpha _{i+1}))+\alpha \lambda \end{aligned}$$
(38)
$$\begin{aligned}&\mathrm{s.t.}&\frac{1}{\alpha _i}\mu _1+\frac{1}{\alpha _{i+1}}\mu _2=\frac{1}{\alpha }, \end{aligned}$$
(39)
$$\begin{aligned}&\mu _1+\mu _2=1, \end{aligned}$$
(40)
$$\begin{aligned}&\lambda (\alpha _i)\mu _1+ \lambda (\alpha _{i+1})\mu _2 = \lambda , \end{aligned}$$
(41)
$$\begin{aligned}&\mu _1,\mu _2 \ge 0, \end{aligned}$$
(42)
which can be recast as a standard optimization problem of maximizing (38) subject to (39)–(42) with respect to \(\lambda \), \(\mu _1\) and \(\mu _2\). It follows from \(\alpha _i<\alpha _{i+1}\) that \(\mu _1\) and \(\mu _2\) can be uniquely solved by the equalities (39), (40), that is,
$$\begin{aligned} \mu _1= \frac{\alpha _i(\alpha _{i+1}-\alpha )}{\alpha (\alpha _{i+1}-\alpha _i)}, \quad \mu _2= \frac{\alpha _{i+1}(\alpha -\alpha _i)}{\alpha (\alpha _{i+1}-\alpha _i)}. \end{aligned}$$
(43)
Moreover, for the solutions \(\mu _1\) and \(\mu _2\) of (43), the constraint (42) holds as \(1\le \alpha _i<\alpha _{i+1}\) and \(\alpha \in [\alpha _i,\alpha _{i+1}]\). Substituting the solutions \(\mu _1\) and \(\mu _2\) (43) into the equality constraint (41), we obtain
$$\begin{aligned} \lambda = \frac{\alpha _i \alpha _{i+1}}{\alpha _{i+1}-\alpha _i}(\lambda (\alpha _i) - \lambda (\alpha _{i+1})) \frac{1}{\alpha }+ \frac{1}{\alpha _{i+1}-\alpha _i} (\lambda (\alpha _{i+1}) \alpha _{i+1} - \lambda (\alpha _i) \alpha _i). \end{aligned}$$
(44)
Therefore, the optimization problem (38)–(42) has been explicitly solved. Plugging (43) and (44) in (38) yields a closed-form expression of the underestimation function \(\underline{\mathcal {G}}(\alpha )\):
$$\begin{aligned} \underline{\mathcal {G}}(\alpha )= c_1 \alpha +\frac{c_2}{\alpha }+c_3, \end{aligned}$$
(45)
where the constant coefficients are defined as
$$\begin{aligned} c_1= & {} \frac{\alpha _{i+1}\lambda (\alpha _{i+1}) - \alpha _i\lambda (\alpha _i)}{\alpha _{i+1}-\alpha _i}, \end{aligned}$$
(46)
$$\begin{aligned} c_2= & {} \alpha _i \alpha _{i+1}\left( c_1-\frac{\mathcal {G}(\alpha _{i+1})-\mathcal {G}(\alpha _i)}{\alpha _{i+1}-\alpha _i}\right) , \end{aligned}$$
(47)
$$\begin{aligned} c_3= & {} \frac{ \alpha _{i+1}\mathcal {G}(\alpha _{i+1})- \alpha _i\mathcal {G}(\alpha _{i})}{\alpha _{i+1}-\alpha _i}-c_1(\alpha _{i+1}+\alpha _{i}). \end{aligned}$$
(48)
The underestimation \(\underline{\mathcal {G}}(\alpha )\) is tight at the two endpoints as we can verify that
$$\begin{aligned} \underline{\mathcal {G}}(\alpha _{i+1})= \mathcal {G}(\alpha _{i+1}),~ \underline{\mathcal {G}}(\alpha _{i})=\mathcal {G}(\alpha _{i}). \end{aligned}$$
(49)
Then, the minimum of \(\underline{\mathcal {G}}(\alpha )\) over \([\alpha _{i},\alpha _{i+1}]\) provides a lower bound of (30). By simple computation, we have
$$\begin{aligned} \min _{\alpha \in [\alpha _i,\alpha _{i+1}]}\underline{\mathcal {G}}(\alpha )=\left\{ \begin{array}{l@{\quad }l} 2\sqrt{ c_1c_2}+c_3,&{}\mathrm{if}~c_1>0, c_2>0, \alpha _i<\frac{\sqrt{c_2}}{\sqrt{c_1}}<\alpha _{i+1},\\ \underline{\mathcal {G}}(\alpha _{i+1}),&{} \mathrm{if}~c_1>0, c_2>0, \alpha _{i+1}\le \frac{\sqrt{c_2}}{\sqrt{c_1}},\\ \underline{\mathcal {G}}(\alpha _{i}),&{} \mathrm{if}~c_1>0, c_2>0, \alpha _{i}\ge \frac{\sqrt{c_2}}{\sqrt{c_1}},\\ \underline{\mathcal {G}}(\alpha _i),&{} \mathrm{if}~c_1>0,c_2\le 0, \\ \underline{\mathcal {G}}(\alpha _{i+1}),&{} \mathrm{if}~c_1\le 0,c_2>0, \\ \min \left\{ \underline{\mathcal {G}}(\alpha _{i+1}), \underline{\mathcal {G}}(\alpha _{i})\right\} ,&{} \mathrm{if}~ c_1\le 0,c_2\le 0.\\ \end{array}\right. \end{aligned}$$
As a summary, we have the following result.

Theorem 8

Let \(c_1,c_2\) and \(c_3\) be defined in (46)–(48), respectively. If
$$\begin{aligned} c_1>0, ~c_2>0, ~\widetilde{\alpha }:=\sqrt{\frac{c_2}{c_1}}\in (\alpha _i,\alpha _{i+1}), \end{aligned}$$
(50)
then we have
$$\begin{aligned} \min _{\alpha \in [\alpha _i,\alpha _{i+1}]}\mathcal {G}(\alpha ) \ge \min _{\alpha \in [\alpha _i,\alpha _{i+1}]}\underline{\mathcal {G}}(\alpha )= 2\sqrt{ c_1c_2}+c_3, \end{aligned}$$
where \(\widetilde{\alpha }\) is the unique minimizer of \(\underline{\mathcal {G}}(\alpha )\) over \([\alpha _i,\alpha _{i+1}]\). Otherwise, if (50) does not hold, we have
$$\begin{aligned} \min _{\alpha \in [\alpha _i,\alpha _{i+1}]}\mathcal {G}(\alpha ) =\min \left\{ \mathcal {G}(\alpha _{i}),\mathcal {G}(\alpha _{i+1})\right\} . \end{aligned}$$
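Since \(c_1,c_2,c_3\) in (46)–(48) depend only on data already available at the two endpoints, the interval lower bound of Theorem 8 is essentially free. The following small helper (our own function, not the authors' code) returns the bound together with the candidate subdivision point \(\widetilde{\alpha }\) of (50), or None when (50) fails:

    import numpy as np

    def interval_lower_bound(a_i, a_ip1, G_i, G_ip1, lam_i, lam_ip1):
        # Closed-form underestimator (45)-(48) on [a_i, a_ip1]; see Theorem 8.
        c1 = (a_ip1 * lam_ip1 - a_i * lam_i) / (a_ip1 - a_i)
        c2 = a_i * a_ip1 * (c1 - (G_ip1 - G_i) / (a_ip1 - a_i))
        c3 = (a_ip1 * G_ip1 - a_i * G_i) / (a_ip1 - a_i) - c1 * (a_ip1 + a_i)
        if c1 > 0 and c2 > 0:
            a_tilde = np.sqrt(c2 / c1)             # interior minimizer of c1*a + c2/a + c3
            if a_i < a_tilde < a_ip1:
                return 2.0 * np.sqrt(c1 * c2) + c3, a_tilde
        return min(G_i, G_ip1), None               # bound is attained at an endpoint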

4.2 A new branch-and-bound algorithm (Algorithm BB-TLD)

In this subsection, we employ a branch-and-bound algorithm to solve \((\mathrm{PM})\) (5) based on the above two-layer dual underestimation. For the rule of branching, we adopt the \(\omega \)-subdivision approach first introduced in [5]. More precisely, we select \(\widetilde{\alpha }\) defined in (50), the minimizer of the underestimating function \(\underline{\mathcal {G}}(\alpha )\), to subdivide the current interval \([\alpha _{i},\alpha _{i+1}]\). The whole algorithm is listed as follows.
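The original pseudocode listing is not reproduced here. The following is a minimal sketch of how such a branch-and-bound loop might be organized, based only on the description above; it reuses eval_G_trs and interval_lower_bound from the earlier sketches, and the node bookkeeping, the list-based selection of the node with the smallest lower bound and the max_iter safeguard are our own simplifications rather than the authors' exact Algorithm BB-TLD.

    def bb_tld(A, b, L, rho, alpha_min, alpha_max, eps=1e-6, max_iter=1000):
        # Branch and bound on (PM) with omega-subdivision at the minimizer of the underestimator.
        def make_node(a_lo, a_hi, data_lo, data_hi):
            lb, a_tilde = interval_lower_bound(a_lo, a_hi, data_lo[0], data_hi[0],
                                               data_lo[2], data_hi[2])
            return [lb, a_lo, a_hi, data_lo, data_hi, a_tilde]

        data_lo = eval_G_trs(alpha_min, A, b, L, rho)
        data_hi = eval_G_trs(alpha_max, A, b, L, rho)
        UB, best_alpha = min((data_lo[0], alpha_min), (data_hi[0], alpha_max))
        nodes = [make_node(alpha_min, alpha_max, data_lo, data_hi)]
        for _ in range(max_iter):
            nodes.sort(key=lambda nd: nd[0])           # node with the smallest lower bound LB*
            lb, a_lo, a_hi, d_lo, d_hi, a_tilde = nodes.pop(0)
            if lb > UB - eps or a_tilde is None:       # global eps-optimality reached
                break
            d_mid = eval_G_trs(a_tilde, A, b, L, rho)  # omega-subdivision point (50)
            if d_mid[0] < UB:
                UB, best_alpha = d_mid[0], a_tilde
            nodes.append(make_node(a_lo, a_tilde, d_lo, d_mid))
            nodes.append(make_node(a_tilde, a_hi, d_mid, d_hi))
        return best_alpha, UB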

We now show the details of applying Algorithm BB-TLD, with the setting \(\epsilon =10^{-6}\), to the following example, in which \(\mathcal {G}(\alpha )\) is not unimodal.

Example 1

Let \(m = n =2\), \(k = 1\) and
$$\begin{aligned} A=\left( \begin{matrix} 0.4 &{} 0.8 \\ 0.2 &{} 1 \end{matrix}\right) , \quad b=\left( \begin{matrix} 0.1 \\ 0.5 \end{matrix}\right) ,\quad L=\left( \begin{matrix} 0.1&0.8\end{matrix}\right) ,\quad \rho =0.5. \end{aligned}$$
With the same setting \(\epsilon _1=10^{-1},~\epsilon _2=10^{-6}\) as given in [1], after 35 iterations, Algorithm TRTLSG finds a local non-global minimizer \(\widetilde{x}=(3.2209,-\,0.4897)^T\) with the objective function value 0.0673 and \(\widetilde{\alpha }=\Vert \widetilde{x}\Vert ^2+1\approx 11.6140\). Actually, the global minimizer of \((\mathrm PM)\) (5) is \(\alpha ^*\approx 1.6300\) and the corresponding objective value is \(v(\mathrm PM)\approx 0.0634\). The function \(\mathcal {G}(\alpha )\) for this example is plotted in Fig. 1, see Sect. 5.
It follows from Theorems 6 and 7 that
$$\begin{aligned} \alpha _1=\alpha _{\min }=1.0266,~ \alpha _2=\alpha _{\max }=3355.5794. \end{aligned}$$
As a contrast, the upper bound (7) given in [1] is 17551.0566. Following the \(\omega \)-subdivision approach, the first subdividing point (50) is given by
$$\begin{aligned} \alpha _3=\widetilde{\alpha }=\mathrm{arg}\min _{\alpha \in [\alpha _1,\alpha _{2}]}\underline{\mathcal {G}}(\alpha )=59.1724. \end{aligned}$$
The next 12 iterations are plotted in Fig. 1, after which the stopping criterion is reached. The algorithm returns a global approximation solution \(x^*=(-\,0.6541,0.4496)^T\) with \(\alpha ^*=\Vert x^*\Vert ^2+1\approx 1.6300\). It is observed that the algorithm based on the \(\omega \)-subdivision is much more efficient than that based on bisection. Moreover, in each iteration, our new lower bound is tight for one of the two subintervals into which the current interval is divided. Consequently, there is no need to subdivide this subinterval in later iterations.
Fig. 1  The last 12 iterations of our new algorithm for solving Example 1

Let \(\alpha ^*\) be the solution obtained by Algorithm BB-TLD. It holds that
$$\begin{aligned} v(\mathrm{PM})\le \mathcal {G}(\alpha ^*) \le v(\mathrm{PM})+\epsilon . \end{aligned}$$
(51)
Throughout this paper, any \(\alpha ^*\ge 1\) satisfying (51) is called a global \(\epsilon \)-approximation solution of \((\mathrm{PM})\).

In order to study the worst-case computational complexity of our new algorithm, we need the following lemma.

Lemma 2

Let \(\lambda (\alpha )\) be the Lagrangian multiplier of (TRS) (11) (i.e., the \(\lambda \)-solution of the KKT system (12)–(14)). Then, if \(\alpha _{\min }>1\), \(\lambda (\alpha )\) is bounded over \([\alpha _{\min },\alpha _{\max }]\):
$$\begin{aligned} |\lambda (\alpha )| \le U:=\frac{\Vert A^Tb\Vert }{\alpha _{\min }\sqrt{\alpha _{\min }-1}}+ \lambda _{\min }\left( \frac{1}{\alpha _{\min }}A^TA+ \rho L^TL\right) . \end{aligned}$$
(52)

Proof

It follows from the KKT system (12)–(14) that if \(\lambda (\alpha )\ne \lambda _{\min }(Q_{\alpha })\) then \(\lambda (\alpha )< \lambda _{\min }(Q_{\alpha })\) and
$$\begin{aligned} \Vert (Q_{\alpha }-\lambda (\alpha )I)^{-1}f_{\alpha }\Vert ^2= \alpha -1. \end{aligned}$$
(53)
Notice that
$$\begin{aligned} \Vert (Q_{\alpha }-\lambda (\alpha )I)^{-1}f_{\alpha }\Vert ^2\le & {} \lambda _{\max }^2\left( \left( Q_{\alpha }-\lambda (\alpha )I\right) ^{-1}\right) \Vert f_{\alpha }\Vert ^2 \nonumber \\= & {} \frac{\Vert f_{\alpha }\Vert ^2}{\left( \lambda _{\min }\left( Q_{\alpha }\right) -\lambda (\alpha )\right) ^2}. \end{aligned}$$
(54)
Plugging (54) in (53) yields
$$\begin{aligned} \frac{\Vert f_{\alpha }\Vert ^2}{\left( \lambda _{\min }\left( Q_{\alpha }\right) -\lambda (\alpha )\right) ^2} \ge \alpha -1, \end{aligned}$$
which further implies that
$$\begin{aligned} |\lambda (\alpha )|\le \frac{\Vert f_{\alpha }\Vert }{\sqrt{\alpha -1}}+ \lambda _{\min }\left( Q_{\alpha }\right) . \end{aligned}$$
(55)
Notice that the inequality (55) trivially holds true for the other case \(\lambda (\alpha )= \lambda _{\min }(Q_{\alpha })\). Then, according to (55) and the definitions of \(Q_{\alpha }\) and \(f_{\alpha }\) (10), we obtain the upper bound (52) over the interval \([\alpha _{\min },\alpha _{\max }]\). \(\square \)

Theorem 9

If \(A^Tb\ne 0\), our new algorithm finds a global \(\epsilon \)-approximation solution of \((\mathrm{PM})\) (5) in at most
$$\begin{aligned} \left\lceil \frac{4U\alpha _{\max }^2(\alpha _{\max }-\alpha _{\min })}{ \alpha _{\min }^2~\epsilon }\right\rceil \end{aligned}$$
(56)
iterations, where U is defined in (52), and \(\alpha _{\min }>1\) and \(\alpha _{\max }\) are the constants defined in Theorems 6 and 7, respectively. Moreover, suppose that assumption (15) holds for all \(\alpha >1\). Then, in order to find a global \(\epsilon \)-approximation solution of \((\mathrm{PM})\), our new algorithm requires at most
$$\begin{aligned} \left\lceil \frac{2\widetilde{U}\sqrt{\alpha _{\max }} (\alpha _{\max }-\alpha _{\min })}{ \alpha _{\min }~\cdot \sqrt{\epsilon }}\right\rceil \end{aligned}$$
(57)
iterations, where
$$\begin{aligned} \widetilde{U}=\max _{\alpha \in [\alpha _{\min }, \alpha _{\max }]}\lambda (\alpha )+\alpha \lambda '(\alpha ), \end{aligned}$$
(58)
is a well-defined finite number and \(\lambda (\alpha )\) is the \(\lambda \)-solution of (12)–(14).

Proof

Suppose \((LB,\alpha _i,\alpha _{i+1})\in T\) is selected to subdivide in the current iteration of our new algorithm. Then, we have \(LB=LB^*\). Without loss of generality, we assume that (50) holds in the interval \([\alpha _i,\alpha _{i+1}]\), since otherwise, it follows from Theorem 8 that \(LB=UB\) and hence the algorithm has to stop.

The condition (50) implies that the underestimating function \(\underline{\mathcal {G}}(\alpha )\) (45) is convex. Therefore, for any \(\alpha \in [\alpha _{\min },\alpha _{\max }]\), we have
$$\begin{aligned} \underline{\mathcal {G}}(\alpha )\ge & {} \underline{\mathcal {G}}(\alpha _i)+\underline{\mathcal {G}}'(\alpha _i)(\alpha -\alpha _i)\nonumber \\= & {} \mathcal {G}(\alpha _i)+\left( c_1-\frac{c_2}{\alpha _i^2}\right) (\alpha -\alpha _i)\nonumber \\\ge & {} \mathcal {G}(\alpha _i)+\left( c_1-\frac{c_1\alpha ^2_{i+1}}{\alpha _i^2}\right) (\alpha -\alpha _i)\nonumber \\\ge & {} \mathcal {G}(\alpha _i)- \frac{\alpha ^2_{i+1}-\alpha _i^2}{\alpha _i^2}c_1(\alpha _{i+1}-\alpha _i), \end{aligned}$$
(59)
where the first equality follows from (49) and the second inequality holds due to the third inequality of (50).
According to the definition (46) and Lemma 2, we have
$$\begin{aligned} c_1(\alpha _{i+1}-\alpha _i)=\alpha _{i+1}\lambda (\alpha _{i+1}) - \alpha _i\lambda (\alpha _i) \le (\alpha _{i+1}+\alpha _{i})U. \end{aligned}$$
(60)
By substituting (60) into (59), we obtain
$$\begin{aligned} \textit{LB}^*=\min _{\alpha \in [\alpha _i,\alpha _{i+1}]}\underline{\mathcal {G}}(\alpha )\ge & {} \mathcal {G}(\alpha _i)- \frac{(\alpha _{i+1}+\alpha _i)^2(\alpha _{i+1}-\alpha _i)}{\alpha _i^2} U\\\ge & {} \textit{UB}- \frac{4\alpha _{\max }^2(\alpha _{i+1}-\alpha _i)}{\alpha _{\min }^2}U. \end{aligned}$$
Consequently, the stopping criterion \(\textit{LB}^*>\textit{UB}- \epsilon \) is reached if
$$\begin{aligned} \alpha _{i+1}-\alpha _i < \frac{\alpha _{\min }^2}{4U\alpha _{\max }^2}\cdot \epsilon . \end{aligned}$$
Therefore, the number of the iterations of our new algorithm can not exceed the upper bound (56).
It has been shown in the first part of the proof of Theorem 5 (see [1]) that, under the assumption that (15) holds for all \(\alpha >1\), \(\lambda (\alpha )\) is differentiable to any order. Therefore, \(\widetilde{U}\) (58) is well defined and \(\widetilde{U}<+\infty \). Applying the mean-value theorem to the definition of \(c_1\) (46), we have
$$\begin{aligned} c_1=\frac{\alpha _{i+1}\lambda (\alpha _{i+1})-\alpha _i\lambda (\alpha _i)}{\alpha _{i+1}-\alpha _i} = \left( \alpha \lambda (\alpha )\right) '|_{\alpha =\xi } =\lambda (\xi )+\xi \lambda '(\xi )\le \widetilde{U}, \end{aligned}$$
where \(\xi \in (\alpha _i,\alpha _{i+1})\).
Then, it follows from (59) that
$$\begin{aligned} LB^*=\min _{\alpha \in [\alpha _i,\alpha _{i+1}]}\underline{\mathcal {G}}(\alpha )\ge & {} \mathcal {G}(\alpha _i)- \frac{(\alpha _{i+1}+\alpha _i) (\alpha _{i+1}-\alpha _i)^2}{\alpha _i^2} \widetilde{U} \\\ge & {} UB- \frac{2\alpha _{\max }\widetilde{U} }{\alpha _{\min }^2}(\alpha _{i+1}-\alpha _i)^2. \end{aligned}$$
Then, if
$$\begin{aligned} \alpha _{i+1}-\alpha _i < \frac{\alpha _{\min }}{\sqrt{2\widetilde{U}\alpha _{\max }}} \cdot \sqrt{\epsilon }, \end{aligned}$$
the stopping criterion \(LB^*>UB- \epsilon \) is reached. Consequently, (57) gives the maximal number of the iterations of our new algorithm in the worst case. \(\square \)

Corollary 2

Suppose \(A^Tb= 0\) and \(b\ne 0\). For any \(\epsilon \in (0,\Vert b\Vert ^2)\), our new algorithm finds a global \(\epsilon \)-approximation solution of \((\mathrm{PM})\) (5) in at most
$$\begin{aligned} \left\lceil \frac{4\alpha _{\max }^2(\alpha _{\max }-1)\lambda _{\min }\left( A^TA+ \rho L^TL\right) }{\epsilon } \right\rceil \end{aligned}$$
(61)
iterations, where \(\alpha _{\max }\) is defined in Theorem 7.

Proof

Under the assumption \(A^Tb= 0\) and \(b\ne 0\), according to Proposition 1, we have \(\alpha _{\min }=\frac{\Vert b\Vert ^2}{\Vert b\Vert ^2-\epsilon }>1\). Then, Lemma 2 and Theorem 9 hold true. It follows from (52) that
$$\begin{aligned} U= & {} \frac{\Vert A^Tb\Vert }{\alpha _{\min }\sqrt{\alpha _{\min }-1}}+ \lambda _{\min }\left( \frac{1}{\alpha _{\min }}A^TA+ \rho L^TL\right) \\= & {} \lambda _{\min }\left( \frac{1}{\alpha _{\min }}A^TA+ \rho L^TL\right) \\\le & {} \lambda _{\min }\left( A^TA+ \rho L^TL\right) . \end{aligned}$$
According to Theorem 9 and the following inequality
$$\begin{aligned} \frac{4U\alpha _{\max }^2(\alpha _{\max }-\alpha _{\min })}{ \alpha _{\min }^2~\epsilon } \le \frac{4\alpha _{\max }^2(\alpha _{\max }-1)\lambda _{\min }\left( A^TA+ \rho L^TL\right) }{\epsilon }, \end{aligned}$$
the proof is complete. \(\square \)

Remark 1

The worst-case computational complexity (61) cannot be similarly reduced to \(O(1/\sqrt{\epsilon })\) as in (57), since the assumption (15) cannot hold for any \(\alpha >1\) in the case \(A^Tb= 0\).

5 Numerical experiments

In this section, we numerically compare the computational efficiency of the improved version of the bisection-based Algorithm TRTLSG [1] (improved by strengthening the lower and upper bounds on the norm of any optimal solution, see Sect. 3) and our new branch-and-bound algorithm (Algorithm BB-TLD). Since the stopping criterion in Step 3 of Algorithm TRTLSG [1] is different from that of our global optimization algorithm, for the sake of fairness, we replace the original simple stopping rule (stop as soon as \(|\alpha _{\max }- \alpha _{\min }| \le \epsilon _2\)) with
$$\begin{aligned} \mathcal {G}(\alpha _{\max })\le LB^*+\epsilon , \end{aligned}$$
(62)
where \(LB^*\in [\mathcal {G}(\alpha ^*)-\epsilon ,\mathcal {G}(\alpha ^*)]\) is a lower approximation of the optimal value \(\mathcal {G}(\alpha ^*)\) obtained by calling our new global optimization algorithm in advance.

We numerically test two examples. The first one is taken from Hansen's Regularization Tools [11], where the function shaw is used to generate the matrix \(A_{\mathrm{true}}\in \mathbb {R}^{n\times n}\), the vector \(b_{\mathrm{true}}\in \mathbb {R}^n\) and the true solution \(x_{\mathrm{true}}\in \mathbb {R}^n\), i.e., we have \(A_{\mathrm{true}}x_\mathrm{true}=b_{\mathrm{true}}\). Then we add white noise of level \(\sigma =0.05\), i.e., \(A=A_{\mathrm{true}}+\sigma E\), \(b=b_\mathrm{true}+\sigma e\), where E and e are generated from a standard normal distribution. In our experiments, the dimension n varies from 20 to 5000.

The second one is an image deblurring example of a fixed dimension \(n=1024\), see [1, 2]. We generate the atmospheric turbulence blur matrix \(A_{\mathrm{true}}\in \mathbb {R}^{n\times n}\) by implementing blur(n, 3), which is taken from [11]. The true solution \(x_{\mathrm{true}}\in \mathbb {R}^{n}\) is obtained by stacking the columns of \(X\in \mathbb {R}^{32\times 32}\) one underneath the other and then normalizing it so that \(\Vert x_{\mathrm{true}}\Vert =1\), where \(X\in \mathbb {R}^{32\times 32}\) is the following two dimensional image:
$$\begin{aligned} X(z_1,z_2)=\sum _{l=1}^3 a_l\cos (w_{l,1}z_1+w_{l,2}z_2+\phi _l),1\le z_1,z_2\le 32, \end{aligned}$$
with the coefficients given in Table 1 of [1]. Let \(b_{\mathrm{true}}=A_{\mathrm{true}}x_{\mathrm{true}}\). Then white noise is added, i.e., \(A=A_{\mathrm{true}}+\sigma E\), \(b=b_{\mathrm{true}}+\sigma e\), where E and e are generated from a standard normal distribution. In our experiments, we let the noise level \(\sigma \) vary in \(\{0.01,0.03,0.05, 0.08,0.1,0.3,0.5,0.8,1.0,1.3,1.5,1.8,2.0\}\).

For the regularization matrix of the first example, we take \(L=get\_l(n,1)\), which is given in [11]. For the second example, as in [2], we set the regularization matrix L to the discrete approximation of the Laplace operator, which is standard in image processing [13]. The regularization parameter \(\rho \) is selected by the L-curve method [12]. It corresponds to the L-shaped corner of the curve of the norm \(\Vert Lx\Vert ^2\) versus the fractional residual \(\Vert Ax-b\Vert ^2/(\Vert x\Vert ^2+1)\) over a range of regularization parameters.

All the experiments are carried out in MATLAB R2014a and run on a server with a 2.6 GHz dual-core processor and 32 GB RAM. We set the tolerance parameter \(\epsilon =10^{-6}\) for both algorithms. For each setting of the dimension or the noise level in the above two examples, we independently and randomly generate 10 instances and then run the two algorithms. We report in Tables 2 and 3 the averages of the numerical results over the 10 runs, where the average computational time is given in seconds and the symbol ‘#iter’ denotes the average number of iterations, i.e., the number of (TRS) evaluations.

The numerical results demonstrate that, in most cases, our global optimization algorithm outperforms the improved version of the heuristic Algorithm TRTLSG [1]. Moreover, the larger the dimension or the noise level, the greater the advantage of our global algorithm. It is worth noting that, with the modified stopping criterion (62), the improved Algorithm TRTLSG [1] requires far fewer iterations, as the objective \(\mathcal {G}(\alpha )\) is quite flat around any optimal solution \(\alpha ^*\); it would be more time-consuming if the original simple stopping criterion based on \(|\alpha _{\max }- \alpha _{\min }|\) and \(\epsilon _2\) were used. It is observed that the number of iterations of the improved Algorithm TRTLSG increases (though slightly) with either the dimension or the noise level. However, for all instances we have tested, the number of iterations of our new global optimization algorithm never exceeds twenty and seems to be independent of the dimension and the noise level.
Table 2
The average of the numerical results for ten times solving the first example with different dimension n

  n     | Algorithm TRTLSG     | Algorithm BB-TLD
        | # iter    Time (s)   | # iter    Time (s)
  20    | 16.0      0.02       | 17.0      0.02
  50    | 18.3      0.03       | 15.5      0.03
  100   | 18.7      0.09       | 15.5      0.08
  200   | 18.9      0.26       | 16.5      0.25
  500   | 20.0      2.81       | 16.8      2.64
  1000  | 20.5      10.49      | 16.1      9.10
  1200  | 20.4      15.19      | 15.6      12.69
  1500  | 21.1      24.07      | 18.0      22.40
  1800  | 21.2      35.97      | 17.8      33.67
  2000  | 20.8      43.88      | 17.8      43.10
  2500  | 20.7      72.51      | 17.5      68.84
  3000  | 21.8      125.16     | 16.2      102.76
  4000  | 20.2      255.86     | 14.0      202.95
  5000  | 20.0      448.39     | 14.5      366.50

Table 3
The average of the numerical results for ten times solving the second example with a fixed dimension \(n=1024\) and different level of noise \(\sigma \)

  σ     | Algorithm TRTLSG     | Algorithm BB-TLD
        | # iter    Time (s)   | # iter    Time (s)
  0.01  | 17.2      9.46       | 14.4      8.60
  0.03  | 21.9      12.33      | 16.6      10.11
  0.05  | 16.2      8.91       | 17.0      10.52
  0.08  | 18.0      10.04      | 17.0      10.68
  0.1   | 19.8      11.25      | 18.4      11.56
  0.3   | 29.4      17.65      | 17.0      10.94
  0.5   | 30.8      18.59      | 17.4      11.19
  0.8   | 30.7      20.52      | 15.9      12.06
  1.0   | 31.4      21.08      | 15.4      11.71
  1.3   | 32.2      21.56      | 15.6      12.04
  1.5   | 32.0      21.96      | 15.6      12.33
  1.8   | 33.6      23.11      | 16.1      13.04
  2.0   | 33.7      23.52      | 16.0      13.03

6 Conclusions

The total least squares problem with general Tikhonov regularization (TRTLS) is a non-convex optimization problem with local non-global minimizers. It can be reformulated as the problem of minimizing a one-dimensional function \(\mathcal {G}(\alpha )\) over an interval, where \(\mathcal {G}(\alpha )\) is evaluated by solving an n-dimensional trust region subproblem. In the literature, there is an efficient bisection-based heuristic algorithm for solving (TRTLS), denoted by Algorithm TRTLSG; it converges to the global optimal solution except on exceptional examples with non-unimodal \(\mathcal {G}(\alpha )\). In this paper, we first improve the lower and upper bounds on the norm of the globally optimal solution, which greatly improves the efficiency of Algorithm TRTLSG. For the global optimization of (TRTLS), we develop an adaptive branch-and-bound algorithm based on a newly introduced two-layer dual approach for underestimating \(\mathcal {G}(\alpha )\) over any given interval. Our new algorithm (Algorithm BB-TLD) is guaranteed to find a global \(\epsilon \)-approximation solution in at most \(O(1/\epsilon )\) iterations, and the computational effort in each iteration is \(O(n^3\log (1/\epsilon ))\). Under the same assumptions as in Algorithm TRTLSG, the number of iterations of our new algorithm can be further reduced to \(O(1/\sqrt{\epsilon })\). In our experiments, the practical iteration numbers are always less than twenty and seem to be independent of the dimension and the noise level. Numerical results demonstrate that our global optimization algorithm is even faster than the improved version of Algorithm TRTLSG, the bisection-based heuristic. Extending our two-layer dual underestimation approach to globally solving more structured non-convex optimization problems is left as future work.


Acknowledgements

The authors are grateful to the two anonymous referees for their valuable comments and suggestions.

References

  1. Beck, A., Ben-Tal, A.: On the solution of the Tikhonov regularization of the total least squares problem. SIAM J. Optim. 17(3), 98–118 (2006)
  2. Beck, A., Ben-Tal, A., Teboulle, M.: Finding a global optimal solution for a quadratically constrained fractional quadratic problem with applications to the regularized total least squares. SIAM J. Matrix Anal. Appl. 28(2), 425–445 (2006)
  3. Beck, A., Teboulle, M.: A convex optimization approach for minimizing the ratio of indefinite quadratic functions over an ellipsoid. Math. Program. 118(1), 13–35 (2009)
  4. Conn, A.R., Gould, N.I.M., Toint, P.L.: Trust-Region Methods. MPS/SIAM Series on Optimization. SIAM, Philadelphia (2000)
  5. Falk, J.E., Soland, R.M.: An algorithm for separable nonconvex programming problems. Manag. Sci. 15, 550–569 (1969)
  6. Fortin, C., Wolkowicz, H.: The trust region subproblem and semidefinite programming. Optim. Methods Softw. 19, 41–67 (2004)
  7. Gander, W., Golub, G.H., von Matt, U.: A constrained eigenvalue problem. Linear Algebra Appl. 114(115), 815–839 (1989)
  8. Gay, D.M.: Computing optimal locally constrained steps. SIAM J. Sci. Stat. Comput. 2(2), 186–197 (1981)
  9. Golub, G.H., Van Loan, C.F.: An analysis of the total least-squares problem. SIAM J. Numer. Anal. 17(6), 883–893 (1980)
  10. Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. The Johns Hopkins University Press, Baltimore (1996)
  11. Hansen, P.C.: Regularization tools: a Matlab package for analysis and solution of discrete ill-posed problems. Numer. Algorithm 6, 1–35 (1994)
  12. Hansen, P.C., O'Leary, D.P.: The use of the L-curve in the regularization of discrete ill-posed problems. SIAM J. Sci. Comput. 14, 1487–1503 (1993)
  13. Jain, A.K.: Fundamentals of Digital Image Processing. Prentice-Hall, Englewood Cliffs (1989)
  14. Joerg, L., Heinrich, V.: Large-scale Tikhonov regularization of total least squares. J. Comput. Appl. Math. 238, 95–108 (2013)
  15. Lbaraki, T., Schaible, S.: Fractional programming. Eur. J. Oper. Res. 12(4), 325–338 (2004)
  16. Moré, J.J., Sorensen, D.C.: Computing a trust region step. SIAM J. Sci. Stat. Comput. 4(3), 553–572 (1983)
  17. Moré, J.J.: Generalizations of the trust region problem. Optim. Methods Softw. 2, 189–209 (1993)
  18. Pong, T.K., Wolkowicz, H.: Generalizations of the trust region subproblem. Comput. Optim. Appl. 58(2), 273–322 (2014)
  19. Rendel, F., Wolkowicz, H.: A semidefinite framework for trust region subproblems with applications to large scale minimization. Math. Program. 77(2), 273–299 (1997)
  20. Schaible, S., Shi, J.M.: Fractional programming: the sum-of-ratios case. Optim. Methods Softw. 18(2), 219–229 (2003)
  21. Sorensen, D.C.: Minimization of a large-scale quadratic function subject to a spherical constraint. SIAM J. Optim. 7(1), 141–161 (1997)
  22. Tikhonov, A.N., Arsenin, V.Y.: Solution of Ill-Posed Problems. V.H. Winston, Washington (1977)
  23. Van Huffel, S., Lemmerling, P.: Total Least Squares and Errors-in-Variables Modeling. Kluwer, Dordrecht (2002)
  24. Van Huffel, S., Vandewalle, J.: The Total Least Squares Problem: Computational Aspects and Analysis. Frontiers in Applied Mathematics, vol. 9. SIAM, Philadelphia (1991)
  25. Xia, Y., Wang, S., Sheu, R.L.: S-lemma with equality and its applications. Math. Program. Ser. A 156(1–2), 513–547 (2016)
  26. Yang, M., Xia, Y., Wang, J., Peng, J.: Efficiently solving total least squares with Tikhonov identical regularization. Comput. Optim. Appl. 70(2), 571–592 (2018)

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. State Key Laboratory of Software Development Environment, LMIB of the Ministry of Education, School of Mathematics and System Sciences, Beihang University, Beijing, People's Republic of China
