
A globally convergent hybrid conjugate gradient method with strong Wolfe conditions for unconstrained optimization

  • P. Kaelo
  • P. Mtagulwa
  • M. V. Thuto

Abstract

In this paper, we develop a new hybrid conjugate gradient method that inherits the features of the Liu and Storey (LS), Hestenes and Stiefel (HS), Dai and Yuan (DY) and Conjugate Descent (CD) conjugate gradient methods. The new method generates a descent direction independently of any line search and possesses good convergence properties under the strong Wolfe line search conditions. Numerical results show that the proposed method is robust and efficient.

Keywords

Unconstrained optimization · Global convergence · Sufficient descent · Strong Wolfe conditions

Mathematics Subject Classification

90C30 · 65K05 · 90C06

Introduction

In this paper, we consider solving the unconstrained optimization problem
$$\begin{aligned} {\text {min}}\ f(x), \end{aligned}$$
(1)
where \(x \in {\mathbb {R}}^n\) is an n-dimensional real vector and \(f:{\mathbb {R}}^n \rightarrow {\mathbb {R}}\) is a smooth function, using a nonlinear conjugate gradient method. Optimization problems arise naturally in many scientific and operational applications (see, e.g., [12, 19, 20, 21, 22, 35, 36], among others).
To solve problem (1), a nonlinear conjugate gradient method starts with an initial guess, \(x_{0}\in {\mathbb {R}}^n\), and generates a sequence \(\{x_{k}\}_{k=0}^{\infty }\) using the recurrence
$$\begin{aligned} x_{k+1}=x_{k}+\alpha _{k}d_{k}, \end{aligned}$$
(2)
where the step size \(\alpha _{k}\) is a positive parameter and \(d_{k}\) is the search direction defined by
$$\begin{aligned} d_{k} = \left\{ \begin{array}{ll} -g_{k}, &{}\quad {\text {if}}\ \quad k=0, \\ -g_{k}+\beta _{k}d_{k-1}, &{}\quad {\text {if}}\ \quad k > 0. \end{array} \right. \end{aligned}$$
(3)
The scalar \(\beta _{k}\) is the conjugate gradient update coefficient and \(g_{k}=\nabla f(x_{k})\) is the gradient of f at \(x_{k}.\) In finding the step size \(\alpha _{k},\) inexpensive line searches such as the weak Wolfe line search
$$\begin{aligned} \left\{ \begin{array}{ll} f(x_{k}+\alpha _{k}d_{k})\le f(x_{k})+\delta \alpha _{k}g_{k}^{T}d_{k} \\ g_{k+1}^{T}d_{k}\ge \sigma g_{k}^{T}d_{k}, \end{array} \right. \end{aligned}$$
(4)
the strong Wolfe line search
$$\begin{aligned} \left\{ \begin{array}{ll} f(x_{k}+\alpha _{k}d_{k})\le f(x_{k})+\delta \alpha _{k}g_{k}^{T}d_{k} \\ |g_{k+1}^{T}d_{k}|\le \sigma |g_{k}^{T}d_{k}|, \end{array} \right. \end{aligned}$$
(5)
or the generalized Wolfe conditions
$$\begin{aligned} \left\{ \begin{array}{ll} f(x_{k}+\alpha _{k}d_{k})\le f(x_{k})+\delta \alpha _{k}g_{k}^{T}d_{k} \\ \sigma g_{k}^{T}d_{k}\le g_{k+1}^{T}d_{k} \le -\sigma _{1}g_{k}^{T}d_{k}, \end{array} \right. \end{aligned}$$
(6)
where \(0< \delta< \sigma <1\) and \(\sigma _{1}\ge 0\) are constants, are often used. Generally, conjugate gradient methods differ in the choice of the coefficient \(\beta _{k}.\) Well-known formulas for \(\beta _{k}\) can be divided into two categories. The first category includes Fletcher and Reeves (FR) [11], Dai and Yuan (DY) [6] and Conjugate Descent (CD) [10]:
$$\begin{aligned} \beta _{k}^{FR}=\frac{\Vert g_{k}\Vert ^{2}}{\Vert g_{k-1}\Vert ^{2}}, \ \ \beta _{k}^{DY}=\frac{\Vert g_{k}\Vert ^{2}}{d_{k-1}^{T}y_{k-1}},\ \ \beta _{k}^{CD}=-\frac{\Vert g_{k}\Vert ^{2}}{d_{k-1}^{T}g_{k-1}}, \end{aligned}$$
where \(\Vert \cdot \Vert\) denotes the Euclidean norm and \(y_{k-1}=g_{k}-g_{k-1}.\) These methods have strong convergence properties. However, since they are very often susceptible to jamming, they tend to have poor numerical performance. The other category includes Hestenes and Stiefel (HS) [16], Polak–Ribière–Polyak (PRP) [28, 29] and Liu and Storey (LS) [26]:
$$\begin{aligned} \beta _{k}^{HS}=\frac{g_{k}^{T}y_{k-1}}{d_{k-1}^{T}y_{k-1}}, \ \ \beta _{k}^{PRP}=\frac{g_{k}^{T}y_{k-1}}{\Vert g_{k-1}\Vert ^{2}},\ \ \beta _{k}^{LS}=-\frac{g_{k}^{T}y_{k-1}}{d_{k-1}^{T}g_{k-1}}. \end{aligned}$$
Although these methods may fail to converge, they have an in-built automatic restart feature which helps them avoid jamming and hence makes them numerically efficient [5].
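As a concrete illustration (our own sketch, not code from the cited papers), the six classical coefficients above can be computed directly from \(g_{k}\), \(g_{k-1}\) and \(d_{k-1}\):

```python
import numpy as np

def classical_betas(g_k, g_prev, d_prev):
    """Classical CG coefficients built from g_k, g_{k-1} and d_{k-1}.

    Illustrative sketch: no safeguards against zero denominators.
    """
    y_prev = g_k - g_prev                       # y_{k-1} = g_k - g_{k-1}
    return {
        "FR":  (g_k @ g_k) / (g_prev @ g_prev),
        "DY":  (g_k @ g_k) / (d_prev @ y_prev),
        "CD": -(g_k @ g_k) / (d_prev @ g_prev),
        "HS":  (g_k @ y_prev) / (d_prev @ y_prev),
        "PRP": (g_k @ y_prev) / (g_prev @ g_prev),
        "LS": -(g_k @ y_prev) / (d_prev @ g_prev),
    }

# example usage with random data
g_prev, g_k, d_prev = np.random.randn(3, 5)
print(classical_betas(g_k, g_prev, d_prev))
```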
In view of the above stated drawbacks and advantages, many researchers have proposed hybrid conjugate gradient methods that combine different \(\beta _{k}\) coefficients so as to limit the drawbacks and maximize the advantages of the respective original conjugate gradient methods. For instance, Touati-Ahmed and Storey [31] suggested one of the first hybrid methods, where the coefficient \(\beta _{k}\) is given by
$$\begin{aligned} \beta _{k}^{TS} = \left\{ \begin{array}{ll} \beta _{k}^{PRP}, &{}\quad {\text {if}}\ \quad 0\le \beta _{k}^{PRP} \le \beta _{k}^{FR}, \\ \beta _{k}^{FR}, &{}\quad {\text {otherwise}}. \end{array} \right. \end{aligned}$$
The authors proved that \(\beta _{k}^{TS}\) has good convergence properties and numerically outperforms both the \(\beta _{k}^{FR}\) and \(\beta _{k}^{PRP}\) methods. Alhawarat et al. [3] introduced a hybrid conjugate gradient method in which the conjugate gradient update coefficient is computed as
$$\begin{aligned} \beta _{k}^{AZPRP} = \left\{ \begin{array}{ll} \displaystyle {\frac{\Vert g_{k}\Vert ^{2}-g_{k}^{T}g_{k-1}}{\Vert g_{k-1}\Vert ^{2}}} , &{}\quad {\text {if}}\ \Vert g_{k}\Vert ^{2}> |g_{k}^{T}g_{k-1}|, \\ \displaystyle {\frac{\Vert g_{k}\Vert ^{2}-\mu _{k}|g_{k}^{T}g_{k-1}|}{\Vert g_{k-1}\Vert ^{2}}} , &{}\quad {\text {if}}\ \Vert g_{k}\Vert ^{2}> \mu _{k}|g_{k}^{T}g_{k-1}|, \\ 0, &{}\quad {\text {otherwise}}, \end{array} \right. \end{aligned}$$
where \(\mu _{k}\) is defined as
$$\begin{aligned} \mu _{k}=\frac{\Vert x_{k}-x_{k-1}\Vert }{\Vert y_{k-1}\Vert }. \end{aligned}$$
The authors proved that the method possesses the global convergence property when the weak Wolfe line search is employed. Moreover, numerical results demonstrate that the proposed method outperforms both the CG-Descent 6.8 [14] and CG-Descent 5.3 [13] methods on a number of benchmark test problems.
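For illustration only (a sketch of ours, not taken from [3]), \(\beta _{k}^{AZPRP}\) and \(\mu _{k}\) can be evaluated as follows, checking the branches in the order they appear in the definition:

```python
import numpy as np

def beta_azprp(g_k, g_prev, x_k, x_prev):
    """Sketch of the AZPRP coefficient; branches follow the definition above."""
    y_prev = g_k - g_prev
    mu_k = np.linalg.norm(x_k - x_prev) / np.linalg.norm(y_prev)
    gk2 = g_k @ g_k                      # ||g_k||^2
    gprev2 = g_prev @ g_prev             # ||g_{k-1}||^2
    inner = g_k @ g_prev                 # g_k^T g_{k-1}
    if gk2 > abs(inner):
        return (gk2 - inner) / gprev2
    if gk2 > mu_k * abs(inner):
        return (gk2 - mu_k * abs(inner)) / gprev2
    return 0.0
```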
In [5], Babaie-Kafaki gave a quadratic hybridization of \(\beta _{k}^{FR}\) and \(\beta _{k}^{PRP}\), where
$$\begin{aligned} \beta _{k}^{HQ\pm } = \left\{ \begin{array}{ll} \beta _{k}^{+}(\theta ^{\pm }_{k}), &{}\quad \theta ^{\pm }_{k} \in [-1,1] , \\ \max (0,\beta _{k}^{PRP}), &{}\quad \theta ^{\pm }_{k} \in {\mathbb {C}}, \\ -\beta _{k}^{FR}, &{}\quad \theta ^{\pm }_{k} < -1, \\ \beta _{k}^{FR}, &{}\quad \theta ^{\pm }_{k} > 1, \end{array} \right. \end{aligned}$$
and the hybridization parameter \(\theta ^{\pm }_{k}\) is taken from the roots of the quadratic equation
$$\begin{aligned} \theta _{k}^{2}\beta _{k}^{PRP}-\theta _{k}\beta _{k}^{FR} +\beta _{k}^{HS}-\beta _{k}^{PRP}=0, \end{aligned}$$
that is
$$\begin{aligned} \theta ^{\pm }_{k}=\frac{\beta _{k}^{FR}\pm \sqrt{(\beta _{k}^{FR})^{2}-4\beta _{k}^{PRP}(\beta _{k}^{HS} -\beta _{k}^{PRP})}}{2\beta _{k}^{PRP}}, \end{aligned}$$
and
$$\begin{aligned} \beta _{k}^{+}(\theta _{k})=\max (0,\beta _{k}^{PRP})(1-\theta _{k}^{2}) +\theta _{k}\beta _{k}^{FR},\ \ \ \theta _{k}\in [-1,1]. \end{aligned}$$
Thus, the author suggested two methods \(\beta _{k}^{HQ+}\) and \(\beta _{k}^{HQ-}\), corresponding to \(\theta ^{+}\) and \(\theta ^{-},\) respectively, with numerical results showing \(\beta _{k}^{HQ-}\) to be more efficient than \(\beta _{k}^{HQ+}\).
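A rough sketch (ours, not the implementation of [5]) of this quadratic hybridization, with HQ+ and HQ− selected through the sign of the square root and with degenerate cases such as \(\beta _{k}^{PRP}=0\) left untreated, is the following:

```python
import numpy as np

def beta_hq(g_k, g_prev, d_prev, sign=-1):
    """Quadratic FR/PRP hybridization sketch; sign=+1 gives HQ+, sign=-1 gives HQ-."""
    y_prev = g_k - g_prev
    fr  = (g_k @ g_k) / (g_prev @ g_prev)
    prp = (g_k @ y_prev) / (g_prev @ g_prev)
    hs  = (g_k @ y_prev) / (d_prev @ y_prev)

    disc = fr ** 2 - 4.0 * prp * (hs - prp)
    if disc < 0.0:                              # roots theta^{+/-} are complex
        return max(0.0, prp)
    theta = (fr + sign * np.sqrt(disc)) / (2.0 * prp)
    if -1.0 <= theta <= 1.0:                    # beta^{+}(theta)
        return max(0.0, prp) * (1.0 - theta ** 2) + theta * fr
    return -fr if theta < -1.0 else fr
```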
More recently, Salih et al. [30] presented another hybrid conjugate gradient method defined by
$$\begin{aligned} \beta _{k}^{YHM} = \left\{ \begin{array}{ll} \displaystyle {\frac{g^{T}_{k}(g_{k}-g_{k-1})}{\Vert g_{k-1}\Vert ^{2}}}, &{}\quad {\text {if}}\ 0\le g_{k}^{T}g_{k-1}\le \Vert g_{k}\Vert ^{2}, \\ \displaystyle {\frac{g^{T}_{k}(g_{k}-\frac{\Vert g_{k}\Vert }{\Vert g_{k-1}\Vert }g_{k-1})}{\Vert g_{k-1}\Vert ^{2}}}, &{}\quad {\text {otherwise}}. \end{array} \right. \end{aligned}$$
The authors showed that the \(\beta _{k}^{YHM}\) method satisfies the sufficient descent condition and possesses the global convergence property under the strong Wolfe line search. In 2019, Faramarzi and Amini [9] introduced a spectral conjugate gradient method defined as
$$\begin{aligned} \beta _{k}^{ZDK} =\frac{g_{k}^{T} {z}_{k-1}}{d_{k-1}^{T} {z}_{k-1}}- \frac{\Vert {z}_{k-1}\Vert ^{2}}{d_{k-1}^{T}{z}_{k-1}} \frac{g_{k}^{T}d_{k-1}}{d_{k-1}^{T}{z}_{k-1}}, \end{aligned}$$
with the spectral search direction given as
$$\begin{aligned} d_{k}=-\theta _{k}g_{k}+\beta _{k}^{ZDK}d_{k-1},\ d_{0}=-g_{0}, \end{aligned}$$
where
$$\begin{aligned} z_{k-1}=y_{k-1}+h_k\Vert g_{k-1}\Vert ^{r}s_{k-1},\quad s_{k-1}=x_k-x_{k-1}\quad {\text {and}}\quad h_k = D +\max \left\{ -\frac{s_{k-1}^{T}y_{k-1}}{\Vert s_{k-1}\Vert ^{2}},0\right\} \Vert g_{k-1}\Vert ^{-r}, \end{aligned}$$
with r and D being positive constants. The authors suggested computing the spectral parameter \(\theta _{k}\) as
$$\begin{aligned} \theta _{k} = \left\{ \begin{array}{ll} \theta _{k}^{N+}, &{}\quad {\text {if}}\ \theta _{k}^{N+} \in [\frac{1}{4}+\eta ,\tau ], \\ 1, &{}\quad {\text {otherwise}} \end{array} \right. \end{aligned}$$
or
$$\begin{aligned} \theta _{k} = \left\{ \begin{array}{ll} \theta _{k}^{N-}, &{}\quad {\text {if}}\ \theta _{k}^{N-} \in [\frac{1}{4}+\eta ,\tau ], \\ 1, &{}\quad {\text {otherwise}}, \end{array} \right. \end{aligned}$$
where \(\eta\) and \(\tau\) are constants such that \(\frac{1}{4}+\eta \le \theta _{k}\le \tau ,\)
$$\begin{aligned} \theta _{k}^{N-}=1- \frac{\Vert {z}_{k-1}\Vert ^{2}d_{k-1}^{T}g_{k}}{(d_{k-1}^{T}{z}_{k-1})({z}_{k-1}^{T}g_{k})} \end{aligned}$$
and
$$\begin{aligned} \theta _{k}^{N+}=1-\frac{1}{{z}^{T}_{k-1}g_{k}} \bigg (\frac{\Vert {z}_{k-1}\Vert ^{2}d_{k-1}^{T}g_{k}}{d_{k-1}^{T}{z}_{k-1}} -s_{k-1}^{T}g_{k}\bigg ). \end{aligned}$$
Convergence of this method is established under the strong Wolfe conditions. For more conjugate gradient methods, the reader is referred to the works of [1, 2, 8, 15, 17, 18, 23, 24, 25, 27, 32, 33].

In this paper, we suggest another new hybrid conjugate gradient method that inherits the good computational performance of the \(\beta _{k}^{LS}\) and \(\beta _{k}^{HS}\) methods and the nice convergence properties of the \(\beta _{k}^{DY}\) and \(\beta _{k}^{CD}\) methods. The proposed method is presented in the next section, and the rest of the paper is structured as follows. In Sect. 3, we show that the proposed method satisfies the descent condition for any line search and also present its global convergence analysis under the strong Wolfe line search. A numerical comparison based on the performance profiles of Dolan and Moré [7] and the conclusion are presented in Sects. 4 and 5, respectively.

A new hybrid conjugate gradient method

In [32], a variant of the \(\beta _k^{PRP}\) method is proposed, where the coefficient \(\beta _{k}\) is computed as
$$\begin{aligned} \beta _{k}^{WYL}=\frac{\Vert g_{k}\Vert ^{2}-\frac{\Vert g_{k}\Vert }{\Vert g_{k-1}\Vert }g_{k}^{T}g_{k-1}}{\Vert g_{k-1}\Vert ^{2}}. \end{aligned}$$
This method inherits the good numerical performance of the PRP method. Moreover, Huang et al. [17] proved that the \(\beta _k^{WYL}\) method satisfies the sufficient descent property and established that the method is globally convergent under the strong Wolfe line search if the parameter \(\sigma\) in (5) satisfies \(\sigma < \frac{1}{4}.\) Yao et al. [34] extended this idea to the \(\beta _k^{HS}\) method and proposed the update
$$\begin{aligned} \beta _{k}^{YWH}=\frac{\Vert g_{k}\Vert ^{2}-\frac{\Vert g_{k}\Vert }{\Vert g_{k-1}\Vert } g_{k}^{T}g_{k-1}}{d_{k-1}^{T}(g_{k}-g_{k-1})}. \end{aligned}$$
The authors proved that the method is globally convergent under the strong Wolfe line search with the parameter \(\sigma <\frac{1}{3}\). In Jian et al. [18], a hybrid of \(\beta _{k}^{DY}, \beta _{k}^{FR}, \beta _{k}^{WYL}\ {\text {and}}\ \beta _{k}^{YWH}\) is proposed by introducing the update
$$\begin{aligned} \beta _{k}^{N}=\frac{\Vert g_{k}\Vert ^{2} -\max \{0,\frac{\Vert g_{k}\Vert }{\Vert g_{k-1}\Vert }g_{k}^{T}g_{k-1}\}}{\max \{\Vert g_{k-1}\Vert ^{2},d_{k-1}^{T}(g_{k}-g_{k-1})\}}, \end{aligned}$$
with
$$\begin{aligned} d_{k} = \left\{ \begin{array}{ll} -g_{k}, &{}\quad {\text {if}}\ \quad k=0, \\ -g_{k}+\beta _{k}^{N}d_{k-1}, &{}\quad {\text {if}}\ \quad k > 0. \end{array} \right. \end{aligned}$$
Independent of the line search, the method generates a descent direction at every iteration. Furthermore, its global convergence is established under the weak Wolfe line search.
Now, motivated by the ideas of Jian et al. [18], in this paper we suggest a hybrid conjugate gradient method that inherits the strengths of the \(\beta _{k}^{LS},\ \beta _{k}^{HS},\ \beta _{k}^{DY}\) and \(\beta _{k}^{CD}\) methods by introducing \(\beta _{k}^{PKT}\) as
$$\begin{aligned} \beta _{k}^{PKT} = \left\{ \begin{array}{ll} \displaystyle {\frac{\Vert g_{k}\Vert ^{2}-g_{k}^{T}g_{k-1}}{\max \{d_{k-1}^{T}y_{k-1},-g_{k-1}^{T}d_{k-1}\}}}, &{}\quad {\text {if}}\ 0<g_{k}^{T}g_{k-1}<\Vert g_{k}\Vert ^{2}, \\ \displaystyle {\frac{\Vert g_{k}\Vert ^{2}}{\max \{d_{k-1}^{T}y_{k-1},-g_{k-1}^{T}d_{k-1}\}}}, &{}\quad {\text {otherwise}}, \end{array} \right. \end{aligned}$$
(7)
with direction \(d_{k}\) defined as
$$\begin{aligned} d_{k} = \left\{ \begin{array}{ll} -g_{k}, &{}\quad {\text {if}}\ \quad k=0 \ {\text {or}}\ |g_{k}^{T}g_{k-1} |\ge 0.2 \Vert g_{k}\Vert ^{2}, \\ -\left( 1+\beta _{k}^{PKT}\frac{d_{k-1}^{T}g_{k}}{\Vert g_{k}\Vert ^{2}}\right) g_{k}+\beta _{k}^{PKT}d_{k-1}, &{}\quad {\text {if}}\ \quad k > 0. \end{array} \right. \end{aligned}$$
(8)
Now, with \(\beta _{k}\) and \(d_{k}\) defined as in (7) and (8), respectively, we present our hybrid conjugate gradient method as Algorithm 1.
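As an illustration (not the authors' MATLAB code), the iteration of Algorithm 1 can be sketched in Python as follows, assuming the standard framework (2) with \(\beta _{k}\) from (7), \(d_{k}\) from (8) and a strong Wolfe step size (5); scipy.optimize.line_search enforces the strong Wolfe conditions with c1 = \(\delta\) and c2 = \(\sigma\), and the stopping rule matches the one used in Sect. 4.

```python
import numpy as np
from scipy.optimize import line_search

def pkt_cg(f, grad, x0, delta=1e-4, sigma=0.05, tol=1e-5, max_iter=10000):
    """Sketch of the PKT iteration: x_{k+1} = x_k + alpha_k d_k with (7) and (8)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:                 # stopping rule used in Sect. 4
            break
        # strong Wolfe step size (5): c1 = delta, c2 = sigma
        alpha = line_search(f, grad, x, d, gfk=g, c1=delta, c2=sigma)[0]
        if alpha is None:                            # line search failure: tiny fallback step
            alpha = 1e-8
        x_new = x + alpha * d
        g_new = grad(x_new)
        y = g_new - g

        gg = g_new @ g                               # g_k^T g_{k-1}
        gk2 = g_new @ g_new                          # ||g_k||^2
        denom = max(d @ y, -(g @ d))                 # max{d_{k-1}^T y_{k-1}, -g_{k-1}^T d_{k-1}}
        beta = ((gk2 - gg) if 0.0 < gg < gk2 else gk2) / denom   # beta_k^{PKT}, cf. (7)

        if abs(gg) >= 0.2 * gk2:                     # restart branch of (8)
            d = -g_new
        else:
            d = -(1.0 + beta * (d @ g_new) / gk2) * g_new + beta * d
        x, g = x_new, g_new
    return x
```

For example, pkt_cg applied to a smooth test function with its gradient and a starting point reproduces the kind of runs reported in Sect. 4; the parameter defaults above follow the choices stated there.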

Global convergence of the proposed method

The following standard assumptions, which have been used extensively in the literature, are needed to analyse the global convergence properties of our hybrid method.

Assumption 3.1

The level set
$$\begin{aligned} X=\{x\in {\mathbb {R}}^{n}:f(x)\le f(x_{0})\}, \end{aligned}$$
is bounded, where \(x_{0}\in {\mathbb {R}}^n\) is the initial guess of the iterative method (2).

Assumption 3.2

In some neighbourhood N of X, the objective function f is continuously differentiable and its gradient is Lipschitz continuous, that is, there exists a constant \(L>0\) such that
$$\begin{aligned} \Vert g(x)-g(y)\Vert \le L\Vert x-y\Vert \ {\text {for all}} \ x,y \in N. \end{aligned}$$
If Assumptions 3.1 and 3.2 hold, then there exists a positive constant \(\varsigma\) such that
$$\begin{aligned} \Vert g(x)\Vert \le \varsigma \ {\text {for all}} \ x\in N. \end{aligned}$$
(9)

Lemma 3.1

Consider the sequences \(\{g_{k}\}\) and \(\{d_{k}\}\) generated by Algorithm 1. Then, the sufficient descent condition
$$\begin{aligned} d_{k}^{T}g_{k}= -\Vert g_{k}\Vert ^{2}, \ \forall k\ge 0, \end{aligned}$$
(10)
holds.

Proof

If \(k=0\) or \(|g_{k}^{T}g_{k-1}|\ge 0.2 \Vert g_{k}\Vert ^{2},\) then the search direction \(d_{k}\) is given by
$$\begin{aligned} d_{k}& = {} - g_{k}. \end{aligned}$$
This gives
$$\begin{aligned} g^{T}_{k}d_{k}& = {} -\Vert g_{k}\Vert ^{2}. \end{aligned}$$
Otherwise, the search direction \(d_{k}\) is given by
$$\begin{aligned} d_{k}=-\left( 1+\beta _{k}^{PKT}\frac{d_{k-1}^{T}g_{k}}{\Vert g_{k}\Vert ^{2}} \right) g_{k}+\beta _{k}^{PKT}d_{k-1}. \end{aligned}$$
(11)
Now, if we pre-multiply Eq. (11) by \(g_{k}^T,\) we get
$$\begin{aligned} g_{k}^{T}d_{k}& = {} -\Vert g_{k}\Vert ^{2}\left( 1+\beta _{k}^{PKT}\frac{d_{k-1}^{T}g_{k}}{\Vert g_{k}\Vert ^{2}}\right) +\beta _{k}^{PKT}g_{k}^{T}d_{k-1} \\& = {} -\Vert g_{k}\Vert ^{2}-\beta _{k}^{PKT}d_{k-1}^{T}g_{k}+\beta _{k}^{PKT}g_{k}^{T}d_{k-1} \\& = {} -\Vert g_{k}\Vert ^{2}. \end{aligned}$$
Therefore, the new method satisfies the sufficient descent property (10) for all k. \(\square\)

Lemma 3.2

Suppose that Assumptions 3.1 and 3.2 hold. Let the sequence \(\{x_{k}\}\) be generated by (2) and the search direction \(d_{k}\) be a descent direction. If \(\alpha _{k}\) is obtained by the strong Wolfe line search, then the Zoutendijk condition
$$\begin{aligned} \sum _{k\ge 0} \frac{(g_{k}^{T}d_{k})^{2}}{\Vert d_{k}\Vert ^{2}}<+\infty \end{aligned}$$
(12)
holds.

Lemma 3.3

For any \(k\ge 1,\) the relation \(0< \beta _{k}^{PKT}\le \beta _{k}^{CD}\) always holds.

Proof

From the second condition in (5) and the sufficient descent condition (10), we have \(|g_{k}^{T}d_{k-1}|\le \sigma |g_{k-1}^{T}d_{k-1}|=\sigma \Vert g_{k-1}\Vert ^{2},\) and hence
$$\begin{aligned} d_{k-1}^{T}y_{k-1}=g_{k}^{T}d_{k-1}-g_{k-1}^{T}d_{k-1}\ge -\sigma \Vert g_{k-1}\Vert ^{2}+\Vert g_{k-1}\Vert ^{2}=(1-\sigma )\Vert g_{k-1}\Vert ^{2}, \end{aligned}$$
and since \(0<\sigma <1,\) we have
$$\begin{aligned} d_{k-1}^{T}y_{k-1}>0. \end{aligned}$$
(13)
Also, by descent condition (10), we get
$$\begin{aligned} -g_{k-1}^{T}d_{k-1}= \Vert g_{k-1}\Vert ^{2}, \end{aligned}$$
implying
$$\begin{aligned} -g_{k-1}^{T}d_{k-1}> 0. \end{aligned}$$
(14)
Therefore, from (7), (13) and (14), it is clear that \(\beta _{k}^{PKT}>0.\) Moreover, if \(0<g_{k}^{T}g_{k-1}<\Vert g_{k}\Vert ^{2},\) then
$$\begin{aligned} \beta _{k}^{PKT}& = {} \frac{\Vert g_{k}\Vert ^{2}-g_{k}^{T}g_{k-1}}{\max \{d_{k-1}^{T}y_{k-1},-g_{k-1}^{T}d_{k-1}\}} \\&< {} \frac{\Vert g_{k}\Vert ^{2}}{\max \{d_{k-1}^{T}y_{k-1},-g_{k-1}^{T}d_{k-1}\}}, \end{aligned}$$
and since \(\max \{d_{k-1}^{T}y_{k-1},-g_{k-1}^{T}d_{k-1}\}\ge -g_{k-1}^{T}d_{k-1},\) we have
$$\begin{aligned} \beta _{k}^{PKT}\le & {} \frac{\Vert g_{k}\Vert ^{2}}{-g_{k-1}^{T}d_{k-1}} \\& = {} \beta _{k}^{CD}. \end{aligned}$$
Hence, the lemma is proved. \(\square\)

Theorem 3.1

Suppose that Assumptions 3.1 and 3.2 hold. Consider the sequences \(\{g_{k}\}\) and \(\{d_{k}\}\) generated by Algorithm 1. Then
$$\begin{aligned} \liminf _{k\rightarrow \infty } \Vert g_{k}\Vert = 0. \end{aligned}$$
(15)

Proof

Assume that (15) does not hold. Then there exists a constant \(r>0\) such that
$$\begin{aligned} \Vert g_{k}\Vert \ge r,\ \forall k\ge 0. \end{aligned}$$
(16)
Letting \(\xi _{k}= 1+\beta _{k}^{PKT}\frac{d_{k-1}^{T}g_{k}}{\Vert g_{k}\Vert ^{2}},\) it follows from (8) that
$$\begin{aligned} d_{k}+\xi _{k}g_{k}=\beta _{k}^{PKT}d_{k-1}. \end{aligned}$$
Taking the squared Euclidean norm of both sides gives
$$\begin{aligned}&\Vert d_{k}\Vert ^{2}+ 2\xi _{k}d_{k}^{T}g_{k}+\xi _{k}^{2}\Vert g_{k}\Vert ^{2} = (\beta _{k}^{PKT})^{2}\Vert d_{k-1}\Vert ^{2}, \\&\quad \Rightarrow \Vert d_{k}\Vert ^{2} = (\beta _{k}^{PKT})^{2}\Vert d_{k-1}\Vert ^{2} - 2\xi _{k}d_{k}^{T}g_{k} -\xi _{k}^{2}\Vert g_{k}\Vert ^{2}. \end{aligned}$$
Now, dividing by \((g_{k}^{T}d_{k})^{2}\) and applying the descent condition \(g_{k}^{T}d_{k}= -\Vert g_{k}\Vert ^{2}\) yields
$$\begin{aligned} \frac{\Vert d_{k}\Vert ^{2}}{(g_{k}^{T}d_{k})^{2}} = (\beta _{k}^{PKT})^{2}\frac{\Vert d_{k-1}\Vert ^{2}}{(g_{k}^{T}d_{k})^{2}}+ \frac{2\xi _{k}}{\Vert g_{k}\Vert ^{2}}-\frac{\xi _{k}^{2}}{\Vert g_{k}\Vert ^{2}}. \end{aligned}$$
Since \(\beta _{k}^{PKT}\le \beta _{k}^{CD}=\frac{\Vert g_{k}\Vert ^{2}}{-g_{k-1}^{T}d_{k-1}},\) we obtain
$$\begin{aligned} \frac{\Vert d_{k}\Vert ^{2}}{(g_{k}^{T}d_{k})^{2}}&\le {} \left( \frac{\Vert g_{k}\Vert ^{2}}{-g_{k-1}^{T}d_{k-1}}\right) ^{2} \frac{\Vert d_{k-1}\Vert ^{2}}{(g_{k}^{T}d_{k})^{2}}+ \frac{2\xi _{k}}{\Vert g_{k}\Vert ^{2}}-\frac{\xi _{k}^{2}}{\Vert g_{k}\Vert ^{2}} \nonumber \\&= {} \frac{\Vert g_{k}\Vert ^{4}}{(g_{k-1}^{T}d_{k-1})^{2}} \frac{\Vert d_{k-1}\Vert ^{2}}{(g_{k}^{T}d_{k})^{2}}+ \frac{2\xi _{k}}{\Vert g_{k}\Vert ^{2}} -\frac{\xi _{k}^{2}}{\Vert g_{k}\Vert ^{2}} \nonumber \\& = {} \frac{\Vert d_{k-1}\Vert ^{2}}{(g_{k-1}^{T}d_{k-1})^{2}} -\frac{1}{\Vert g_{k}\Vert ^{2}}(\xi _{k}^{2}-2\xi _{k}+1-1) \nonumber \\& = {} \frac{\Vert d_{k-1}\Vert ^{2}}{(g_{k-1}^{T}d_{k-1})^{2}} -\frac{(\xi _{k}-1)^{2}}{\Vert g_{k}\Vert ^{2}}+\frac{1}{\Vert g_{k}\Vert ^{2}}\nonumber \\&\le {} \frac{\Vert d_{k-1}\Vert ^{2}}{(g_{k-1}^{T}d_{k-1})^{2}}+\frac{1}{\Vert g_{k}\Vert ^{2}}. \end{aligned}$$
(17)
Noting that
$$\begin{aligned} \frac{\Vert d_{0}\Vert ^{2}}{(g_{0}^{T}d_{0})^{2}}= \frac{1}{\Vert g_{0}\Vert ^{2}}, \end{aligned}$$
and using (17) recursively yields
$$\begin{aligned} \frac{\Vert d_{k}\Vert ^{2}}{(g_{k}^{T}d_{k})^{2}}\le & {} \sum _{i=0}^{k}\frac{1}{\Vert g_{i}\Vert ^{2}}. \end{aligned}$$
(18)
From (16), we have
$$\begin{aligned} \sum _{i=0}^{k } \frac{1}{\Vert g_{i}\Vert ^{2}} \le \frac{k+1}{r^{2}}. \end{aligned}$$
Thus,
$$\begin{aligned} \frac{(g_{k}^{T}d_{k})^{2}}{\Vert d_{k}\Vert ^{2}}&\ge {} \frac{r^{2}}{k+1}, \end{aligned}$$
which implies that
$$\begin{aligned} \sum _{k=0}^{\infty }\frac{(g_{k}^{T}d_{k})^{2}}{\Vert d_{k}\Vert ^{2}}&\ge {} r^{2}\sum _{k=0}^{\infty }\frac{1}{k+1}=+\infty . \end{aligned}$$
This contradicts the Zoutendijk condition (12), concluding the proof. \(\square\)

Numerical results

In this section, we analyse the numerical efficiency of our proposed \(\beta _k^{PKT}\) method, herein denoted PKT, by comparing its performance with that of the method of Jian et al. [18], herein denoted N, and that of Alhawarat et al. [3], herein denoted AZPRP, on a set of 55 unconstrained test problems selected from [4]. We stop the iterations if either \(\Vert g_{k}\Vert \le 10^{-5}\) or a maximum of 10,000 iterations is reached. All the algorithms are coded in MATLAB R2019a. The authors of the N and AZPRP methods suggest that their algorithms perform best when implemented with the generalized Wolfe line search conditions (6), with the parameters \(\sigma =0.1, \ \delta = 0.0001 \ {\text {and}}\ \sigma _{1}=1-2\delta\) for the N method and \(\sigma =0.4, \ \delta = 0.0001 \ {\text {and}}\ \sigma _{1}=0.1\) for the AZPRP method. Hence, for our numerical experiments, we set the parameters of N and AZPRP as recommended in the respective papers. For the PKT method, we implemented the strong Wolfe line search conditions with \(\delta = 0.0001 \ {\text {and}}\ \sigma =0.05.\)

Numerical results are presented in Table 1, where "Function" denotes the name of the test problem, "Dim" denotes the dimension of the test problem, "NI" denotes the number of iterations, "FE" denotes the number of function evaluations, "GE" denotes the number of gradient evaluations, "CPU" denotes the CPU time in seconds and "–" means that the method failed to solve the problem within 10,000 iterations. From Table 1, we observe that the PKT and AZPRP methods successfully solved all the problems, whereas the N method failed to solve one problem within 10,000 iterations. Moreover, the numerical results in the table indicate that the new PKT method is competitive, as it is the best performer for a significant number of problems.
Table 1  Numerical results of the methods (each method column lists NI / FE / GE / CPU)

| Function | Dim | PKT | AZPRP | N |
| --- | --- | --- | --- | --- |
| Ext. QP2 | 10,000 | 6 / 399 / 171 / 0.307 | 37 / 289 / 191 / 0.243 | 33 / 163 / 83 / 0.220 |
| Ext. Rosenbrock | 10,000 | 19 / 101 / 43 / 0.0169 | 43 / 266 / 163 / 0.0762 | 71 / 317 / 149 / 0.0851 |
| Ext. Penalty | 500 | 10 / 49 / 15 / 0.00339 | 15 / 65 / 23 / 0.00615 | 24 / 103 / 25 / 0.00774 |
| Ext. Beale | 10,000 | 14 / 43 / 26 / 0.134 | 17 / 110 / 95 / 0.303 | 46 / 109 / 47 / 0.262 |
| Ext. Wood | 50,000 | 43 / 218 / 90 / 0.2 | 157 / 634 / 167 / 0.621 | 70 / 248 / 71 / 0.283 |
| Ext. Denschnb | 50,000 | 8 / 24 / 14 / 0.0318 | 9 / 20 / 11 / 0.0414 | 12 / 25 / 13 / 0.0484 |
| Ext. Denschnf | 20,000 | 13 / 71 / 31 / 0.0626 | 12 / 49 / 16 / 0.0361 | 24 / 86 / 25 / 0.0663 |
| Ext. Himmelblau | 50,000 | 9 / 37 / 15 / 0.0409 | 11 / 38 / 16 / 0.0495 | 13 / 41 / 15 / 0.0593 |
| Ext. Powell | 2000 | 41 / 187 / 106 / 0.197 | 224 / 693 / 345 / 0.836 | 947 / 2820 / 948 / 2.29 |
| Ext TET | 2000 | 6 / 17 / 11 / 0.00529 | 5 / 14 / 9 / 0.00415 | 9 / 19 / 10 / 0.00568 |
| Perturbed Quad | 500 | 122 / 611 / 241 / 0.116 | 122 / 489 / 123 / 0.0688 | 122 / 489 / 123 / 0.0695 |
| DQDRTIC | 10,000 | 10 / 41 / 14 / 0.0154 | 5 / 19 / 7 / 0.00603 | 32 / 109 / 33 / 0.0365 |
| ARWHEAD | 100 | 6 / 28 / 12 / 0.00204 | 7 / 27 / 10 / 0.00192 | 20 / 69 / 21 / 0.00437 |
| QUARTC | 7000 | 3 / 22 / 20 / 0.0603 | 10 / 81 / 80 / 0.178 | 82 / 449 / 448 / 1.14 |
| Tridia | 500 | 597 / 3477 / 1080 / 0.195 | 225 / 1125 / 284 / 0.0614 | 1496 / 7235 / 1497 / 0.396 |
| LIARWHD | 500 | 14 / 82 / 28 / 0.00538 | 26 / 112 / 32 / 0.00685 | 33 / 141 / 34 / 0.00836 |
| ENGVAL1 | 500 | 19 / 54 / 31 / 0.00473 | 20 / 51 / 29 / 0.00446 | 21 / 45 / 22 / 0.00452 |
| NONSCOMP | 20,000 | 38 / 152 / 69 / 0.0782 | 38 / 135 / 64 / 0.0680 | 41 / 127 / 47 / 0.0758 |
| Diagonal4 | 10,000 | 6 / 13 / 7 / 0.00419 | 6 / 14 / 8 / 0.0382 | 10 / 24 / 11 / 0.00871 |
| Ext. Tridiagonal 2 | 1000 | 26 / 54 / 28 / 0.058 | 26 / 54 / 28 / 0.00673 | 29 / 56 / 30 / 0.0283 |
| FLETCHCR | 1000 | 2012 / 13350 / 4336 / 1.13 | 2986 / 13688 / 4702 / 1.17 | 4276 / 17273 / 4339 / 1.36 |
| NONDIA | 20,000 | 7 / 55 / 8 / 0.11 | 32 / 238 / 60 / 0.105 | 141 / 998 / 144 / 0.34 |
| CUBE | 500 | 1640 / 8511 / 2831 / 1.86 | 5898 / 26124 / 8561 / 6.05 | – / – / – / – |
| Ext. Tridiagonal 1 | 20,000 | 12 / 63 / 53 / 0.302 | 17 / 76 / 70 / 0.386 | 334 / 464 / 450 / 2.95 |
| SINQUAD | 800 | 120 / 740 / 327 / 0.158 | 118 / 737 / 429 / 0.0988 | 379 / 1577 / 401 / 0.182 |
| Almost Perturbed Quad | 20,000 | 804 / 4829 / 808 / 1.7 | 1339 / 6983 / 1625 / 2.12 | 1314 / 6677 / 1315 / 3.3 |
| Perturbed Trid Quad | 50,000 | 1201 / 9608 / 2435 / 9.81 | 1201 / 7207 / 1202 / 6.76 | 1201 / 7207 / 1202 / 7.53 |
| Cosine | 5000 | 8 / 27 / 19 / 0.0196 | 8 / 27 / 19 / 0.0271 | 19 / 41 / 22 / 0.0533 |
| FHess1 | 50 | 246 / 934 / 514 / 0.0686 | 404 / 1160 / 546 / 0.0894 | 1368 / 4028 / 1372 / 0.303 |
| FHess2 | 500 | 882 / 5536 / 1202 / 0.775 | 4276 / 25496 / 4530 / 3.11 | 4166 / 24977 / 4167 / 3.13 |
| FHess3 | 10,000 | 2 / 13 / 3 / 0.00619 | 2 / 13 / 3 / 0.00588 | 2 / 13 / 3 / 0.00525 |
| Ext. BD1 | 50,000 | 9 / 35 / 22 / 0.101 | 11 / 41 / 29 / 0.103 | 17 / 46 / 28 / 0.122 |
| Perturbed Quad Diagonal | 100,000 | 343 / 1356 / 393 / 39.9 | 2334 / 11467 / 2452 / 248 | 1907 / 9528 / 1908 / 193 |
| Gen. Quartic | 50,000 | 9 / 26 / 16 / 0.0379 | 6 / 14 / 8 / 0.0299 | 7 / 15 / 8 / 0.031 |
| Quadratic QF1 | 500 | 122 / 610 / 362 / 0.0411 | 122 / 489 / 241 / 0.0317 | 196 / 676 / 197 / 0.0468 |
| Quadratic QF2 | 500 | 214 / 1083 / 224 / 0.0615 | 221 / 893 / 230 / 0.0469 | 218 / 873 / 219 / 0.132 |
| Diagonal 5 | 5000 | 2 / 5 / 5 / 0.0105 | 4 / 5 / 5 / 0.00479 | 4 / 5 / 5 / 0.00447 |
| Diagonal 5 | 1000 | 2 / 5 / 5 / 0.00339 | 3 / 4 / 4 / 0.0013 | 3 / 4 / 4 / 0.00152 |
| Diagonal 2 | 5000 | 280 / 1341 / 1332 / 0.965 | 430 / 7187 / 7186 / 3.31 | 5623 / 12829 / 12829 / 10.7 |
| Gen. Tridiagonal 1 | 1000 | 19 / 57 / 24 / 0.00676 | 20 / 60 / 24 / 0.00663 | 26 / 73 / 27 / 0.0088 |
| Gen. Tridiagonal 2 | 1000 | 39 / 138 / 52 / 0.0108 | 32 / 105 / 41 / 0.00812 | 35 / 106 / 36 / 0.00888 |
| Gen. PSC1 | 1000 | 12 / 93 / 85 / 0.0108 | 17 / 93 / 90 / 0.0123 | 76 / 387 / 383 / 0.0507 |
| Ext. PSC1 | 1000 | 7 / 26 / 15 / 0.00389 | 9 / 21 / 11 / 0.00303 | 10 / 22 / 11 / 0.00336 |
| Dixon3dq | 5000 | 2500 / 5012 / 2513 / 1.10 | 5000 / 10010 / 5011 / 1.95 | 5000 / 10007 / 5008 / 2.47 |
| Ext. Quad Penalty QP1 | 500 | 6 / 27 / 16 / 0.00228 | 9 / 45 / 29 / 0.00395 | 15 / 36 / 17 / 0.00337 |
| Biggsb1 | 500 | 500 / 1007 / 508 / 0.100 | 500 / 1007 / 508 / 0.0926 | 500 / 1004 / 505 / 0.0988 |
| Ext. White & Holst | 400 | 35 / 202 / 86 / 0.0266 | 43 / 203 / 96 / 0.0265 | 148 / 573 / 149 / 0.0665 |
| NONDQUAR | 1000 | 1006 / 3739 / 3463 / 2.82 | 1691 / 4045 / 3497 / 3.49 | 5747 / 7715 / 7201 / 7.44 |
| Raydan2 | 200 | 3 / 5 / 5 / 0.00113 | 3 / 5 / 5 / 0.000969 | 56 / 6 / 6 / 0.00112 |
| BDQRTIC | 50 | 59 / 250 / 85 / 0.0248 | 102 / 402 / 132 / 0.055 | 106 / 386 / 107 / 0.0423 |
| Raydan1 | 200 | 84 / 180 / 94 / 0.0220 | 85 / 175 / 88 / 0.0146 | 84 / 171 / 85 / 0.0146 |
| Gen. White & Holst | 50 | 1243 / 7061 / 1990 / 0.625 | 1174 / 5189 / 1679 / 0.373 | 1461 / 5830 / 1467 / 0.424 |
| Ext Quad Exponential EP1 | 50 | 2 / 11 / 3 / 0.00111 | 3 / 13 / 4 / 0.000937 | 3 / 13 / 4 / 0.00110 |
| Diagonal1 | 10 | 18 / 48 / 27 / 0.0433 | 18 / 63 / 45 / 0.027 | 19 / 39 / 20 / 0.0204 |
| Diagonal2 | 5000 | 280 / 1323 / 1318 / 1.01 | 471 / 2102 / 2101 / 1.48 | 5703 / 13034 / 13034 / 12.5 |

To further illustrate the performance of the three methods, we adopted the performance profile tool proposed by Dolan and Moré [7]. This tool evaluates and compares the performance of \(n_{s}\) solvers running on a set of \(n_{p}\) problems. The comparison between the solvers is based on the performance ratio
$$\begin{aligned} r_{p,s}=\frac{f_{p,s}}{\min (f_{p,i}: 1\le i \le n_{s})}, \end{aligned}$$
(19)
where \(f_{p,s}\) denotes either the number of function (gradient) evaluations, the number of iterations or the CPU time required by solver s to solve problem p. The overall evaluation of the performance of the solvers is then given by the performance profile function
$$\begin{aligned} \phi _{s}(\tau ) = \frac{1}{n_{p}} {\text {size}}\{p:1\le p \le n_{p},\ \ln (r_{p,s})\le \tau \}, \end{aligned}$$
(20)
where \(\tau \ge 0.\) If solver s fails to solve a problem p, we set the ratio \(r_{p,s}\) to some sufficiently large number.
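As an illustration (not the authors' code), the ratios (19) and profiles (20) can be computed from a matrix of per-problem costs as in the following Python sketch; failed runs are entered as infinity so that their ratios exceed any finite \(\tau\).

```python
import numpy as np

def performance_profile(costs, taus):
    """Dolan-More profiles phi_s(tau) from an (n_p, n_s) array of costs.

    costs[p, s] is the NI, FE, GE or CPU cost of solver s on problem p;
    np.inf marks a failure. Assumes at least one solver succeeds on each problem.
    Returns an array of shape (len(taus), n_s).
    """
    costs = np.asarray(costs, dtype=float)
    best = costs.min(axis=1, keepdims=True)          # denominator of (19)
    log_r = np.log(costs / best)                     # ln(r_{p,s}); inf for failures
    n_p = costs.shape[0]
    return np.array([(log_r <= tau).sum(axis=0) / n_p for tau in taus])
```

For example, stacking the CPU columns of Table 1 for PKT, AZPRP and N (with the failed CUBE run of N entered as np.inf) as the columns of costs and plotting the returned values against \(\tau\) gives a profile of the kind shown in Fig. 4.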
The corresponding profiles are plotted in Figs. 1, 2, 3 and 4, which show the performance profiles with respect to the number of iterations, the number of gradient evaluations, the number of function evaluations and the CPU time, respectively. The figures illustrate that the new method outperforms the AZPRP and N conjugate gradient methods.
Fig. 1

Iterations performance profile

Fig. 2

Gradient evaluations performance profile

Fig. 3

Function evaluations performance profile

Fig. 4

CPU performance profile

Conclusion

In this paper, we developed a new hybrid conjugate gradient method that inherits the features of the famous Liu and Storey (LS), Hestenes and Stiefel (HS), Dai and Yuan (DY) and Conjugate Descent (CD) conjugate gradient methods. The global convergence of the proposed method was established under the strong Wolfe line search conditions. We compared the performance of our method with those of Jian et al. [18] and Alhawarat et al. [3] on a number of benchmark unconstrained optimization problems. Evaluation of performance based on the tool of Dolan and Moré [7] showed that the proposed method is both efficient and effective.

References

  1. Abbo, K.K., Hameed, N.H.: New hybrid conjugate gradient method as a convex combination of Liu-Storey and Dixon methods. J. Multidiscip. Model. Optim. 1(2), 91–99 (2018)
  2. Abdullahi, I., Ahmad, R.: Global convergence analysis of a new hybrid conjugate gradient method for unconstrained optimization problems. Malays. J. Fundam. Appl. Sci. 13(2), 40–48 (2017)
  3. Alhawarat, A., Salleh, Z., Mamat, M., Rivaie, M.: An efficient modified Polak-Ribière-Polyak conjugate gradient method with global convergence properties. Optim. Methods Softw. 32(6), 1299–1312 (2017)
  4. Andrei, N.: An unconstrained optimization test functions collection. Adv. Model. Optim. 10(1), 147–161 (2008)
  5. Babaie-Kafaki, S.: A quadratic hybridization of Polak-Ribière-Polyak and Fletcher-Reeves conjugate gradient methods. J. Optim. Theory Appl. 154(3), 916–932 (2012)
  6. Dai, Y.H., Yuan, Y.: A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim. 10(1), 177–182 (1999)
  7. Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91, 201–214 (2002)
  8. Djordjević, S.S.: New hybrid conjugate gradient method as a convex combination of LS and FR methods. Acta Math. Sci. 39(1), 214–228 (2019)
  9. Faramarzi, P., Amini, K.: A modified spectral conjugate gradient method with global convergence. J. Optim. Theory Appl. 182(2), 667–690 (2019)
  10. Fletcher, R.: Practical Methods of Optimization, Unconstrained Optimization. Wiley, New York (1987)
  11. Fletcher, R., Reeves, C.M.: Function minimization by conjugate gradients. Comput. J. 7(2), 149–154 (1964)
  12. Gu, B., Sun, X., Sheng, V.S.: Structural minimax probability machine. IEEE Trans. Neural Netw. Learn. Syst. 28(7), 1646–1656 (2017)
  13. Hager, W.W., Zhang, H.: A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. 16(1), 170–192 (2005)
  14. Hager, W.W., Zhang, H.: The limited memory conjugate gradient method. SIAM J. Optim. 23(4), 2150–2168 (2013)
  15. Hassan, B.A., Alashoor, H.A.: A new hybrid conjugate gradient method with guaranteed descent for unconstraint optimization. Al-Mustansiriyah J. Sci. 28(3), 193–199 (2017)
  16. Hestenes, M.R., Stiefel, E.: Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bureau Stand. 49(6), 409–436 (1952)
  17. Huang, H., Wei, Z., Yao, S.: The proof of the sufficient descent condition of the Wei–Yao–Liu conjugate gradient method under the strong Wolfe–Powell line search. Appl. Math. Comput. 189(2), 1241–1245 (2007)
  18. Jian, J., Han, L., Jiang, X.: A hybrid conjugate gradient method with descent property for unconstrained optimization. Appl. Math. Model. 39, 1281–1290 (2015)
  19. Khakrah, E., Razani, A., Mirzaei, R., Oveisiha, M.: Some metric characterization of well-posedness for hemivariational-like inequalities. J. Nonlinear Funct. Anal. 2017, 44 (2017)
  20. Khakrah, E., Razani, A., Oveisiha, M.: Pascoletti–Serafini scalarization and vector optimization with a variable ordering structure. J. Stat. Manag. Syst. 21(6), 917–931 (2018)
  21. Khakrah, E., Razani, A., Oveisiha, M.: Pseudoconvex multiobjective continuous-time problems and vector variational inequalities. Int. J. Ind. Math. 9(3), Article ID IJIM-00861, 8 pages (2017)
  22. Li, J., Li, X., Yang, B., Sun, X.: Segmentation-based image copy-move forgery detection scheme. IEEE Trans. Inf. Forensics Secur. 10, 507–518 (2015)
  23. Li, X., Zhao, X.: A hybrid conjugate gradient method for optimization problems. Nat. Sci. 3, 85–90 (2011)
  24. Liu, J.K., Li, S.J.: New hybrid conjugate gradient method for unconstrained optimization. Appl. Math. Comput. 245, 36–43 (2014)
  25. Li, W., Yang, Y.: A nonmonotone hybrid conjugate gradient method for unconstrained optimization. J. Inequal. Appl. 2015, 124 (2015). https://doi.org/10.1186/s13660-015-0644-1
  26. Liu, Y., Storey, C.: Efficient generalized conjugate gradient algorithms, Part 1: theory. J. Optim. Theory Appl. 69(1), 129–137 (1991)
  27. Mtagulwa, P., Kaelo, P.: An efficient modified PRP-FR hybrid conjugate gradient method for solving unconstrained optimization problems. Appl. Numer. Math. 145, 111–120 (2019)
  28. Polak, E., Ribière, G.: Note sur la convergence de directions conjuguées. Rev. Française Informat. Recherche Opérationnelle, 3e Année 16, 35–43 (1969)
  29. Polyak, B.T.: The conjugate gradient method in extreme problems. USSR Comput. Math. Math. Phys. 9(4), 94–112 (1969)
  30. Salih, Y., Hamoda, M.A., Rivaie, M.: New hybrid conjugate gradient method with global convergence properties for unconstrained optimization. Malays. J. Comput. Appl. Math. 1(1), 29–38 (2018)
  31. Touati-Ahmed, D., Storey, C.: Efficient hybrid conjugate gradient techniques. J. Optim. Theory Appl. 64(2), 379–397 (1990)
  32. Wei, Z., Yao, S., Liu, L.: The convergence properties of some new conjugate gradient methods. Appl. Math. Comput. 183(2), 1341–1350 (2006)
  33. Xu, X., Kong, F.: New hybrid conjugate gradient methods with the generalized Wolfe line search. SpringerPlus 5(1), 881 (2016). https://doi.org/10.1186/s40064-016-2522-9
  34. Yao, S., Wei, Z., Huang, H.: A note about WYL's conjugate gradient method and its application. Appl. Math. Comput. 191(2), 381–388 (2007)
  35. Yuan, G., Meng, Z., Li, Y.: A modified Hestenes and Stiefel conjugate gradient algorithm for large scale nonsmooth minimizations and nonlinear equations. J. Optim. Theory Appl. 168, 129–152 (2016)
  36. Zhou, Z., Wang, Y., Wu, Q.M.J., Yang, C.N., Sun, X.: Effective and efficient global context verification for image copy detection. IEEE Trans. Inf. Forensics Secur. 12(1), 48–63 (2017)

Copyright information

© The Author(s) 2019

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  1. Department of Mathematics, University of Botswana, Gaborone, Botswana
