A Dai–Liao conjugate gradient method via modified secant equation for system of nonlinear equations

Abstract

In this paper, we propose a Dai–Liao (DL) conjugate gradient method for solving large-scale systems of nonlinear equations. The method incorporates into the DL approach an extended secant equation developed from the modified secant equations proposed by Zhang et al. (J Optim Theory Appl 102(1):147–157, 1999) and Wei et al. (Appl Math Comput 175(2):1156–1188, 2006). It is shown that the proposed scheme satisfies the sufficient descent condition. The global convergence of the method is established under mild conditions, and computational experiments on some benchmark test problems show that the method is efficient and robust.

Introduction

A typical system of nonlinear equations has the general form

$$\begin{aligned} F\left( x \right) =0, \end{aligned}$$
(1)

where \(F:R^{n}\rightarrow R^{n}\) is a nonlinear mapping assumed to be continuously differentiable on \(R^{n}\). Systems of nonlinear equations play an important role in science and engineering; therefore, solving (1) has become a subject of interest to researchers in these areas. Numerous algorithms have been developed for solving such systems. Notable among them are the Newton and quasi-Newton schemes [14, 22, 34, 52], which converge rapidly from a sufficiently good starting point. However, the requirement to compute and store the Jacobian matrix, or an approximation of it, at each iteration makes the two methods unattractive for large-scale nonlinear systems [51].

A method well suited to large-scale systems is the conjugate gradient (CG) method, which forms an important class of algorithms for large-scale unconstrained optimization. The method is popular with mathematicians and engineers working on large-scale problems because of its low memory requirements and strong global convergence properties [19]. Generally, the nonlinear conjugate gradient method is used to solve large-scale problems of the following form:

$$\begin{aligned} \hbox {min}f\left( x \right) ,\quad x\in R^{n}, \end{aligned}$$
(2)

where \(f:R^{n}\rightarrow R\) is a continuously differentiable function, bounded from below, whose gradient is available. The method generates a sequence of iterates \(x_k\) from an initial point \(x_0 \in R^{n}\) using the iterative formula

$$\begin{aligned} x_{k+1} =x_k +s_k ,\quad s_k =\alpha _k d_k ,\quad k=0,1,\ldots , \end{aligned}$$
(3)

where \(x_k\) is the current iterate, \(\alpha _k >0\) is a step length computed using a suitable line search technique, and \(d_k\) is the CG search direction defined by

$$\begin{aligned} d_k =\left\{ \begin{array}{ll} -F_k ,&{}\quad \text {if }\quad k=0, \\ -F_k +\beta _k d_{k-1} ,&{}\quad \text {if }\quad k\ge 1, \\ \end{array}\right. \end{aligned}$$
(4)

where \(\beta _k\) is a scalar known as the CG update parameter, and \(F_k =\nabla f\left( {x_k } \right) \). A crucial element in any CG algorithm is the definition of the update parameter \(\beta _k\) [4], which is why different CG algorithms corresponding to different choices of \(\beta _k\) in (4) have been proposed (see [8, 10,11,12,13,14, 17, 33, 50, 51, 53, 65]).

Moreover, some CG methods for unconstrained optimization are not globally convergent, so efforts have been made to develop CG methods that are not only globally convergent but also numerically efficient. Several of these newer methods are based on secant equations. For nonlinear conjugate gradient methods, the conjugacy condition is given by

$$\begin{aligned} d_k^T y_{k-1} =0. \end{aligned}$$
(5)

Perry [44] extended (5) by exploiting the following secant condition of quasi-Newton schemes:

$$\begin{aligned} B_k s_{k-1} =y_{k-1} , \end{aligned}$$
(6)

and the quasi-Newton search direction \(d_k\) given by

$$\begin{aligned} B_k d_k =-F_k , \end{aligned}$$
(7)

where \(B_k\) is a square matrix, which approximates the Hessian \(\nabla ^{2}f\left( x \right) \). By using (6) and (7), Perry gave an extension of (5) as:

$$\begin{aligned} d_k^T y_{k-1} =-F_k^T s_{k-1} , \end{aligned}$$
(8)

and using (4), the Perry search direction is given as

$$\begin{aligned} d_k =\left\{ \begin{array}{ll} -F_k ,&{} \quad \text {if }\quad k=0, \\ -P_k F_k =-F_k +\beta _k^P d_{k-1} ,&{} \quad \text {if }\quad k\ge 1, \\ \end{array}\right. \end{aligned}$$
(9)

where

$$\begin{aligned} \beta _k^P =\frac{(y_{k-1} -s_{k-1} )^{T}F_k }{s_{k-1}^T y_{k-1} }, \end{aligned}$$
(10)

and

$$\begin{aligned} P_k =I-\frac{s_{k-1} (y_{k-1} -s_{k-1} )^{T}}{s_{k-1}^T y_{k-1} }. \end{aligned}$$
(11)

Following Perry’s approach, Dai and Liao [18] incorporated a nonnegative parameter t to propose the following extension of (8):

$$\begin{aligned} d_k^T y_{k-1} =-tF_k^T s_{k-1} . \end{aligned}$$
(12)

It is noted that for \(t=0\), (12) reduces to (5), and if \(t=1\), we obtain Perry’s condition (8). Consequently, by substituting (4) into (12), Dai and Liao [18] proposed the following CG update parameter:

$$\begin{aligned} \beta _k^{\mathrm{DL}} =\frac{(y_{k-1} -ts_{k-1} )^{T}F_k }{d_{k-1}^T y_{k-1} },\quad t\ge 0. \end{aligned}$$
(13)

Numerical results have shown that the DL method is effective; however, it depends strongly on the nonnegative parameter t, for which there is no optimal value [4], and it may not necessarily generate descent directions [8]. That is, the method may not satisfy the descent condition

$$\begin{aligned} F_k^T d_k <0,\quad \forall k, \end{aligned}$$
(14)

or the sufficient descent condition, namely there exists a constant \(\lambda >0\) such that

$$\begin{aligned} F_k^T d_k \le -\lambda \parallel F_k \parallel ^{2},\quad \forall k. \end{aligned}$$
(15)

Based on the DL conjugacy condition (12), conjugate gradient methods have been proposed over the years using modified secant equations. For example, Babaie-Kafaki et al. [13] and Yabe and Takano [55] proposed CG methods by applying a revised form of the modified secant equations proposed by Zhang and Xu [63] and Zhang et al. [64] and the modified secant equation proposed by Li and Fukushima [36]. Li et al. [37] applied the modified secant equation proposed by Wei et al. [54], while Ford et al. [28] employed the multi-step quasi-Newton conditions proposed by Ford and Moghrabi [26, 27]. CG methods based on modified secant equations have also been studied by Narushima and Yabe [57] and Arazm et al. [7]. These methods have been found to be numerically efficient and globally convergent under suitable conditions, but like the DL method, they fail to ensure sufficient descent.

Recently, by employing Perry’s idea [44], efficient CG methods with descent directions have been proposed. Liu and Shang [39] proposed a Perry conjugate gradient method, which provides prototypes for developing other special forms of the Perry method, such as the HS method and the DL method [18]. Liu and Xu [40] presented a new Perry CG method with sufficient descent properties, independent of any line search. Also, based on the self-scaling memoryless BFGS update, Andrei [6] proposed an accelerated adaptive class of Perry conjugate gradient algorithms, whose search direction is determined by symmetrization of the scaled Perry CG direction [44].

CG methods for systems of nonlinear equations are rare, as most existing methods target unconstrained optimization. Over the years, however, the approach has been extended to large-scale nonlinear systems. Using a combination of the Polak–Ribière–Polyak (PRP) conjugate gradient method for unconstrained optimization [45, 47] and the hyperplane projection method of Solodov and Svaiter [48], Cheng [16] proposed a PRP-type method for systems of monotone equations. Yu [58, 59] extended the PRP method [45] to solve large-scale nonlinear systems with nonmonotone line search strategies, which are modifications of the Grippo–Lampariello–Lucidi [29] and Li–Fukushima [35] schemes. As further research on Perry’s conjugate gradient method, Dai et al. [21] combined the modified Perry conjugate gradient method [41] and the hyperplane projection technique of Solodov and Svaiter [48] to propose a derivative-free method for solving large-scale nonlinear monotone equations. By combining the descent Dai–Liao CG method of Babaie-Kafaki and Ghanbari [8] and the projection method in [48], Abubakar and Kumam [2] proposed a descent Dai–Liao CG method for nonlinear equations; numerical results show the method to be efficient. Based on the projection strategy [48], Liu and Feng [38] proposed a derivative-free iterative method for large-scale nonlinear monotone equations, which can also be used to solve large-scale non-smooth problems owing to its low storage requirement and use of derivative-free information. Abubakar and Kumam [1] proposed an improved three-term derivative-free method for solving large-scale nonlinear equations, based on a modified HS method with the projection technique of Solodov and Svaiter [48]. Abubakar et al. [3] proposed a descent Dai–Liao CG method for solving nonlinear convex constrained monotone equations, which extends the method in [2]. By using a convex combination of two different positive spectral coefficients, Mohammad and Abubakar [42] proposed a positive spectral gradient-like method combined with the projection method for solving nonlinear monotone equations. Awwal et al. [43] proposed a hybrid spectral gradient algorithm for systems of nonlinear monotone equations with convex constraints; the scheme combines a convex combination of two different positive spectral parameters with the projection technique.

Here, based on the work of Babaie-Kafaki and Ghanbari [9] and the Dai–Liao (DL) [18] approach, we propose a Dai–Liao conjugate gradient method for systems of nonlinear equations by incorporating an extended secant equation into the classical DL update.

Throughout this work, \(\parallel \cdot \parallel \) denotes the Euclidean norm of vectors, \(y_{k-1} =F_k -F_{k-1}\), \(s_{k-1} =x_k -x_{k-1}\), and \(F_k =F\left( {x_k } \right) \). We also assume that F in (1) is Lipschitz continuous and that f in (2) is specified by

$$\begin{aligned} f\left( x \right) :=\frac{1}{2}\parallel F\left( x \right) \parallel ^{2}. \end{aligned}$$
(16)

The paper is organized as follows: in Sect. 2, we present details of the method. Convergence analysis is presented in Sect. 3. Numerical results of the method are presented in Sect. 4. Finally, conclusions are made in Sect. 5.

Proposed method and its algorithm

Following the Dai–Liao approach, Babaie-Kafaki and Ghanbari [9] proposed the following extension of the PRP update parameter

$$\begin{aligned} \beta _k^{\mathrm{EPRP}} =\beta _k^{\mathrm{PRP}} -t\frac{F_k^T d_{k-1} }{\parallel F_{k-1} \parallel ^{2}}, \end{aligned}$$
(17)

where \(\beta _k^{PRP}\) is the classical PRP parameter and t is a nonnegative parameter, whose values were determined by carrying out an eigenvalue analysis. Motivated by this, and employing a similar approach, we propose a modification of the classical DL update parameter. In what follows, we suggest an extension of some previously modified secant equations.

By expanding (6), Zhang et al. [64] proposed the following modified secant equation

$$\begin{aligned} B_k s_{k-1} =\hat{y}_{k-1} ,\quad \hat{y}_{k-1} =y_{k-1} +\left( {\frac{\theta _{k-1} }{s_{k-1}^T \mu _{k-1} }} \right) \mu _{k-1} , \end{aligned}$$
(18)

where

$$\begin{aligned} \theta _{k-1} =6\left( {f_{k-1} -f_k } \right) +3s_{k-1}^T \left( {F_{k-1} +F_k } \right) , \end{aligned}$$
(19)

where \(\mu _{k-1} \in R^{n}\) is a vector parameter such that \(s_{k-1}^T \mu _{k-1} \ne 0\) (see [64]).

Similarly, Wei et al. [54] gave the following modified secant equation

$$\begin{aligned} B_k s_{k-1} =\bar{y}_{k-1} ,\quad \bar{y}_{k-1} =y_{k-1} +\left( {\frac{\vartheta _{k-1} }{s_{k-1}^T \mu _{k-1} }} \right) \mu _{k-1} , \end{aligned}$$
(20)

with

$$\begin{aligned} \vartheta _{k-1} =2\left( {f_{k-1} -f_k } \right) +s_{k-1}^T \left( {F_{k-1} +F_k } \right) , \end{aligned}$$
(21)

where \(\mu _{k-1} \in R^{n}\) is a vector parameter such that \(s_{k-1}^T \mu _{k-1} \ne 0\) (see [60]). In both (18) and (20), a common choice of the vector parameter is \(\mu _{k-1} =s_{k-1}\) [55].

Here, we propose the following secant equation as an extension of (6), (18), and (20):

$$\begin{aligned} B_k s_{k-1} =u_{k-1} =y_{k-1} +2\phi \frac{\vartheta _{k-1} }{s_{k-1}^T \mu _{k-1} }\mu _{k-1} , \end{aligned}$$
(22)

where \(\phi \) is a nonnegative parameter, \(\vartheta _{k-1}\) is defined by (21) and \(s_{k-1}^T \mu _{k-1} \ne 0\). We observe that for \(\phi =0\), (22) becomes the standard secant equation defined by (6). Moreover, since \(\theta _{k-1} =3\vartheta _{k-1}\), for \(\phi =\frac{3}{2}\), (22) reduces to the modified secant equation (18) of Zhang et al. [64], while for \(\phi =\frac{1}{2}\), (22) reduces to the modified secant equation (20) of Wei et al. [54]. Substituting \(u_{k-1}\) in (22) for \(y_{k-1}\) in (13), we obtain the following version of the DL update parameter:

$$\begin{aligned} \bar{\beta }_k^{\mathrm{ADL}} =\frac{(u_{k-1} -ts_{k-1} )^{T}F_k }{d_{k-1}^T u_{k-1} },\quad t\ge 0. \end{aligned}$$
(23)

Observe that, in general, the denominator \(d_{k-1}^T u_{k-1}\) may vanish, since \(\vartheta _{k-1}\) as defined in (21) may be negative. Therefore, we redefine \(u_{k-1}\) and obtain its revised form as

$$\begin{aligned} z_{k-1} =y_{k-1} +2\phi \frac{\hbox {max}\left\{ {\vartheta _{k-1} ,0} \right\} }{s_{k-1}^T \mu _{k-1} }\mu _{k-1} . \end{aligned}$$
(24)

Consequently, we get the revised form of (23) as

$$\begin{aligned} \hat{\beta }_k^{\mathrm{ADL}} =\frac{z_{k-1}^T F_k }{d_{k-1}^T z_{k-1} }-t\frac{s_{k-1}^T F_k }{d_{k-1}^T z_{k-1} }. \end{aligned}$$
(25)

Andrei [4] noted that the parameter t has no optimal choice; so, to obtain descent directions for our proposed method, we proceed to derive appropriate values of t. Substituting (25) into (4) and using \(s_{k-1} =\alpha _{k-1} d_{k-1}\), our search direction becomes:

$$\begin{aligned} d_k =-F_k +\left( {\frac{s_{k-1} z_{k-1}^T -ts_{k-1} s_{k-1}^T }{s_{k-1}^T z_{k-1} }} \right) F_k . \end{aligned}$$
(26)

Following Perry’s approach [44], the search direction of our proposed method can be written as

$$\begin{aligned} d_k =-H_k F_k ,\quad k\ge 1, \end{aligned}$$
(27)

where \(H_k\), called the search direction matrix, is given by

$$\begin{aligned} H_k =I-\frac{s_{k-1} z_{k-1}^T }{s_{k-1}^T z_{k-1} }+t\frac{s_{k-1} s_{k-1}^T }{s_{k-1}^T z_{k-1} }, \end{aligned}$$
(28)

and \(z_{k-1}\) is as defined by (24). From (27), we can write

$$\begin{aligned} d_k^T F_k =-F_k^T H_k^T F_k =-F_k^T \frac{H_k^T +H_k }{2}F_k , \end{aligned}$$
(29)

where

$$\begin{aligned} \bar{H}_k= & {} \frac{H_k^T +H_k }{2} \nonumber \\= & {} I-\frac{1}{2}\frac{s_{k-1} z_{k-1}^T +z_{k-1} s_{k-1}^T }{s_{k-1}^T z_{k-1} }+t\frac{s_{k-1} s_{k-1}^T }{s_{k-1}^T z_{k-1} } \end{aligned}$$
(30)

Proposition 2.1

The matrix \(\bar{H}_k\) defined by (30) is a symmetric matrix.

Proof

Using direct computation, we see that \(\bar{H}_k =\bar{H}_k^T \). Hence, \(\bar{H}_k\) is symmetric. \(\square \)

Thus, to analyze the descent property of our method, we need the eigenvalues of \(\bar{H}_k\) and their structure.

Theorem 2.2

Let the matrix \(\bar{H}_k\) be defined by (30). Then the eigenvalues of \(\bar{H}_k\) consist of 1 with multiplicity \(n-2\), together with \(\lambda _k^+\) and \(\lambda _k^-\), where

$$\begin{aligned} \lambda _k^+= & {} \frac{1}{2}\left[ {\left( {1+a_k } \right) +\sqrt{(a_k -1)^{2}+b_k -1}} \right] \end{aligned}$$
(31)
$$\begin{aligned} \lambda _k^-= & {} \frac{1}{2}\left[ {\left( {1+a_k } \right) -\sqrt{(a_k -1)^{2}+b_k -1}} \right] \end{aligned}$$
(32)

and \(a_k =t\frac{\parallel s_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1}}\), \(\quad b_k =\frac{\parallel s_{k-1} \parallel ^{2}\parallel z_{k-1} \parallel ^{2}}{(s_{k-1}^T z_{k-1} )^{2}}\).

Furthermore, all eigenvalues of \(\bar{H}_k\) are positive real numbers.

Proof

Since \(d_{k-1}^T z_{k-1} \ne 0\), we have \(s_{k-1}^T z_{k-1} \ne 0\), and so \(s_{k-1}\) and \(z_{k-1}\) are nonzero vectors. Suppose V is the vector space spanned by \(\left\{ {s_{k-1} ,z_{k-1} } \right\} \). Then \(\dim \left( V \right) \le 2\) and \(\dim \left( {V^{\bot }} \right) \ge n-2\), where \(V^{\bot }\) is the orthogonal complement of V. Therefore, there exists a set of mutually orthogonal vectors \(\{\tau _{k-1}^i \}_{i=1}^{n-2} \subset V^{\bot }\) satisfying

$$\begin{aligned} s_{k-1}^T \tau _{k-1}^i =z_{k-1}^T \tau _{k-1}^i =0. \end{aligned}$$
(33)

By multiplying both sides of (30) by \(\tau _{k-1}^i\), we obtain

$$\begin{aligned} \bar{H}_k \tau _{k-1}^i =\tau _{k-1}^i ,\quad i=1,\ldots ,n-2, \end{aligned}$$
(34)

which can be viewed as an eigenvector equation. So \(\tau _{k-1}^i \), for \(i=1,\ldots ,n-2\), are eigenvectors of \(\bar{H}_k\) corresponding to the eigenvalue 1. Let \(\lambda _k^+\) and \(\lambda _k^-\) be the remaining two eigenvalues. Observe that (30) can be written as

$$\begin{aligned} \bar{H}_k =I-\frac{s_{k-1} (z_{k-1} -2ts_{k-1} )^{T}}{2s_{k-1}^T z_{k-1} }-\frac{z_{k-1} s_{k-1}^T }{2s_{k-1}^T z_{k-1} }. \end{aligned}$$
(35)

Clearly, \(\bar{H}_k\) represents a rank-two update of the identity, so we may apply the determinant formula (see formula \(\left( {1.2.70} \right) \) of [49])

$$\begin{aligned} \det ( {I+u_1 u_2^T +u_3 u_4^T } )= ( {1+u_1^T u_2 } )( {1+u_3^T u_4 } )- ( {u_1^T u_4 } )( {u_2^T u_3 } ), \end{aligned}$$
(36)

with

$$\begin{aligned} u_1 =-\frac{s_{k-1} }{2s_{k-1}^T z_{k-1} }, \quad u_2 =z_{k-1} -2ts_{k-1} ,\quad u_3 =-\frac{z_{k-1} }{2s_{k-1}^T z_{k-1} },\quad u_4 =s_{k-1} , \end{aligned}$$

to obtain

$$\begin{aligned} \det \left( {\bar{H}_k } \right) =\frac{1}{4}+t\frac{\parallel s_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }-\frac{1}{4}\frac{\parallel s_{k-1} \parallel ^{2}\parallel z_{k-1} \parallel ^{2}}{(s_{k-1}^T z_{k-1} )^{2}}. \end{aligned}$$
(37)

Since the sum of the eigenvalues of a square matrix equals its trace, from (30) we have

$$\begin{aligned} \text {trace}\left( {\bar{H}_k } \right)= & {} n-1+t\frac{\parallel s_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} } \nonumber \\= & {} \mathop {\underbrace{1+\cdots +1}}\limits _{\left( {{n}-2} \right) \mathrm{times}} +\lambda _k^+ +\lambda _k^- , \end{aligned}$$
(38)

for which we obtain

$$\begin{aligned} \lambda _k^+ +\lambda _k^- =1+t\frac{\parallel s_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }. \end{aligned}$$
(39)

Using the relationship between trace and determinant of a matrix and its eigenvalues, we can obtain \(\lambda _k^+\) and \(\lambda _k^-\) as roots of the following quadratic polynomial:

$$\begin{aligned} \lambda ^{2}-\left( {1+t\frac{\parallel s_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }} \right) \lambda +\frac{1}{4}+t\frac{\parallel s_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }-\frac{1}{4}\frac{\parallel s_{k-1} \parallel ^{2}\parallel z_{k-1} \parallel ^{2}}{(s_{k-1}^T z_{k-1} )^{2}}=0. \end{aligned}$$
(40)

So the remaining two eigenvalues are the roots of (40). Applying the quadratic formula and rearranging, we obtain

$$\begin{aligned} \lambda _k^\pm =\frac{1}{2}\left[ {1+t\frac{\parallel s_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }\pm \sqrt{\left( {t\frac{\parallel s_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }-1} \right) ^{2}+\frac{\parallel s_{k-1} \parallel ^{2}\parallel z_{k-1} \parallel ^{2}}{(s_{k-1}^T z_{k-1} )^{2}}-1}} \right] \end{aligned}$$
(41)

We can write (41) as

$$\begin{aligned} \lambda _k^\pm =\frac{1}{2}\left[ {\left( {1+a_k } \right) \pm \sqrt{(a_k -1)^{2}+b_k -1}} \right] , \end{aligned}$$
(42)

which proves (31) and (32).

For \(\lambda _k^+\) and \(\lambda _k^-\) to be real numbers, we must have \(\Delta =(a_k -1)^{2}+b_k -1\ge 0\).

By the Cauchy–Schwarz inequality, \(b_k =\frac{\parallel s_{k-1} \parallel ^{2}\parallel z_{k-1} \parallel ^{2}}{(s_{k-1}^T z_{k-1} )^{2}}\ge 1\), so \(\Delta \ge 0\). Consequently, both eigenvalues are real numbers, and \(\lambda _k^+ >0\) since \(1+a_k \) is positive and \(\sqrt{\Delta }\ge 0\). To obtain \(\lambda _k^- >0\), the following must be satisfied:

$$\begin{aligned} \frac{1}{2}\left[ {1+t\frac{\parallel s_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }-\sqrt{\left( {t\frac{\parallel s_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }-1} \right) ^{2}+\frac{\parallel s_{k-1} \parallel ^{2}\parallel z_{k-1} \parallel ^{2}}{(s_{k-1}^T z_{k-1} )^{2}}-1}} \right] >0. \end{aligned}$$
(43)

After some algebra, we obtain the following estimate for the parameter t, which guarantees (43):

$$\begin{aligned} t>\frac{1}{4}\left( {\frac{\parallel z_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }-\frac{s_{k-1}^T z_{k-1} }{\parallel s_{k-1} \parallel ^{2}}} \right) . \end{aligned}$$
(44)
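To spell out the omitted algebra: since \(\lambda _k^+ >0\), (43) holds exactly when the product of the two roots of (40), namely the constant term of (40), is positive; that is,

$$\begin{aligned} \frac{1}{4}+t\frac{\parallel s_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }-\frac{1}{4}\frac{\parallel s_{k-1} \parallel ^{2}\parallel z_{k-1} \parallel ^{2}}{(s_{k-1}^T z_{k-1} )^{2}}>0, \end{aligned}$$

and multiplying through by \(\frac{s_{k-1}^T z_{k-1} }{\parallel s_{k-1} \parallel ^{2}}>0\) and rearranging yields (44).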

So, \(\lambda _k^- >0\) if (44) is satisfied. In addition, for t satisfying (44), \(\bar{H}_k\) is nonsingular.

Therefore, all the eigenvalues of the symmetric matrix \(\bar{H}_k\) are positive real numbers, which ensures that it is a positive-definite matrix. Moreover, since \(\sqrt{\Delta }\ge 0\) and, by (44), \(a_k >\frac{b_k -1}{4}\), we obtain from (42) the following estimates for \(\lambda _k^+\) and \(\lambda _k^- \):

$$\begin{aligned} \lambda _k^+ >\frac{3(s_{k-1}^T z_{k-1} )^{2}+\parallel z_{k-1} \parallel ^{2}\parallel s_{k-1} \parallel ^{2}}{8(s_{k-1}^T z_{k-1} )^{2}} ,\quad \lambda _k^- >0. \end{aligned}$$
(45)

This completes the proof. \(\square \)

Hence, from (29), we have

$$\begin{aligned} d_k^T F_k =-F_k^T \bar{H}_k F_k \le -\lambda _k^- \parallel F_k \parallel ^{2}<0, \end{aligned}$$
(46)

which shows that the descent condition is satisfied. We, therefore, propose the following formula for the parameter t in the modified DL method:

$$\begin{aligned} t^{\mathrm{ADL}}=\xi \frac{\parallel z_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }-\gamma \frac{s_{k-1}^T z_{k-1} }{\parallel s_{k-1} \parallel ^{2}}, \end{aligned}$$
(47)

where \(\xi \ge \frac{1}{4}\) and \(\gamma <\frac{1}{4}\).
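Before moving on, the spectral structure established in Theorem 2.2 is easy to verify numerically. The following Python snippet is an illustrative sketch of ours (not part of the method itself): it builds \(\bar{H}_k\) from (30) using random vectors with \(s_{k-1}^T z_{k-1} >0\), chooses t by (47) with the values \(\xi =0.5\), \(\gamma =-0.5\) used later in Sect. 4, and checks the spectrum against (31) and (32):

```python
import numpy as np

# Numerical check of Theorem 2.2 (illustrative sketch, not the authors' code).
rng = np.random.default_rng(0)
n = 6
s = rng.standard_normal(n)
z = rng.standard_normal(n)
if s @ z < 0:                      # enforce s^T z > 0, as guaranteed by (66) below
    z = -z

xi, gamma = 0.5, -0.5
t = xi * (z @ z) / (s @ z) - gamma * (s @ z) / (s @ s)       # t^ADL from (47)

H = (np.eye(n)
     - (np.outer(s, z) + np.outer(z, s)) / (2.0 * (s @ z))
     + t * np.outer(s, s) / (s @ z))                         # H_bar from (30)

a = t * (s @ s) / (s @ z)                                    # a_k
b = (s @ s) * (z @ z) / (s @ z) ** 2                         # b_k
lam_plus = 0.5 * ((1 + a) + np.sqrt((a - 1) ** 2 + b - 1))   # (31)
lam_minus = 0.5 * ((1 + a) - np.sqrt((a - 1) ** 2 + b - 1))  # (32)

eigs = np.sort(np.linalg.eigvalsh(H))
assert np.allclose(eigs[1:-1], 1.0)                  # 1 with multiplicity n-2
assert np.allclose([eigs[0], eigs[-1]], [lam_minus, lam_plus])
assert eigs[0] > 0                                   # positive definiteness
```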

Remark 2.3

Since the DL parameter t is nonnegative, we restrict the parameter \(\gamma \) in (47) to negative values so as to avoid a numerically unreasonable approximation [32]. Based on this, the modified DL update parameter can be written as

$$\begin{aligned} \beta _k^{\mathrm{ADL}} =\frac{F_k^T z_{k-1} }{d_{k-1}^T z_{k-1} }-t^{\mathrm{ADL}}\frac{F_k^T s_{k-1} }{d_{k-1}^T z_{k-1} }, \end{aligned}$$
(48)

with \(\xi \ge \frac{1}{4}\) and \(\gamma <0\) satisfying (47) and guaranteeing the descent condition. We also write the search direction for the proposed method as

$$\begin{aligned} d_k^{\mathrm{ADL}} =-F_k +\left( {\frac{(z_{k-1} -t^{\mathrm{ADL}} s_{k-1} )^{T}F_k }{d_{k-1}^T z_{k-1} }} \right) d_{k-1}. \end{aligned}$$
(49)
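For concreteness, the quantities in (21), (24), (47) and (48), assembled into the direction (49), can be sketched in Python/NumPy as follows. This is an illustrative transcription of ours under the choice \(\mu _{k-1} =s_{k-1}\) used in Sect. 4, not the authors' implementation; all function and variable names are ours:

```python
import numpy as np

def adlcg_direction(Fk, F_prev, d_prev, s_prev, f_prev, f_curr,
                    phi=1.0, xi=0.5, gamma=-0.5):
    """Sketch of one evaluation of the ADLCG search direction (49).

    Fk, F_prev : F(x_k), F(x_{k-1}); d_prev, s_prev : previous direction and step;
    f_prev, f_curr : merit values (16) at x_{k-1} and x_k.
    """
    y = Fk - F_prev
    # vartheta_{k-1} from (21) and z_{k-1} from (24), with mu_{k-1} = s_{k-1}
    vartheta = 2.0 * (f_prev - f_curr) + s_prev @ (F_prev + Fk)
    z = y + 2.0 * phi * max(vartheta, 0.0) / (s_prev @ s_prev) * s_prev
    # t^ADL from (47) and beta^ADL from (48)
    t = xi * (z @ z) / (s_prev @ z) - gamma * (s_prev @ z) / (s_prev @ s_prev)
    beta = ((z - t * s_prev) @ Fk) / (d_prev @ z)
    return -Fk + beta * d_prev                      # direction (49)
```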

We use the derivative-free line search proposed by Li and Fukushima [34] to compute our step length \(\alpha _k \).

Let \(\sigma _1 >0\), \(\sigma _2 >0\) and \(r\in \left( {0,1} \right) \) be constants and let \(\left\{ {\eta _k } \right\} \) be a given positive sequence such that

$$\begin{aligned} \mathop {\sum }\limits _{k=0}^\infty \eta _k<\eta <\infty , \end{aligned}$$
(50)

and

$$\begin{aligned} \parallel F_{k+1} \parallel ^{2}-\parallel F_k \parallel ^{2}\le -\sigma _1 \parallel \alpha _k F_k \parallel ^{2}-\sigma _2 \parallel \alpha _k d_k \parallel ^{2}+\eta _k \parallel F_k \parallel ^{2}. \end{aligned}$$
(51)

Let \(i_k\) be the smallest non-negative integer i such that (51) holds for \(\alpha =r^{i}\). Let \(\alpha _k =r^{i_k }.\)
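A minimal backtracking realization of this rule is sketched below (our illustration: the callable F returning \(F(x)\) as a NumPy array, the safeguard max_back, and the returned pair are assumptions of ours, not part of the original description):

```python
def li_fukushima_step(F, x, d, Fk, eta_k, sigma1=1e-4, sigma2=1e-4, r=0.2,
                      max_back=50):
    """Sketch of the line search (51): accept the largest alpha = r^i satisfying it."""
    norm_Fk2 = Fk @ Fk
    alpha = 1.0                                   # alpha = r^0
    for _ in range(max_back):
        F_new = F(x + alpha * d)
        if (F_new @ F_new - norm_Fk2
                <= -sigma1 * alpha ** 2 * norm_Fk2
                   - sigma2 * alpha ** 2 * (d @ d)
                   + eta_k * norm_Fk2):            # condition (51)
            return alpha, F_new
        alpha *= r
    return alpha, F(x + alpha * d)                # safeguarded fallback
```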

Now, we describe the algorithm of the proposed method as follows:

Algorithm 2.4

A Dai–Liao CG method (ADLCG)

Step 1 :

Given \(\varepsilon >0\), choose an initial point \(x_0 \in R^{n}\), a positive sequence \(\left\{ {\eta _k } \right\} \) satisfying (50), and constants \(r\in \left( {0,1} \right) ,\sigma _1 ,\sigma _2 >0\), \(\xi \ge \frac{1}{4}\), \(\gamma <0\). Compute \(d_0 =-F_0\) and set \(k=0\).

Step 2 :

Compute \(F\left( {x_k } \right) \). If \(\parallel F\left( {x_k } \right) \parallel \le \varepsilon \), stop. Otherwise, compute the search direction \(d_k\) by (49).

Step 3 :

Compute \(\alpha _k\) via the line search in (51).

Step 4 :

Set \(x_{k+1} =x_k +\alpha _k d_k \).

Step 5 :

Set \(k:=k+1\) and go to Step 2.
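Putting the pieces together, Algorithm 2.4 admits the following compact sketch, reusing adlcg_direction and li_fukushima_step from above (again an illustrative Python transcription of ours rather than the authors' MATLAB code; the parameter values follow Sect. 4):

```python
import numpy as np

def adlcg(F, x0, eps=1e-10, max_iter=2000, phi=1.0, xi=0.5, gamma=-0.5):
    """Sketch of Algorithm 2.4 (ADLCG); F maps NumPy arrays to NumPy arrays."""
    x = np.asarray(x0, dtype=float)
    Fk = F(x)
    d = -Fk                                    # Step 1: d_0 = -F_0
    for k in range(max_iter):
        if np.linalg.norm(Fk) <= eps:          # Step 2: stopping test
            return x, k
        eta_k = 1.0 / (k + 1) ** 2             # positive sequence used in Sect. 4
        alpha, F_new = li_fukushima_step(F, x, d, Fk, eta_k)   # Step 3
        x_new = x + alpha * d                  # Step 4
        s = x_new - x
        f_prev, f_curr = 0.5 * (Fk @ Fk), 0.5 * (F_new @ F_new)  # merit (16)
        d = adlcg_direction(F_new, Fk, d, s, f_prev, f_curr,
                            phi, xi, gamma)    # direction (49) for the next step
        x, Fk = x_new, F_new                   # Step 5
    return x, max_iter
```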

Convergence analysis

The following assumptions are required to analyze the convergence of the ADLCG algorithm.

Assumption 3.1

The level set

$$\begin{aligned} \Omega =\{x\mid \parallel F\left( x \right) \parallel \le \parallel F\left( {x_0 } \right) \parallel \} \end{aligned}$$
(52)

is bounded.

Assumption 3.2

(1)

    The solution set of problem (1) is not empty.

(2)

    F is continuously differentiable on an open convex set \(\Omega _1\) containing \(\Omega \).

(3)

    F is Lipschitz continuous in some neighborhood N of \(\Omega \); namely, there exists a positive constant \(L>0\) such that

    $$\begin{aligned} \parallel F\left( x \right) -F\left( y \right) \parallel \le L\parallel x-y\parallel , \end{aligned}$$
    (53)

    for all \(x,y\in N\).

    Assumption 3.1 and condition (3) imply that there exists a positive constant \(\omega \) such that

    $$\begin{aligned} \parallel F\left( {x_k } \right) \parallel \le \omega , \end{aligned}$$
    (54)

    for all \(x_k \in \Omega \) (see Proposition 1.3 of [13]).

(4)

    The Jacobian of F is bounded, symmetric and positive-definite on \(\Omega _1 \), which implies that there exist constants \(m_2 \ge m_1 >0\) such that

    $$\begin{aligned} \parallel F^{\prime }\left( x \right) \parallel \le m_2 ,\quad \forall x\in \Omega _1 , \end{aligned}$$
    (55)

    and

    $$\begin{aligned} m_1 \parallel d\parallel ^{2}\le d^{T}F^{\prime }\left( x \right) d,\quad \forall x\in \Omega _1 ,d\in R^{n}. \end{aligned}$$
    (56)

Lemma 3.3

Let \(\left\{ {x_k } \right\} \) be generated by Algorithm 2.4. Then \(d_k\) is a descent direction for f at \(x_k\); i.e.,

$$\begin{aligned} F(x_k )^{T}d_k <0. \end{aligned}$$
(57)

Proof

By (46), the Lemma holds, and we can deduce that the norm function \(f\left( {x_k } \right) \) descends along the direction \(d_k \); i.e., \(\parallel F\left( {x_{k+1} } \right) \parallel \le \parallel F\left( {x_k } \right) \parallel \) holds \(\forall k\). \(\square \)

Lemma 3.4

Suppose Assumptions 3.1 and 3.2 hold. Let \(\left\{ {x_k } \right\} \) be generated by Algorithm 2.4. Then \(\left\{ {x_k } \right\} \subset \Omega \). Moreover, \(\left\{ {\parallel F_k \parallel } \right\} \) converges.

Proof

By Lemma 3.3, we have \(\parallel F\left( {x_{k+1} } \right) \parallel \le \parallel F\left( {x_k } \right) \parallel \). So, by Lemma 3.3 in [20], we conclude that \(\left\{ {\parallel F_k \parallel } \right\} \) converges. Moreover, for all k, we have

$$\begin{aligned} \parallel F\left( {x_{k+1} } \right) \parallel \le \parallel F\left( {x_k } \right) \parallel \le \parallel F\left( {x_{k-1} } \right) \parallel \cdots \le \parallel F\left( {x_0 } \right) \parallel . \end{aligned}$$
(58)

This implies that \(\left\{ {x_k } \right\} \subset \Omega \). \(\square \)

Lemma 3.5

Suppose Assumptions 3.1 and 3.2 hold. Let \(\left\{ {x_k } \right\} \) be generated by Algorithm 2.4. Then

$$\begin{aligned} \mathop {\hbox {lim}}\limits _{k\rightarrow \infty } \parallel \alpha _k d_k \parallel =\mathop {\hbox {lim}}\limits _{k\rightarrow \infty } \parallel s_k \parallel =0, \end{aligned}$$
(59)

and

$$\begin{aligned} \mathop {\hbox {lim}}\limits _{k\rightarrow \infty } \parallel \alpha _k F\left( {x_k } \right) \parallel =0. \end{aligned}$$
(60)

Proof

From the line search (51) and for all \(k>0\), we obtain

$$\begin{aligned} \sigma _2 \parallel \alpha _k d_k \parallel ^{2}\le & {} \sigma _1 \parallel \alpha _k F_k \parallel ^{2}+\sigma _2 \parallel \alpha _k d_k \parallel ^{2} \nonumber \\\le & {} \parallel F_k \parallel ^{2}-\parallel F_{k+1} \parallel ^{2}+\eta _k \parallel F_k \parallel ^{2}. \end{aligned}$$
(61)

Summing the above inequality over the first k iterations, we obtain

$$\begin{aligned} \sigma _2 \mathop {\sum }\limits _{i=0}^k \parallel \alpha _i d_i \parallel ^{2}\le & {} \mathop {\sum }\limits _{i=0}^k \left( {\parallel F\left( {x_i } \right) \parallel ^{2}-\parallel F\left( {x_{i+1} } \right) \parallel ^{2}} \right) +\mathop {\sum }\limits _{i=0}^k \eta _i \parallel F\left( {x_i } \right) \parallel ^{2} \nonumber \\= & {} \parallel F\left( {x_0 } \right) \parallel ^{2}-\parallel F\left( {x_{k+1} } \right) \parallel ^{2}+\mathop {\sum }\limits _{i=0}^k \eta _i \parallel F\left( {x_i } \right) \parallel ^{2}\nonumber \\\le & {} \parallel F\left( {x_0 } \right) \parallel ^{2}+\parallel F\left( {x_0 } \right) \parallel ^{2}\mathop {\sum }\limits _{i=0}^k \eta _i \nonumber \\\le & {} \parallel F\left( {x_0 } \right) \parallel ^{2}+\parallel F\left( {x_0 } \right) \parallel ^{2}\mathop {\sum }\limits _{i=0}^\infty \eta _i . \end{aligned}$$
(62)

Therefore, by (58) and since \(\left\{ {\eta _i } \right\} \) satisfies (50), the series \(\mathop {\sum }\nolimits _{i=0}^\infty \parallel \alpha _i d_i \parallel ^{2}\) is convergent, which implies that (59) holds. Using the same argument, with \(\sigma _1 \parallel \alpha _i F\left( {x_i } \right) \parallel ^{2}\) on the left-hand side, we obtain (60). \(\square \)

Lemma 3.6

[62] Suppose Assumptions 3.1 and 3.2 hold and let \(\left\{ {x_k } \right\} \) be generated by Algorithm 2.4. Then, there exists a constant \(m>0\) such that

$$\begin{aligned} y_k^T s_k \ge m\parallel s_k \parallel ^{2}>0,\quad \forall k\ge 1. \end{aligned}$$
(63)

Proof

By mean-value theorem, we have

$$\begin{aligned} y_k^T s_k =s_k^T \left( {F\left( {x_{k+1} } \right) -F\left( {x_k } \right) } \right) =s_k^T F^{\prime }\left( \varphi \right) s_k \ge m_1 \parallel s_k \parallel ^{2}, \end{aligned}$$
(64)

where \(\varphi =\lambda x_k +\left( {1-\lambda } \right) x_{k+1}\), for some \(\lambda \in \left( {0,1} \right) \). The last inequality follows from (56). Letting \(m=m_1\), the proof is established. \(\square \)

Lemma 3.7

Suppose Assumptions 3.1 and 3.2 hold. Let the sequence \(\left\{ {x_k } \right\} \) be generated by Algorithm 2.4 with update parameter \(\beta _k^{\mathrm{ADL}} \). Then, there exists \(M>0\) such that

$$\begin{aligned} \parallel d_k^{\mathrm{ADL}} \parallel \le M,\quad \forall k. \end{aligned}$$
(65)

Proof

Using (24) and (64), we get

$$\begin{aligned} s_{k-1}^T z_{k-1} =s_{k-1}^T y_{k-1} +2\phi \frac{\hbox {max}\left\{ {\vartheta _{k-1} ,0} \right\} }{s_{k-1}^T \mu _{k-1} }s_{k-1}^T \mu _{k-1} \ge s_{k-1}^T y_{k-1} \ge m\parallel s_{k-1} \parallel ^{2}. \end{aligned}$$
(66)

Applying the mean-value theorem, we have

$$\begin{aligned} \left| {\vartheta _{k-1} } \right|= & {} |2\left( {f_{k-1} -f_k } \right) +(F_{k-1} +F_k )^{T}s_{k-1} | \nonumber \\= & {} |(-2\nabla f\left( \varphi \right) +\nabla f\left( {x_{k-1} } \right) +\nabla f\left( {x_k } \right) )^{T}s_{k-1} |, \end{aligned}$$
(67)

where \(\varphi =\lambda x_{k-1} +\left( {1-\lambda } \right) x_k \), for some \(\lambda \in \left( {0,1} \right) \).

Hence from (53), we have

$$\begin{aligned} \left| {\vartheta _{k-1} } \right|\le & {} (\parallel \nabla f\left( {x_{k-1} } \right) -\nabla f\left( \varphi \right) \parallel +\parallel \nabla f\left( {x_k } \right) -\nabla f\left( \varphi \right) \parallel )\parallel s_{k-1} \parallel \nonumber \\\le & {} \left( {L\left( {1-\lambda } \right) \parallel s_{k-1} \parallel +L\lambda \parallel s_{k-1} \parallel } \right) \parallel s_{k-1} \parallel \nonumber \\= & {} L\parallel s_{k-1} \parallel ^{2}. \end{aligned}$$
(68)

Utilizing (24), (53), (68), and setting \(\mu _{k-1} =s_{k-1} \), we obtain

$$\begin{aligned} \parallel z_{k-1} \parallel\le & {} \parallel y_{k-1} \parallel +2\phi \frac{\left| {\vartheta _{k-1} } \right| }{\left| {s_{k-1}^T s_{k-1} } \right| }\parallel s_{k-1} \parallel \nonumber \\\le & {} L\parallel s_{k-1} \parallel +2\phi L\frac{\parallel s_{k-1} \parallel ^{2}}{\parallel s_{k-1} \parallel ^{2}}\parallel s_{k-1} \parallel \nonumber \\= & {} \left( {L+2\phi L} \right) \parallel s_{k-1} \parallel . \end{aligned}$$
(69)

And using (47), (66) and (69), together with the Cauchy–Schwarz inequality \(s_{k-1}^T z_{k-1} \le \parallel s_{k-1} \parallel \parallel z_{k-1} \parallel \), we get

$$\begin{aligned} \left| {t^{\mathrm{ADL}}} \right|= & {} \left| {\xi \frac{\parallel z_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }-\gamma \frac{s_{k-1}^T z_{k-1} }{\parallel s_{k-1} \parallel ^{2}}} \right| \nonumber \\\le & {} \xi \frac{\parallel z_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }+\left| \gamma \right| \frac{s_{k-1}^T z_{k-1} }{\parallel s_{k-1} \parallel ^{2}} \nonumber \\\le & {} \xi \frac{\left( \left( {L+2\phi L} \right) \parallel s_{k-1} \parallel \right) ^{2}}{m\parallel s_{k-1} \parallel ^{2}}+\left| \gamma \right| \frac{\parallel z_{k-1} \parallel }{\parallel s_{k-1} \parallel } \nonumber \\\le & {} \xi \frac{(L+2\phi L)^{2}}{m}+\left| \gamma \right| \left( {L+2\phi L} \right) . \end{aligned}$$
(70)

By utilizing (4), (48), (66), (69) and (70), together with \(s_{k-1} =\alpha _{k-1} d_{k-1}\), we obtain

$$\begin{aligned} \Vert d_{k}^{\mathrm{ADL}}\Vert= & {} \Vert -F(x_{k})+\beta _{k}^{\mathrm{ADL}}d_{k-1}\Vert \nonumber \\\le & {} \Vert F(x_{k})\Vert +|\beta _{k}^{\mathrm{ADL}}|\Vert d_{k-1}\Vert \nonumber \\\le & {} \Vert F(x_{k})\Vert +\frac{\Vert F(x_k)\Vert \Vert z_{k-1}\Vert }{s_{k-1}^Tz_{k-1}}\Vert s_{k-1}\Vert +|t^{\mathrm{ADL}}|\frac{\Vert F(x_k)\Vert \Vert s_{k-1}\Vert }{s_{k-1}^Tz_{k-1}}\Vert s_{k-1}\Vert \nonumber \\\le & {} \Vert F(x_{k})\Vert +\frac{(L+2\phi L)\Vert F(x_k)\Vert }{m} +\left( \xi \frac{(L+2\phi L)^2}{m} +|\gamma |(L+2\phi L)\right) \frac{\Vert F(x_k)\Vert }{m} \nonumber \\= & {} \left( 1+\frac{L+2\phi L}{m}+\xi \frac{(L+2\phi L)^2}{m^2}+\frac{|\gamma |(L+2\phi L)}{m} \right) \Vert F(x_k)\Vert \nonumber \\= & {} \frac{c_1\Vert F(x_k)\Vert }{m^2}, \end{aligned}$$
(71)

where \(c_1 =m^{2}+m\left( {L+2\phi L} \right) +\xi (L+2\phi L)^{2}+m\left| \gamma \right| \left( {L+2\phi L} \right) \).

Since \(\parallel F\left( {x_k } \right) \parallel \le \parallel F\left( {x_0 } \right) \parallel \) for all k by Lemma 3.4, setting \(M:=\frac{c_1 \parallel F\left( {x_0 } \right) \parallel }{m^{2}}\) gives the required result. \(\square \)

Next, we prove the global convergence of the ADLCG method.

Theorem 3.8

Suppose Assumptions 3.1 and 3.2 hold and that the sequence \(\left\{ {x_k } \right\} \) is generated by Algorithm 2.4. Assume further that for all \(k>0\),

$$\begin{aligned} \alpha _k \ge c\frac{|F(x_k )^{T}d_k |}{\parallel d_k \parallel ^{2}}, \end{aligned}$$
(72)

where c is some positive constant. Then \(\left\{ {x_k } \right\} \) converges globally to a solution of problem (1); i.e.,

$$\begin{aligned} \mathop {\hbox {lim}}\limits _{k\rightarrow \infty } \parallel F\left( {x_k } \right) \parallel =0. \end{aligned}$$
(73)

Proof

By (59) and the boundedness of \(\left\{ {\parallel d_k \parallel } \right\} \), we have

$$\begin{aligned} \mathop {\hbox {lim}}\limits _{k\rightarrow \infty } \alpha _k \parallel d_k \parallel ^{2}=0. \end{aligned}$$
(74)

From (72) and (74), we have

$$\begin{aligned} \mathop {\hbox {lim}}\limits _{k\rightarrow \infty } |F(x_k )^{T}d_k |=0. \end{aligned}$$
(75)

On the other hand, from (46), we have

$$\begin{aligned} F(x_k )^{T}d_k \le -\lambda _k^- \parallel F\left( {x_k } \right) \parallel ^{2},\quad \text {so}\quad \parallel F\left( {x_k } \right) \parallel ^{2}\le \frac{1}{\lambda _k^- }|F(x_k )^{T}d_k |. \end{aligned}$$
(76)

But from (45), we have

$$\begin{aligned} \lambda _k^+>\lambda _k^- >0,\quad \forall k. \end{aligned}$$
(77)

Thus, from (76) and applying the sandwich theorem, we obtain

$$\begin{aligned} 0\le \parallel F\left( {x_k } \right) \parallel ^{2}\le |F(x_k )^{T}d_k |\left( {\frac{1}{\lambda _k^- }} \right) \rightarrow 0. \end{aligned}$$
(78)

Therefore,

$$\begin{aligned} \mathop {\hbox {lim}}\limits _{k\rightarrow \infty } \parallel F\left( {x_k } \right) \parallel =0. \end{aligned}$$
(79)

This completes the proof. \(\square \)

Numerical results

In this section, we test the efficiency and robustness of our proposed approach by comparing it with the following method from the literature:

A new derivative-free conjugate gradient method for solving large-scale nonlinear systems of equations (NDFCG) [24]. All codes were written in the MATLAB R2014a environment and run on a personal computer (2.20 GHz CPU, 8 GB RAM). The two algorithms were implemented with the same line search procedure, with parameters set to \(\sigma _1 =\sigma _2 =10^{-4}\), \(\alpha _0 =0.1\), \(r=0.2\) and \(\eta _k =\frac{1}{(k+1)^{2}}\). In addition, we set \(\xi =0.5\), \(\gamma =-0.5\) and \(\mu _{k-1} =s_{k-1}\) for the ADLCG method. The iteration was set to terminate if the number of iterations exceeds 2000 or the inequality \(\parallel F_k \parallel \le 10^{-10}\) is satisfied (Table 1).

The two algorithms were tested using the following test problems with various sizes:

Problem 4.1

[2] The elements of the function \(F\left( x \right) \) are given by:

$$\begin{aligned} F_i \left( x \right) =2x_i -\sin \left| {x_i } \right| , \quad \quad i=1,\ldots ,n. \end{aligned}$$

Problem 4.2

[2] The elements of the function \(F\left( x \right) \) are given by:

$$\begin{aligned} F_i \left( x \right) =\hbox {log}\left( {x_i +1} \right) -\frac{x_i }{n}, \quad \quad i=1,\ldots ,n. \end{aligned}$$

Problem 4.3

[67] The elements of the function \(F\left( x \right) \) are given by:

$$\begin{aligned} F_1 \left( x \right)= & {} 2x_1 +\hbox {sin}\left( {x_1 } \right) -1, \\ F_i \left( x \right)= & {} -2x_{i-1} +2x_i +\hbox {sin}\left( {x_i } \right) -1, \quad \quad i=2,\ldots ,n-1, \\ F_n \left( x \right)= & {} 2x_n +\hbox {sin}\left( {x_n } \right) -1. \end{aligned}$$

Problem 4.4

[56] The elements of the function \(F\left( x \right) \) are given by:

$$\begin{aligned} F_i \left( x \right) =x_i -\frac{1}{n}x_i^2 +\frac{1}{n}\mathop {\sum }\limits _{j=1}^n x_j +i, \quad \quad i=1,2,\ldots ,n. \end{aligned}$$

Problem 4.5

[38] The elements of the function \(F\left( x \right) \) are given by:

$$\begin{aligned} F_i \left( x \right) =2x_i -\hbox {sin}\left( {x_i } \right) ,\quad i=1,2,\ldots ,n. \end{aligned}$$

Problem 4.6

[51] The function \(F\left( x \right) \) is given by

$$\begin{aligned} F\left( x \right) =Ax+b_1 , \end{aligned}$$

where \(b_1 =(e^{x_1} -1,\ldots ,e^{x_n} -1)^{T}\), and

$$\begin{aligned} A=\left( {{\begin{array}{ccccc} 2 &{}\quad {-1} &{} &{} &{} \\ {-1} &{}\quad 2 &{}\quad {-1} &{} &{} \\ &{}\quad \ddots &{}\quad \ddots &{}\quad \ddots &{} \\ &{} &{}\quad \ddots &{}\quad \ddots &{}\quad {-1} \\ &{} &{} &{}\quad {-1} &{}\quad 2 \\ \end{array} }} \right) \end{aligned}$$

Problem 4.7

[61] The elements of the function \(F\left( x \right) \) are given by:

$$\begin{aligned} F_i \left( x \right)= & {} \sqrt{10^{-5}}\left( {x_i -1} \right) ,\quad i=1,2,\ldots ,n-1, \\ F_n \left( x \right)= & {} \frac{1}{4n}\mathop {\sum }\limits _{j=1}^n x_j^2 -\frac{1}{4}. \end{aligned}$$

Problem 4.8

[2] The elements of the function \(F\left( x \right) \) are given by:

$$\begin{aligned} F_i \left( x \right) =e^{x_i }-1,\quad i=1,2,\ldots ,n. \end{aligned}$$

Problem 4.9

[2] The elements of the function \(F\left( x \right) \) are given by:

$$\begin{aligned} F_1 \left( x \right)= & {} x_1 \left( {x_1^2 +x_2^2 } \right) -1, \\ F_i \left( x \right)= & {} x_i \left( {x_{i-1}^2 +2x_i^2 +x_{i+1}^2 } \right) -1,\quad i=2,3,\ldots ,n-1, \\ F_n \left( x \right)= & {} x_n \left( {x_{n-1}^2 +x_n^2 } \right) . \end{aligned}$$

Problem 4.10

[2] The elements of the function \(F\left( x \right) \) are given by:

$$\begin{aligned} F_1 \left( x \right)= & {} x_1 -e^{\cos \left( \frac{x_1 +x_2 }{n+1}\right) },\\ F_i \left( x \right)= & {} x_i -e^{\cos \left( \frac{x_{i-1} +x_i +x_{i+1} }{n+1}\right) },\quad i=2,3,\ldots ,n-1, \\ F_n \left( x \right)= & {} x_n -e^{\cos \left( \frac{x_{n-1} +x_n }{n+1}\right) }. \end{aligned}$$
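As an illustration of how such residual functions are supplied to the solver, Problems 4.5 and 4.6 can be coded in vectorized form as follows (our own sketch, reusing the adlcg routine sketched in Sect. 2; the starting point here is arbitrary rather than one of the points in Table 1):

```python
import numpy as np

def F_prob45(x):
    # Problem 4.5: F_i(x) = 2 x_i - sin(x_i)
    return 2.0 * x - np.sin(x)

def F_prob46(x):
    # Problem 4.6: F(x) = A x + b_1, A = tridiag(-1, 2, -1), b_1 = exp(x) - 1
    Ax = 2.0 * x
    Ax[:-1] -= x[1:]
    Ax[1:] -= x[:-1]
    return Ax + (np.exp(x) - 1.0)

x_star, iters = adlcg(F_prob46, x0=np.full(1000, 0.5))  # illustrative starting point
print(iters, np.linalg.norm(F_prob46(x_star)))
```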
Table 1 Initial starting points used for the test problems
Table 2 Number of problems and percentage for which each method is a winner with respect to iterations and CPU time

Using the performance profile of Dolan and Moré [23], we generate Figs. 1 and 2 to show the performance and efficiency of the two methods. To better illustrate the comparison, a summary of the results is presented in Table 2. The summarized data show the number of problems for which each method wins in terms of number of iterations and CPU time, respectively. The corresponding percentages of problems solved are also indicated.
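For reference, the profile of [23] can be computed along the following lines (an illustrative sketch of ours; the matrix T and the names taus and rho are our assumptions):

```python
import numpy as np

def performance_profile(T):
    """Dolan-More performance profile sketch.

    T[p, s] = cost (iterations or CPU time) of solver s on problem p, with
    np.inf marking failures; assumes every problem is solved by at least one
    solver. Returns the thresholds tau and, per solver, rho_s(tau) = fraction
    of problems with ratio r_{p,s} = T[p,s] / min_s T[p,s] at most tau.
    """
    ratios = T / T.min(axis=1, keepdims=True)
    taus = np.unique(ratios[np.isfinite(ratios)])
    rho = np.array([[np.mean(ratios[:, s] <= tau) for tau in taus]
                    for s in range(T.shape[1])])
    return taus, rho
```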

In Figs. 1 and 2, we observe that the curve representing the ADLCG method lies above the curve representing the NDFCG method, indicating that the ADLCG method is more efficient than the NDFCG scheme.

Fig. 1 Performance profile for the number of iterations

Fig. 2 Performance profile for the CPU time

Similarly, the summary reported in Table 2 indicates that the ADLCG method wins with respect to both number of iterations and CPU time. The table shows that the ADLCG method solves 95% (76 out of 80) of the problems with fewer iterations, compared with only 3.75% (3 out of 80) for the NDFCG method; both methods solve 1 problem (1.25%) with the same number of iterations, which is reported as undecided. The summary also indicates that the ADLCG method outperforms the NDFCG scheme in CPU time, winning on 72.5% (58 out of 80) of the problems compared with 27.5% (22 out of 80) for NDFCG. Therefore, it is clear from Figs. 1 and 2 and the summary in Table 2 that our method is more efficient than the NDFCG method and well suited to large-scale nonlinear systems.

Conclusion

In this work, we proposed a Dai–Liao conjugate gradient method via a modified secant equation for systems of nonlinear equations. This was achieved by finding appropriate values for the nonnegative DL parameter using an extended secant equation developed from the work of Zhang et al. [64] and Wei et al. [54]. The method was shown to be globally convergent under mild conditions, and numerical comparisons with an existing method show that it is efficient.

References

1. Abubakar, A.B.; Kumam, P.: An improved three-term derivative-free method for solving nonlinear equations. Comput. Appl. Math. 37(5), 6760–6773 (2018)

2. Abubakar, A.B.; Kumam, P.: A descent Dai–Liao conjugate gradient method for nonlinear equations. Numer. Algorithms (2018). https://doi.org/10.1007/s11075-018-0541-z

3. Abubakar, A.B.; Kumam, P.; Auwal, A.M.: A descent Dai–Liao projection method for convex constrained nonlinear monotone equations with applications. Thai J. Math. 17(1), 128–152 (2018)

4. Andrei, N.: Open problems in conjugate gradient algorithms for unconstrained optimization. Bull. Malays. Math. Sci. Soc. 34(2), 319–330 (2011)

5. Andrei, N.: An adaptive conjugate gradient algorithm for large-scale unconstrained optimization. J. Comput. Appl. Math. 292, 83–91 (2016)

6. Andrei, N.: Accelerated adaptive Perry conjugate gradient algorithms based on the self-scaling BFGS update. J. Comput. Appl. Math. 325, 149–164 (2017)

7. Arazm, M.R.; Babaie-Kafaki, S.; Ghanbari, R.: An extended Dai–Liao conjugate gradient method with global convergence for nonconvex functions. Glas. Mat. 52(72), 361–375 (2017)

8. Babaie-Kafaki, S.; Ghanbari, R.: A descent family of Dai–Liao conjugate gradient methods. Optim. Methods Softw. 29(3), 583–591 (2013)

9. Babaie-Kafaki, S.; Ghanbari, R.: A descent extension of the Polak–Ribière–Polyak conjugate gradient method. Comput. Math. Appl. 68, 2005–2011 (2014)

10. Babaie-Kafaki, S.; Ghanbari, R.: Two modified three-term conjugate gradient methods with sufficient descent property. Optim. Lett. 8(8), 2285–2297 (2014)

11. Babaie-Kafaki, S.; Ghanbari, R.: The Dai–Liao nonlinear conjugate gradient method with optimal parameter choices. Eur. J. Oper. Res. 234, 625–630 (2014)

12. Babaie-Kafaki, S.; Ghanbari, R.: Two optimal Dai–Liao conjugate gradient methods. Optimization 64, 2277–2287 (2015)

13. Babaie-Kafaki, S.; Ghanbari, R.; Mahdavi-Amiri, N.: Two new conjugate gradient methods based on modified secant equations. J. Comput. Appl. Math. 234(5), 1374–1386 (2010)

14. Broyden, C.G.: A class of methods for solving nonlinear simultaneous equations. Math. Comput. 19, 577–593 (1965)

15. Cheng, W.: A two-term PRP-based descent method. Numer. Funct. Anal. Optim. 28, 1217–1230 (2007)

16. Cheng, W.: A PRP type method for systems of monotone equations. Math. Comput. Model. 50, 15–20 (2009)

17. Cheng, W.; Xiao, Y.; Hu, Q.: A family of derivative-free conjugate gradient methods for large-scale nonlinear systems of equations. J. Comput. Appl. Math. 224, 11–19 (2009)

18. Dai, Y.H.; Liao, L.Z.: New conjugacy conditions and related nonlinear conjugate gradient methods. Appl. Math. Optim. 43(1), 87–101 (2001)

19. Dai, Y.H.; Han, J.Y.; Liu, G.H.; Sun, D.F.; Yin, H.X.; Yuan, Y.X.: Convergence properties of nonlinear conjugate gradient methods. SIAM J. Optim. 10(2), 348–358 (1999)

20. Dai, Y.H.; Han, J.Y.; Liu, G.H.; Sun, D.F.; Yin, H.X.; Yuan, Y.X.: Convergence properties of nonlinear conjugate gradient methods. SIAM J. Optim. 10(2), 348–358 (1999)

21. Dai, Z.; Chen, X.; Wen, F.: A modified Perry's conjugate gradient method-based derivative-free method for solving large-scale nonlinear monotone equations. Appl. Math. Comput. 270, 378–386 (2015)

22. Dauda, M.K.; Mamat, M.; Mohamed, M.A.; Waziri, M.Y.: Improved quasi-Newton method via SR1 update for solving symmetric systems of nonlinear equations. Malay. J. Fundam. Appl. Sci. 15(1), 117–120 (2019)

23. Dolan, E.D.; Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2, Ser. A), 201–213 (2002)

24. Fang, X.; Ni, Q.: A new derivative-free conjugate gradient method for large-scale nonlinear systems of equations. Bull. Aust. Math. Soc. 95, 1–12 (2017)

25. Feng, D.; Sun, M.; Wang, X.: A family of conjugate gradient methods for large-scale nonlinear equations. J. Inequal. Appl. 2017, 236 (2017)

26. Ford, J.A.; Moghrabi, I.A.: Multi-step quasi-Newton methods for optimization. J. Comput. Appl. Math. 50(1–3), 305–323 (1994)

27. Ford, J.A.; Moghrabi, I.A.: Using function-values in multi-step quasi-Newton methods. J. Comput. Appl. Math. 66(1–2), 201–211 (1996)

28. Ford, J.A.; Narushima, Y.; Yabe, H.: Multi-step nonlinear conjugate gradient methods for unconstrained minimization. Comput. Optim. Appl. 40(2), 191–216 (2008)

29. Grippo, L.; Lampariello, F.; Lucidi, S.: A nonmonotone line search technique for Newton's method. SIAM J. Numer. Anal. 23, 707–716 (1986)

30. Hager, W.; Zhang, H.: A survey of nonlinear conjugate gradient methods. Pac. J. Optim. 2(1), 35–58 (2006)

31. Hestenes, M.R.; Stiefel, E.L.: Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 49, 409–436 (1952)

32. Kincaid, D.; Cheney, W.: Numerical Analysis. Brooks/Cole Publishing Company, California (1991)

33. Li, M.: A derivative-free PRP method for solving large-scale nonlinear systems of equations and its global convergence. Optim. Methods Softw. 29(3), 503–514 (2014)

34. Li, D.H.; Fukushima, M.: A globally and superlinearly convergent Gauss–Newton-based BFGS method for symmetric nonlinear equations. SIAM J. Numer. Anal. 37(1), 152–172 (2000)

35. Li, D.H.; Fukushima, M.: A derivative-free line search and global convergence of Broyden-like method for nonlinear equations. Optim. Methods Softw. 13, 583–599 (2000)

36. Li, D.H.; Fukushima, M.: A modified BFGS method and its global convergence in nonconvex minimization. J. Comput. Appl. Math. 129, 15–35 (2001)

37. Li, G.; Tang, C.; Wei, Z.: New conjugacy condition and related new conjugate gradient methods for unconstrained optimization. J. Comput. Appl. Math. 202(2), 523–539 (2007)

38. Liu, J.; Feng, Y.: A derivative-free iterative method for nonlinear monotone equations with convex constraints. Numer. Algorithms (2018). https://doi.org/10.1007/s11075-018-0603-2

39. Liu, D.Y.; Shang, Y.F.: A new Perry conjugate gradient method with the generalized conjugacy condition. In: 2010 International Conference on Computational Intelligence and Software Engineering (CiSE), p. 1012 (2010)

40. Liu, D.Y.; Xu, G.Q.: A Perry descent conjugate gradient method with restricted spectrum. Optimization Online, Nonlinear Optimization (Unconstrained Optimization), pp. 1–19 (2011)

41. Livieris, I.E.; Pintelas, P.: Globally convergent modified Perry's conjugate gradient method. Appl. Math. Comput. 218, 9197–9207 (2012)

42. Mohammad, H.; Abubakar, A.B.: A positive spectral gradient-like method for nonlinear monotone equations. Bull. Comput. Appl. Math. 5(1), 99–115 (2017)

43. Muhammed, A.A.; Kumam, P.; Abubakar, A.B.; Wakili, A.; Pakkaranang, N.: A new hybrid spectral gradient projection method for monotone system of nonlinear equations with convex constraints. Thai J. Math. 16(4), 125–147 (2018)

44. Perry, A.: A modified conjugate gradient algorithm. Oper. Res. 26(6), 1073–1078 (1978)

45. Polyak, B.T.: The conjugate gradient method in extreme problems. USSR Comput. Math. Math. Phys. 9(4), 94–112 (1969)

46. Powell, M.J.D.: Restart procedures for the conjugate gradient method. Math. Program. 12, 241–254 (1977)

47. Polak, E.; Ribière, G.: Note sur la convergence de méthodes de directions conjuguées. Rev. Fr. Inform. Rech. Opér. 16, 35–43 (1969)

48. Solodov, M.V.; Svaiter, B.F.: A globally convergent inexact Newton method for systems of monotone equations. In: Reformulation: Nonsmooth, Piecewise Smooth, Semismooth and Smoothing Methods, pp. 355–369. Springer (1998)

49. Sun, W.; Yuan, Y.X.: Optimization Theory and Methods: Nonlinear Programming. Springer, New York (2006)

50. Waziri, M.Y.; Muhammad, L.: An accelerated three-term conjugate gradient algorithm for solving large-scale systems of nonlinear equations. Sohag J. Math. 4, 1–8 (2017)

51. Waziri, M.Y.; Sabiu, J.: A derivative-free conjugate gradient method and its global convergence for solving symmetric nonlinear equations. Int. J. Math. Math. Sci. 2015 (2015)

52. Waziri, M.Y.; Leong, W.J.; Hassan, M.A.: Jacobian-free diagonal Newton's method for nonlinear systems with singular Jacobian. Malay. J. Math. Sci. 5(2), 241–255 (2011)

53. Waziri, M.Y.; Sabiu, J.; Muhammad, L.: A simple three-term conjugate gradient algorithm for solving symmetric systems of nonlinear equations. Int. J. Adv. Appl. Sci. (IJAAS) (2016)

54. Wei, Z.; Li, G.; Qi, L.: New quasi-Newton methods for unconstrained optimization problems. Appl. Math. Comput. 175(2), 1156–1188 (2006)

55. Yabe, H.; Takano, M.: Global convergence properties of nonlinear conjugate gradient methods with modified secant condition. Comput. Optim. Appl. 28(2), 203–225 (2004)

56. Yan, Q.R.; Peng, X.Z.; Li, D.H.: A globally convergent derivative-free method for solving large-scale nonlinear monotone equations. J. Comput. Appl. Math. 234, 649–657 (2010)

57. Narushima, Y.; Yabe, H.: Conjugate gradient methods based on secant conditions that generate descent search directions for unconstrained optimization. J. Comput. Appl. Math. 236, 4303–4317 (2012)

58. Yu, G.: A derivative-free method for solving large-scale nonlinear systems of equations. J. Ind. Manag. Optim. 6, 149–160 (2010)

59. Yu, G.: Nonmonotone spectral gradient-type methods for large-scale unconstrained optimization and nonlinear systems of equations. Pac. J. Optim. 7, 387–404 (2011)

60. Yuan, Y.X.: A modified BFGS algorithm for unconstrained optimization. IMA J. Numer. Anal. 11, 325–332 (1991)

61. Yuan, N.: A derivative-free projection method for solving convex constrained monotone equations. ScienceAsia 43, 195–200 (2017)

62. Yuan, G.; Lu, X.: A new backtracking inexact BFGS method for symmetric nonlinear equations. Comput. Math. Appl. 55, 116–129 (2008)

63. Zhang, J.; Xu, C.: Properties and numerical performance of quasi-Newton methods with modified quasi-Newton equations. J. Comput. Appl. Math. 137(2), 269–278 (2001)

64. Zhang, J.Z.; Deng, N.Y.; Chen, L.H.: New quasi-Newton equation and related methods for unconstrained optimization. J. Optim. Theory Appl. 102(1), 147–157 (1999)

65. Zhang, L.; Zhou, W.; Li, D.: Some descent three-term conjugate gradient methods and their global convergence. Optim. Methods Softw. 22(4), 697–711 (2007)

66. Zhou, W.; Li, D.: Limited memory BFGS method for nonlinear monotone equations. J. Comput. Math. 25(1), 89–96 (2007)

67. Zhou, W.; Li, D.: Limited memory BFGS method for nonlinear monotone equations. J. Comput. Math. 25(1), 89–96 (2007)

68. Zhou, W.; Shen, D.: An inexact PRP conjugate gradient method for symmetric nonlinear equations. Numer. Funct. Anal. Optim. 35(3), 370–388 (2014)

69. Zhou, W.; Shen, D.: Convergence properties of an iterative method for solving symmetric non-linear equations. J. Optim. Theory Appl. 164(1), 277–289 (2015)

70. Zhou, W.; Zhang, L.: A nonlinear conjugate gradient method based on the MBFGS secant condition. Optim. Methods Softw. 21(5), 707–714 (2006)


Author information

Correspondence to M. Y. Waziri.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

Waziri, M.Y., Ahmed, K. & Sabi’u, J. A Dai–Liao conjugate gradient method via modified secant equation for system of nonlinear equations. Arab. J. Math. 9, 443–457 (2020). https://doi.org/10.1007/s40065-019-0264-6


Mathematics Subject Classification

  • 90C30
  • 65K05
  • 90C53