Line search fixed point algorithms based on nonlinear conjugate gradient directions: application to constrained smooth convex optimization
Abstract
This paper considers the fixed point problem for a nonexpansive mapping on a real Hilbert space and proposes novel line search fixed point algorithms to accelerate the search. The termination conditions for the line search are based on the well-known Wolfe conditions that are used to ensure the convergence and stability of unconstrained optimization algorithms. The directions to search for fixed points are generated by using the ideas of the steepest descent direction and conventional nonlinear conjugate gradient directions for unconstrained optimization. We perform convergence as well as convergence rate analyses on the algorithms for solving the fixed point problem under certain assumptions. The main contribution of this paper is to make a concrete response to an issue of constrained smooth convex optimization; that is, whether or not we can devise nonlinear conjugate gradient algorithms to solve constrained smooth convex optimization problems. We show that the proposed fixed point algorithms include ones with nonlinear conjugate gradient directions which can solve constrained smooth convex optimization problems. To illustrate the practicality of the algorithms, we apply them to concrete constrained smooth convex optimization problems, such as constrained quadratic programming problems and generalized convex feasibility problems, and numerically compare them with previous algorithms based on the Krasnosel’skiĭ-Mann fixed point algorithm. The results show that the proposed algorithms dramatically reduce the running time and iterations needed to find optimal solutions to the concrete optimization problems compared with the previous algorithms.
Keywords
constrained smooth convex optimization; fixed point problem; generalized convex feasibility problem; Krasnosel’skiĭ-Mann fixed point algorithm; line search method; nonexpansive mapping; nonlinear conjugate gradient methods
MSC
47H10; 65K05; 90C25
1 Introduction
Here, let us see how the step size conditions (1.3), (1.5), (1.12), and (1.13) affect the efficiency of Algorithm (1.2). Algorithm (1.2) with (1.3) satisfies \(\Vert x_{n+1} - T ( x_{n+1} ) \Vert^{2} \leq \Vert x_{n} - T (x_{n}) \Vert^{2}\) (\(n\in\mathbb{N}\)) [1], (5.14), while Algorithm (1.2) with each of (1.5) and (1.12) satisfies \(\Vert x_{n+1} - T ( x_{n+1} ) \Vert^{2} < \Vert x_{n} - T (x_{n}) \Vert^{2}\) (\(n\in \mathbb{N}\)). Hence, it can be expected that Algorithm (1.2) with each of (1.5) and (1.12) performs better than Algorithm (1.2) with (1.3). Since the Armijo-type conditions (1.5) and (1.12) are satisfied for all sufficiently small values of \(\alpha_{n}\) [20], Subchapter 3.1, Algorithm (1.2) with only the Armijo-type condition (1.5) may fail to make reasonable progress. Meanwhile, (1.13), based on the curvature condition [20], Subchapter 3.1, is used to ensure that \(\alpha_{n}\) is not too small and that unacceptably short steps are ruled out. Therefore, the Wolfe-type conditions (1.12) and (1.13) should be used together to secure the efficiency of the algorithm. Moreover, even when \(\alpha_{n}\) satisfying (1.5) is not small enough, Algorithm (1.2) with the Wolfe-type conditions (1.12) and (1.13) can be expected to have a better convergence rate than Algorithm (1.2) with the Armijo-type condition (1.5) because of (1.7), (1.14), and \((\alpha - 1/2)^{2} \leq\alpha\) (\(\alpha\in [(2-\sqrt{3})/2,1]\)). Section 3 introduces the line search algorithm [21], Algorithm 4.6, to compute step sizes satisfying (1.12) and (1.13) with appropriately chosen δ and σ, and gives performance comparisons of Algorithm (1.2) with each of (1.3) and (1.5) against the one with (1.12) and (1.13).
This paper proposes iterative algorithms (Algorithm 2.1) that use the direction (1.20) and step sizes satisfying the Wolfe-type conditions (1.12) and (1.13) for solving Problem (1.1) and describes their convergence analyses (Theorems 2.1-2.5). We also provide their convergence rate analyses (Theorem 2.6).
To verify whether the proposed nonlinear conjugate gradient algorithms accelerate the solution of practical problems, we apply them to constrained quadratic programming problems (Section 3.2) and generalized convex feasibility problems (Section 3.3) (see [6, 33] and references therein for the relationship between the generalized convex feasibility problem and signal processing problems), which are constrained smooth convex optimization problems and particularly interesting applications of Problem (1.1). Moreover, we numerically compare their abilities to solve concrete constrained quadratic programming problems and generalized convex feasibility problems with those of previous algorithms based on the Krasnosel’skiĭ-Mann algorithm (Algorithm (1.2) with step sizes satisfying (1.3) and Algorithm (1.2) with step sizes satisfying (1.5)) and show that they can find optimal solutions to these problems faster than the previous ones.
Throughout this paper, we shall let \(\mathbb{N}\) be the set of zero and all positive integers, \(\mathbb{R}^{d}\) be a \(d\)-dimensional Euclidean space, H be a real Hilbert space with inner product \(\langle\cdot, \cdot\rangle\) and its induced norm \(\Vert \cdot \Vert\), and \(T\colon H \to H\) be a nonexpansive mapping with \(\operatorname{Fix}(T) := \{ x\in H \colon T(x) = x \} \neq\emptyset\).
2 Line search fixed point algorithms based on nonlinear conjugate gradient directions
Let us begin by explicitly stating our algorithm for solving Problem (1.1) discussed in Section 1.
Algorithm 2.1
 Step 0.

Take \(\delta, \sigma\in(0,1)\) with \(\delta\leq\sigma\). Choose \(x_{0} \in H\) arbitrarily and set \(d_{0} := -(x_{0} - T(x_{0}))\) and \(n:= 0\).
 Step 1.
 Compute \(\alpha_{n} \in(0,1]\) satisfying$$\begin{aligned}& \bigl\Vert x_{n} ( \alpha_{n} ) - T \bigl( x_{n} ( \alpha _{n} ) \bigr) \bigr\Vert ^{2} - \bigl\Vert x_{n} - T ( x_{n} ) \bigr\Vert ^{2} \leq\delta \alpha_{n} \bigl\langle x_{n} - T (x_{n} ), d_{n} \bigr\rangle , \end{aligned}$$(2.1)$$\begin{aligned}& \bigl\langle x_{n} ( \alpha_{n} ) - T \bigl(x_{n} ( \alpha_{n} ) \bigr), d_{n} \bigr\rangle \geq\sigma \bigl\langle x_{n} - T (x_{n} ), d_{n} \bigr\rangle , \end{aligned}$$(2.2)where \(x_{n}(\alpha_{n}) := x_{n} + \alpha_{n} d_{n}\). Compute \(x_{n+1} \in H\) by$$ x_{n+1} := x_{n} + \alpha_{n} d_{n}. $$(2.3)
 Step 2.

If \(\Vert x_{n+1} - T(x_{n+1}) \Vert = 0\), stop. Otherwise, go to Step 3.
 Step 3.
 Compute \(\beta_{n} \in\mathbb{R}\) by using each of the following formulas:$$\begin{aligned}& \beta_{n}^{\mathrm{SD}} := 0, \\& \beta_{n}^{\mathrm{HS}+} := \max \biggl\{ \frac{ \langle x_{n+1} - T (x_{n+1} ), y_{n} \rangle}{ \langle d_{n}, y_{n} \rangle}, 0 \biggr\} , \qquad \beta_{n}^{\mathrm{FR}} := \frac{\Vert x_{n+1} - T (x_{n+1} ) \Vert ^{2}}{\Vert x_{n} - T (x_{n} ) \Vert ^{2}}, \\& \beta_{n}^{\mathrm{PRP}+} := \max \biggl\{ \frac{ \langle x_{n+1} - T (x_{n+1} ), y_{n} \rangle}{ \Vert x_{n} - T (x_{n} ) \Vert ^{2}}, 0 \biggr\} ,\qquad \beta_{n}^{\mathrm{DY}} := \frac{\Vert x_{n+1} - T (x_{n+1} ) \Vert ^{2}}{ \langle d_{n}, y_{n} \rangle}, \end{aligned}$$(2.4)where \(y_{n} := (x_{n+1} - T(x_{n+1})) - (x_{n} - T(x_{n}))\). Generate \(d_{n+1} \in H\) by$$ d_{n+1} := - \bigl( x_{n+1} - T (x_{n+1} ) \bigr) + \beta_{n} d_{n}. $$
 Step 4.

Put \(n := n+1\) and go to Step 1.
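As a concrete illustration, the steps above can be sketched in Python. This is a simplified sketch, not the paper's implementation: for brevity the step size is chosen by backtracking on the Armijo-type condition (2.1) alone (a full implementation would also enforce the curvature-type condition (2.2)), and the nonexpansive mapping `T` is assumed to be supplied as a function.

```python
import numpy as np

def residual(T, x):
    """Fixed point residual x - T(x)."""
    return x - T(x)

def algorithm_2_1(T, x0, beta_rule="FR", delta=1e-4, tol=1e-8, max_iter=1000):
    """Sketch of Algorithm 2.1 with the beta formulas (2.4).

    Step sizes are found by halving on condition (2.1) only; delta, tol,
    and max_iter are illustrative choices, not values from the paper.
    """
    x = np.asarray(x0, dtype=float)
    r = residual(T, x)
    d = -r                                   # d_0 := -(x_0 - T(x_0))
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol:         # Step 2: stop when residual vanishes
            break
        # Backtrack on (2.1): ||r(x + a d)||^2 - ||r(x)||^2 <= delta * a * <r, d>
        a = 1.0
        while (np.linalg.norm(residual(T, x + a * d))**2
               - np.linalg.norm(r)**2 > delta * a * r.dot(d)) and a > 1e-12:
            a *= 0.5
        x_new = x + a * d                    # (2.3)
        r_new = residual(T, x_new)
        y = r_new - r                        # y_n in (2.4)
        if beta_rule == "SD":
            beta = 0.0
        elif beta_rule == "FR":
            beta = r_new.dot(r_new) / r.dot(r)
        elif beta_rule == "PRP+":
            beta = max(r_new.dot(y) / r.dot(r), 0.0)
        elif beta_rule == "HS+":
            beta = max(r_new.dot(y) / d.dot(y), 0.0)
        elif beta_rule == "DY":
            beta = r_new.dot(r_new) / d.dot(y)
        d = -r_new + beta * d                # Step 3: new direction
        x, r = x_new, r_new
    return x
```

For example, with the nonexpansive mapping `T = lambda x: 0.5 * x` (whose unique fixed point is the origin), the iterates drive the residual \(\Vert x_{n} - T(x_{n})\Vert\) to zero.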
We need to use appropriate line search algorithms to compute \(\alpha_{n}\) (\(n\in\mathbb{N}\)) satisfying (2.1) and (2.2). In Section 3, we use a practical one (Algorithm 3.1) [21], Algorithm 4.6, which obtains step sizes satisfying (2.1) and (2.2) whenever it terminates [21], Theorem 4.7. Although the efficiency of the line search algorithm depends on the parameters δ and σ, the guidance in [21], Section 6.1, allows us to set appropriate δ and σ before executing it and Algorithm 2.1. See Section 3 for the numerical performance of the line search algorithm [21], Algorithm 4.6, and Algorithm 2.1.
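For concreteness, a minimal bracketing/bisection search for a step satisfying both (2.1) and (2.2) might look as follows. This is an illustrative simplification in the spirit of, but not identical to, [21], Algorithm 4.6; the function name, bracket update, and the cap \(\alpha_{n} \leq 1\) (the paper restricts \(\alpha_{n}\) to \((0,1]\)) are our choices.

```python
import numpy as np

def wolfe_type_step(T, x, d, delta=0.3, sigma=0.5, max_bisect=50):
    """Find a in (0, 1] satisfying the Wolfe-type conditions (2.1)-(2.2).

    Shrinks the bracket when the Armijo-type condition (2.1) fails and
    grows the lower end when only the curvature-type condition (2.2)
    fails; delta and sigma here are illustrative values.
    """
    r = x - T(x)
    g0 = r.dot(d)                  # <x - T(x), d>; negative for a descent-like d
    lo, hi, a = 0.0, 1.0, 1.0      # search restricted to (0, 1]
    for _ in range(max_bisect):
        xa = x + a * d
        ra = xa - T(xa)
        armijo = np.linalg.norm(ra)**2 - np.linalg.norm(r)**2 <= delta * a * g0
        curvature = ra.dot(d) >= sigma * g0
        if armijo and curvature:
            return a
        if not armijo:
            hi = a                 # step too long: shrink from above
        else:
            lo = a                 # Armijo holds but step too short: grow
        a = 0.5 * (lo + hi)
    return a                       # best step found within the iteration cap
```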
It can be seen that Algorithm 2.1 is well defined when \(\beta _{n}\) is defined by \(\beta_{n}^{\mathrm{SD}}\), \(\beta_{n}^{\mathrm{FR}}\), or \(\beta_{n}^{\mathrm{PRP}+}\). The discussion in Section 2.2 shows that Algorithm 2.1 with \(\beta_{n} = \beta_{n}^{\mathrm{DY}}\) is well defined (Lemma 2.3(i)). Moreover, it is guaranteed that under certain assumptions, Algorithm 2.1 with \(\beta_{n} = \beta_{n}^{\mathrm {HS}+}\) is well defined (Theorem 2.5).
2.1 Algorithm 2.1 with \(\beta_{n} = \beta _{n}^{\mathrm{SD}}\)
Theorem 2.1
2.1.1 Proof of Theorem 2.1
If \(m \in\mathbb{N}\) exists such that \(\Vert x_{m} - T(x_{m}) \Vert = 0\), Theorem 2.1 holds. Accordingly, it can be assumed that, for all \(n\in\mathbb{N}\), \(\Vert x_{n} - T (x_{n}) \Vert \neq 0\) holds.
First, the following lemma can be proven by referring to [18, 19, 32].
Lemma 2.1
Proof
Lemma 2.1 leads to the following.
Lemma 2.2
 (i)
\(\lim_{n\to\infty} \Vert x_{n} - T(x_{n}) \Vert = 0\).
 (ii)
\((\Vert x_{n} - x \Vert)_{n\in\mathbb{N}}\) is monotone decreasing for all \(x\in\operatorname{Fix}(T)\).
 (iii)
\((x_{n})_{n\in\mathbb{N}}\) weakly converges to a point in \(\operatorname{Fix}(T)\).
Items (i) and (iii) in Lemma 2.2 indicate that Theorem 2.1 holds under the assumption that \(\Vert x_{n} - T (x_{n}) \Vert \neq 0\) (\(n\in\mathbb{N}\)).
Proof
(i) In the case where \(\beta_{n} := \beta_{n}^{\mathrm{SD}} = 0\) (\(n\in \mathbb{N}\)), \(d_{n} = -(x_{n} - T(x_{n}))\) holds for all \(n\in\mathbb{N}\). Hence, \(\langle x_{n} - T(x_{n}), d_{n} \rangle = -\Vert x_{n} - T(x_{n})\Vert^{2} < 0\) (\(n\in\mathbb{N}\)). Lemma 2.1 thus guarantees that \(\sum_{n=0}^{\infty}\Vert x_{n} - T ( x_{n} ) \Vert^{2} < \infty\), which implies \(\lim_{n\to\infty} \Vert x_{n} - T(x_{n}) \Vert = 0\).
(ii) The triangle inequality and the nonexpansivity of T ensure that, for all \(n\in\mathbb{N}\) and for all \(x\in\operatorname{Fix}(T)\), \(\Vert x_{n+1} - x \Vert = \Vert x_{n} + \alpha_{n} ( T (x_{n}) - x_{n} ) - x \Vert \leq (1-\alpha_{n} ) \Vert x_{n} - x \Vert + \alpha_{n} \Vert T (x_{n}) - T (x)\Vert \leq \Vert x_{n} - x \Vert\).
2.2 Algorithm 2.1 with \(\beta_{n} = \beta _{n}^{\mathrm{DY}}\)
The following is a convergence analysis of Algorithm 2.1 with \(\beta_{n} = \beta_{n}^{\mathrm{DY}}\).
Theorem 2.2
2.2.1 Proof of Theorem 2.2
Since the existence of \(m\in\mathbb{N}\) such that \(\Vert x_{m} - T(x_{m}) \Vert = 0\) implies that Theorem 2.2 holds, it can be assumed that, for all \(n\in\mathbb{N}\), \(\Vert x_{n} - T (x_{n}) \Vert \neq 0\) holds. Theorem 2.2 can be proven by using the ideas presented in the proof of [28], Theorem 3.3. The proof of Theorem 2.2 is divided into three steps.
Lemma 2.3
 (i)
\(\langle x_{n} - T(x_{n}), d_{n} \rangle < 0\) (\(n\in\mathbb{N}\)).
 (ii)
\(\liminf_{n\to\infty} \Vert x_{n} - T(x_{n}) \Vert = 0\).
 (iii)
\(\lim_{n\to\infty} \Vert x_{n} - T(x_{n}) \Vert = 0\).
Proof
2.3 Algorithm 2.1 with \(\beta_{n} = \beta _{n}^{\mathrm{FR}}\)
The following is a convergence analysis of Algorithm 2.1 with \(\beta_{n} = \beta_{n}^{\mathrm{FR}}\).
Theorem 2.3
2.3.1 Proof of Theorem 2.3
It can be assumed that, for all \(n\in\mathbb{N}\), \(\Vert x_{n} - T (x_{n}) \Vert \neq 0\) holds. Theorem 2.3 can be proven by using the ideas in the proof of [30], Theorem 2.
Lemma 2.4
 (i)
\(\langle x_{n} - T(x_{n}), d_{n} \rangle < 0\) (\(n\in\mathbb{N}\)).
 (ii)
\(\liminf_{n\to\infty} \Vert x_{n} - T(x_{n}) \Vert = 0\).
 (iii)
\(\lim_{n\to\infty} \Vert x_{n} - T(x_{n}) \Vert = 0\).
Proof
(iii) A discussion similar to the one in the proof of Lemma 2.3(iii) leads to Lemma 2.4(iii). This completes the proof. □
2.4 Algorithm 2.1 with \(\beta_{n} = \beta _{n}^{\mathrm{PRP}+}\)
It is well known that the convergence of the nonlinear conjugate gradient method with \(\beta_{n}^{\mathrm{PRP}}\) defined as in (1.19) for a general nonlinear function is uncertain [23], Section 5. To guarantee the convergence of the PRP method for unconstrained optimization, the following modification of \(\beta _{n}^{\mathrm{PRP}}\) was presented in [35]: for \(\beta _{n}^{\mathrm{PRP}}\) defined as in (1.19), \(\beta_{n}^{\mathrm {PRP}+} := \max\{ \beta_{n}^{\mathrm{PRP}}, 0 \}\). On the basis of the idea behind this modification, this subsection considers Algorithm 2.1 with \(\beta_{n}^{\mathrm{PRP}+}\) defined as in (2.4).
Theorem 2.4
2.4.1 Proof of Theorem 2.4
It can be assumed that \(\Vert x_{n} - T (x_{n}) \Vert \neq 0\) holds for all \(n\in \mathbb{N}\). Let us first show the following lemma by referring to the proof of [31], Lemma 4.1.
Lemma 2.5
Let \((x_{n})_{n\in\mathbb{N}}\) and \((d_{n})_{n\in\mathbb{N}}\) be the sequences generated by Algorithm 2.1 with \(\beta_{n} \geq 0\) (\(n\in\mathbb{N}\)) and assume that there exists \(c > 0\) such that \(\langle x_{n} - T(x_{n}), d_{n} \rangle \leq -c \Vert x_{n} - T(x_{n})\Vert^{2}\) for all \(n\in\mathbb{N}\). If there exists \(\varepsilon > 0\) such that \(\Vert x_{n} - T(x_{n})\Vert \geq \varepsilon\) for all \(n\in\mathbb{N}\), then \(\sum_{n=0}^{\infty}\Vert u_{n+1} - u_{n} \Vert^{2} < \infty\), where \(u_{n} := d_{n}/\Vert d_{n}\Vert\) (\(n\in\mathbb{N}\)).
Proof
 Property (⋆).
 Suppose that there exist positive constants γ and γ̄ such that \(\gamma\leq \Vert x_{n} - T(x_{n}) \Vert \leq\bar{\gamma}\) for all \(n\in\mathbb{N}\). Then Property (⋆) holds if \(b > 1\) and \(\lambda> 0\) exist such that, for all \(n\in\mathbb{N}\),$$ \vert \beta_{n} \vert \leq b \quad \text{and}\quad \Vert x_{n+1} - x_{n} \Vert \leq\lambda\quad \text{implies} \quad \vert \beta_{n} \vert \leq\frac{1}{2b}. $$
The proof of the following lemma can be omitted since it is similar to the proof of [31], Lemma 4.2.
Lemma 2.6
Let \((x_{n})_{n\in\mathbb{N}}\) and \((d_{n})_{n\in\mathbb{N}}\) be the sequences generated by Algorithm 2.1 and assume that there exist \(c > 0\) and \(\gamma > 0\) such that \(\langle x_{n} - T(x_{n}), d_{n} \rangle \leq -c \Vert x_{n} - T(x_{n})\Vert^{2}\) and \(\Vert x_{n} - T(x_{n})\Vert \geq \gamma\) for all \(n\in\mathbb{N}\). Suppose also that Property (⋆) holds. Then there exists \(\lambda > 0\) such that, for all \(\Delta\in\mathbb{N} \backslash\{0\}\) and for any index \(k_{0}\), there is \(k \geq k_{0}\) such that \(\sharp\mathcal{K}_{k,\Delta}^{\lambda} > \Delta/2\), where \(\mathcal{K}_{k,\Delta}^{\lambda} := \{ i\in\mathbb{N} \backslash\{0\} \colon k \leq i \leq k + \Delta - 1, \Vert x_{i} - x_{i-1} \Vert > \lambda\}\) (\(k\in\mathbb{N}\), \(\Delta\in\mathbb{N} \backslash\{0\}\), \(\lambda > 0\)) and \(\sharp\mathcal{K}_{k,\Delta}^{\lambda}\) stands for the number of elements of \(\mathcal{K}_{k,\Delta}^{\lambda}\).
The following can be proven by referring to the proof of [31], Theorem 4.3.
Lemma 2.7
Let \((x_{n})_{n\in\mathbb{N}}\) be the sequence generated by Algorithm 2.1 with \(\beta_{n} \geq 0\) (\(n\in\mathbb{N}\)) and assume that there exists \(c > 0\) such that \(\langle x_{n} - T(x_{n}), d_{n} \rangle \leq -c \Vert x_{n} - T(x_{n})\Vert^{2}\) for all \(n\in\mathbb{N}\) and Property (⋆) holds. If \((x_{n})_{n\in\mathbb{N}}\) is bounded, then \(\liminf_{n\to\infty} \Vert x_{n} - T (x_{n} ) \Vert = 0\).
Proof
Now we are in the position to prove Theorem 2.4.
Proof
2.5 Algorithm 2.1 with \(\beta_{n} = \beta _{n}^{\mathrm{HS}+}\)
The convergence properties of the nonlinear conjugate gradient method with \(\beta_{n}^{\mathrm{HS}}\) defined as in (1.19) are similar to those with \(\beta_{n}^{\mathrm{PRP}}\) defined as in (1.19) [23], Section 5. On the basis of this fact and the modification of \(\beta_{n}^{\mathrm {PRP}}\) in Section 2.4, this subsection considers Algorithm 2.1 with \(\beta _{n}^{\mathrm{HS}+}\) defined by (2.4).
Lemma 2.7 leads to the following.
Theorem 2.5
Proof
2.6 Convergence rate analyses of Algorithm 2.1
Sections 2.1-2.5 show that Algorithm 2.1 with equations (2.4) satisfies \(\lim_{n\to\infty} \Vert x_{n} - T(x_{n}) \Vert = 0\) under certain assumptions. The next theorem establishes rates of convergence for Algorithm 2.1 with equations (2.4).
Theorem 2.6
 (i)
 (ii)Under the strong Wolfe-type conditions (2.1) and (2.9), Algorithm 2.1 with \(\beta_{n} = \beta_{n}^{\mathrm{DY}}\) satisfies, for all \(n\in\mathbb{N}\),$$ \bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert \leq \frac{\Vert x_{0} - T (x_{0} ) \Vert }{\sqrt {\frac{1}{1+\sigma} \delta\sum_{k=0}^{n} \alpha_{k}}}. $$
 (iii)Under the strong Wolfe-type conditions (2.1) and (2.9), Algorithm 2.1 with \(\beta_{n} = \beta_{n}^{\mathrm{FR}}\) satisfies, for all \(n\in\mathbb{N}\),$$ \bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert \leq \frac{\Vert x_{0} - T (x_{0} ) \Vert }{\sqrt {\frac{1}{1-\sigma} \delta\sum_{k=0}^{n} ( 1-2\sigma+ \sigma ^{k} ) \alpha_{k}}}. $$
 (iv)
 (v)
Proof
In general, the step sizes satisfying (1.3) do not coincide with those satisfying the Armijo-type condition (1.5) or the Wolfe-type conditions (2.1) and (2.2). This is because the line search methods based on the Armijo-type conditions (1.5) and (2.1) determine step sizes at each iteration n so as to satisfy \(\Vert x_{n+1} - T(x_{n+1}) \Vert < \Vert x_{n} - T(x_{n})\Vert\), while the constant step sizes satisfying (1.3) do not change at each iteration. Accordingly, it would be difficult to evaluate the efficiency of these algorithms by using only the theoretical convergence rates in (2.13), (2.14), and Theorem 2.6. To verify whether Algorithm 2.1 with the convergence rates in Theorem 2.6 converges faster than the previous algorithms [8], Propositions 10 and 11, [17], Theorem 5, with convergence rates (2.13) and (2.14), the next section numerically compares their abilities to solve concrete constrained smooth convex optimization problems.
3 Application of Algorithm 2.1 to constrained smooth convex optimization
3.1 Experimental conditions and fixed point and line search algorithms used in the experiment
 SD1:

Algorithm (3.2) with constant step sizes \(\alpha_{n} := 0.5\) (\(n\in\mathbb{N}\)) [1], Theorem 5.14.
 SD2:

Algorithm (3.2) with \(\alpha_{n}\) satisfying the Armijo-type condition (1.5) when \(\beta= 0.5\) [17], Theorems 4 and 8.
 SD3:

Algorithm (3.3) with \(\alpha_{n}\) satisfying the Wolfe-type conditions (2.1) and (2.2) and \(\beta_{n} := \beta_{n}^{\mathrm{SD}}\) (Theorem 2.1).
 FR:

Algorithm (3.3) with \(\alpha_{n}\) satisfying the Wolfe-type conditions (2.1) and (2.2) and \(\beta_{n} := \beta_{n}^{\mathrm{FR}}\) (Theorem 2.3).
 PRP+:

Algorithm (3.3) with \(\alpha_{n}\) satisfying the Wolfe-type conditions (2.1) and (2.2) and \(\beta_{n} := \beta_{n}^{\mathrm{PRP}+}\) (Theorem 2.4).
 HS+:

Algorithm (3.3) with \(\alpha_{n}\) satisfying the Wolfe-type conditions (2.1) and (2.2) and \(\beta_{n} := \beta_{n}^{\mathrm{HS}+}\) (Theorem 2.5).
 DY:

Algorithm (3.3) with \(\alpha_{n}\) satisfying the Wolfe-type conditions (2.1) and (2.2) and \(\beta_{n} := \beta_{n}^{\mathrm{DY}}\) (Theorem 2.2).
 HZ:

Algorithm (3.3) with \(\alpha_{n}\) satisfying the Wolfe-type conditions (2.1) and (2.2) and \(\beta_{n} := \beta_{n}^{\mathrm{HZ}}\) defined by (3.4) [29, 36].

I: the number of initial points;

\(x_{0}^{(i)}\): the initial point chosen randomly (\(i=1,2,\ldots, I\));

ALGO: each of Algorithms SD1, SD2, SD3, FR, PRP+, HS+, DY, and HZ (\(\mathrm{ALGO} \in\{\mathrm{SD1}, \mathrm{SD2}, \mathrm{SD3}, \mathrm{FR}, \mathrm{PRP}{+}, \mathrm{HS}{+}, \mathrm{DY}, \mathrm{HZ}\}\));

\(N_{1} (x_{0}^{(i)}, \mathrm{ALGO})\): the number of step sizes computed by Algorithm 3.1 for ALGO with \(x_{0}^{(i)}\) before ALGO satisfies the stopping condition (3.5);

\(N_{2} (x_{0}^{(i)}, \mathrm{ALGO})\): the number of iterations needed to satisfy the stopping condition (3.5) for ALGO with \(x_{0}^{(i)}\).
3.2 Constrained quadratic programming problem
In this subsection, let us consider the following constrained quadratic programming problem:
Problem 3.1
Since f above is convex and \(\nabla f(x) = Qx + b\) (\(x\in\mathbb{R}^{d}\)) is Lipschitz continuous with Lipschitz constant equal to the maximum eigenvalue \(\lambda_{\mathrm{max}}\) of Q, Problem 3.1 is an example of Problem (3.1).
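As an illustrative sketch (the helper name `make_qp_mapping` is ours, not from the paper, and the projection \(P_{C}\) is assumed to be available in closed form), Problem 3.1 can be turned into the fixed point problem for the nonexpansive mapping \(T = P_{C}(\mathrm{Id} - \lambda\nabla f)\) with \(\lambda \in (0, 2/\lambda_{\mathrm{max}}]\):

```python
import numpy as np

def make_qp_mapping(Q, b, proj_C, lam=None):
    """Build T = P_C(Id - lam * grad f) for f(x) = (1/2)<x, Qx> + <b, x>.

    T is nonexpansive for lam in (0, 2/L] with L = lambda_max(Q); by
    default we take lam = 1/L.  proj_C is the metric projection onto C.
    """
    lmax = np.linalg.eigvalsh(Q).max()      # Lipschitz constant of grad f
    if lam is None:
        lam = 1.0 / lmax
    return lambda x: proj_C(x - lam * (Q @ x + b))
```

For instance, with C the box \([0,1]^{d}\) one can take `proj_C = lambda x: np.clip(x, 0.0, 1.0)` and then run the Krasnosel’skiĭ-Mann iteration or Algorithm 2.1 on the resulting T; the fixed points of T are exactly the solutions of Problem 3.1.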
3.3 Generalized convex feasibility problem
This subsection considers the following generalized convex feasibility problem [33], Section I, Framework 2, [37], Section 2.2, [6], Definition 4.1:
Problem 3.2
\(C_{f}\) is the subset of \(C_{0}\) consisting of the points closest to the sets \(C_{i}\) (\(i=1,2,\ldots,m\)) in terms of the weighted mean square norm. Even if \(\bigcap_{i=0}^{m} C_{i} = \emptyset\), \(C_{f}\) is well defined because \(C_{f}\) is the set of all minimizers of f over \(C_{0}\). The condition \(C_{f} \neq\emptyset\) holds when \(C_{0}\) is bounded [6], Remark 4.3(a). Moreover, \(C_{f} = \bigcap_{i=0}^{m} C_{i}\) holds when \(\bigcap_{i=0}^{m} C_{i} \neq\emptyset\). Accordingly, Problem 3.2 is a generalization of the convex feasibility problem [5] of finding a point in \(\bigcap_{i=0}^{m} C_{i} \neq\emptyset\).
The convex function f in Problem 3.2 satisfies \(\nabla f = \mathrm{Id} - \sum_{i=1}^{m} w_{i} P_{C_{i}}\). Hence, ∇f is Lipschitz continuous with Lipschitz constant 2. This means Problem 3.2 is an example of Problem (3.1). Since Problem 3.2 can be expressed as the problem of finding a fixed point of \(T = P_{C_{0}} (\mathrm{Id} - \lambda\nabla f) = P_{C_{0}} (\mathrm{Id} - \lambda(\mathrm{Id} - \sum_{i=1}^{m} w_{i} P_{C_{i}}) )\) for \(\lambda\in(0,1]\), we used T with \(\lambda=1\); i.e., \(T := P_{C_{0}} (\sum_{i=1}^{m} w_{i} P_{C_{i}})\).
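A minimal sketch of this mapping (helper names are ours; the ball and box projections used below are standard closed forms, assumed here only for illustration):

```python
import numpy as np

def make_gcf_mapping(proj_C0, projs, weights):
    """Build T = P_{C0}(sum_i w_i P_{C_i}), the lambda = 1 mapping above.

    weights should be positive and sum to one; Fix(T) = C_f.
    """
    def T(x):
        y = sum(w * P(x) for w, P in zip(weights, projs))
        return proj_C0(y)
    return T

def proj_ball(center, radius):
    """Metric projection onto a closed Euclidean ball."""
    c = np.asarray(center, dtype=float)
    def P(x):
        v = x - c
        n = np.linalg.norm(v)
        return x if n <= radius else c + (radius / n) * v
    return P
```

For example, with \(C_{0} = [-5,5]^{2}\) and two disjoint unit balls centered at \((\pm 2, 0)\) with equal weights, the intersection is empty but \(C_{f}\) is nonempty; by symmetry the origin minimizes f, and the Krasnosel’skiĭ-Mann iteration \(x_{n+1} = x_{n} + \alpha(T(x_{n}) - x_{n})\) drives \(x_{n}\) there.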
From the above numerical results, we can conclude that the proposed algorithms can find optimal solutions to Problems 3.1 and 3.2 faster than the previous fixed point algorithms can. In particular, it can be seen that the algorithms for which the SRs of Algorithm 3.1 are high converge quickly to solutions of Problems 3.1 and 3.2.
4 Conclusion and future work
This paper discussed the fixed point problem for a nonexpansive mapping on a real Hilbert space and presented line search fixed point algorithms for solving it on the basis of nonlinear conjugate gradient methods for unconstrained optimization, together with their convergence and convergence rate analyses. Moreover, we used these algorithms to solve concrete constrained quadratic programming problems and generalized convex feasibility problems and numerically compared them with the previous fixed point algorithms based on the Krasnosel’skiĭ-Mann fixed point algorithm. The numerical results showed that the proposed algorithms can find optimal solutions to these problems faster than the previous algorithms.
In the experiment, the line search algorithm (Algorithm 3.1) could not compute appropriate step sizes for fixed point algorithms other than Algorithms SD2, SD3, and PRP+. In the future, we should consider modifying the algorithms so that the line search can compute appropriate step sizes. Alternatively, we may need to develop new line searches that can be applied to all of the fixed point algorithms considered in this paper.
Footnotes
 1.
See Theorem 2.6(i) for the details of the convergence rate of the proposed algorithm when \(d_{n} := -(x_{n} - T(x_{n}))\) (\(n\in\mathbb{N}\)).
 2.
To guarantee the convergence of the PRP and HS methods for unconstrained optimization, the formulas \(\beta_{n}^{\mathrm{PRP}+} := \max\{\beta_{n}^{\mathrm{PRP}}, 0\}\) and \(\beta_{n}^{\mathrm{HS}+} := \max\{\beta_{n}^{\mathrm{HS}}, 0\}\) were presented in [35]. We use the modifications to perform the convergence analyses on the proposed line search fixed point algorithms.
Acknowledgements
I am sincerely grateful to the editor, Juan Jose Nieto, the anonymous associate editor, and the anonymous reviewers for helping me improve the original manuscript. The author thanks Mr. Kazuhiro Hishinuma for his discussion of the numerical experiments. This work was supported by the Japan Society for the Promotion of Science through a Grant-in-Aid for Scientific Research (C) (15K04763).
References
 1. Bauschke, HH, Combettes, PL: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, Berlin (2011)
 2. Goebel, K, Kirk, WA: Topics in Metric Fixed Point Theory. Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge (1990)
 3. Goebel, K, Reich, S: Uniform Convexity, Hyperbolic Geometry, and Nonexpansive Mappings. Dekker, New York (1984)
 4. Takahashi, W: Nonlinear Functional Analysis. Yokohama Publishers, Yokohama (2000)
 5. Bauschke, HH, Borwein, JM: On projection algorithms for solving convex feasibility problems. SIAM Rev. 38, 367-426 (1996)
 6. Yamada, I: The hybrid steepest descent method for the variational inequality problem over the intersection of fixed point sets of nonexpansive mappings. In: Butnariu, D, Censor, Y, Reich, S (eds.) Inherently Parallel Algorithms for Feasibility and Optimization and Their Applications, pp. 473-504. Elsevier, Amsterdam (2001)
 7. Berinde, V: Iterative Approximation of Fixed Points. Springer, Berlin (2007)
 8. Cominetti, R, Soto, JA, Vaisman, J: On the rate of convergence of Krasnosel’skiĭ-Mann iterations and their connection with sums of Bernoulli’s. Isr. J. Math. 199, 757-772 (2014)
 9. Krasnosel’skiĭ, MA: Two remarks on the method of successive approximations. Usp. Mat. Nauk 10, 123-127 (1955)
 10. Mann, WR: Mean value methods in iteration. Proc. Am. Math. Soc. 4, 506-510 (1953)
 11. Halpern, B: Fixed points of nonexpanding maps. Bull. Am. Math. Soc. 73, 957-961 (1967)
 12. Wittmann, R: Approximation of fixed points of nonexpansive mappings. Arch. Math. 58, 486-491 (1992)
 13. Nakajo, K, Takahashi, W: Strong convergence theorems for nonexpansive mappings and nonexpansive semigroups. J. Math. Anal. Appl. 279, 372-379 (2003)
 14. Solodov, MV, Svaiter, BF: Forcing strong convergence of proximal point iterations in a Hilbert space. Math. Program. 87, 189-202 (2000)
 15. Boţ, RI, Csetnek, ER: A dynamical system associated with the fixed points set of a nonexpansive operator. J. Dyn. Differ. Equ. (2015). doi:10.1007/s10884-015-9438-x
 16. Combettes, PL, Pesquet, JC: A Douglas-Rachford splitting approach to nonsmooth convex variational signal recovery. IEEE J. Sel. Top. Signal Process. 1, 564-574 (2007)
 17. Magnanti, TL, Perakis, G: Solving variational inequality and fixed point problems by line searches and potential optimization. Math. Program. 101, 435-461 (2004)
 18. Wolfe, P: Convergence conditions for ascent methods. SIAM Rev. 11, 226-235 (1969)
 19. Wolfe, P: Convergence conditions for ascent methods. II: some corrections. SIAM Rev. 13, 185-188 (1971)
 20. Nocedal, J, Wright, SJ: Numerical Optimization, 2nd edn. Springer Series in Operations Research and Financial Engineering. Springer, Berlin (2006)
 21. Lewis, AS, Overton, ML: Nonsmooth optimization via quasi-Newton methods. Math. Program. 141, 135-163 (2013)
 22. Iiduka, H: Iterative algorithm for solving triple-hierarchical constrained optimization problem. J. Optim. Theory Appl. 148, 580-592 (2011)
 23. Hager, WW, Zhang, H: A survey of nonlinear conjugate gradient methods. Pac. J. Optim. 2, 35-58 (2006)
 24. Hestenes, MR, Stiefel, EL: Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 49, 409-436 (1952)
 25. Fletcher, R, Reeves, C: Function minimization by conjugate gradients. Comput. J. 7, 149-154 (1964)
 26. Polak, E, Ribière, G: Note sur la convergence de directions conjuguées. Rev. Fr. Autom. Inform. Rech. Opér., Anal. Numér. 3, 35-43 (1969)
 27. Polyak, BT: The conjugate gradient method in extreme problems. USSR Comput. Math. Math. Phys. 9, 94-112 (1969)
 28. Dai, YH, Yuan, Y: A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim. 10, 177-182 (1999)
 29. Hager, WW, Zhang, H: A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. 16, 170-192 (2005)
 30. Al-Baali, M: Descent property and global convergence of the Fletcher-Reeves method with inexact line search. IMA J. Numer. Anal. 5, 121-124 (1985)
 31. Gilbert, JC, Nocedal, J: Global convergence properties of conjugate gradient methods for optimization. SIAM J. Optim. 2, 21-42 (1992)
 32. Zoutendijk, G: Nonlinear programming, computational methods. In: Abadie, J (ed.) Integer and Nonlinear Programming, pp. 37-38. North-Holland, Amsterdam (1970)
 33. Combettes, PL, Bondon, P: Hard-constrained inconsistent signal feasibility problems. IEEE Trans. Signal Process. 47, 2460-2468 (1999)
 34. Opial, Z: Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bull. Am. Math. Soc. 73, 591-597 (1967)
 35. Powell, MJD: Nonconvex minimization calculations and the conjugate gradient method. In: Numerical Analysis (Dundee, 1983). Lecture Notes in Mathematics, vol. 1066, pp. 122-141. Springer, Berlin (1984)
 36. Hager, WW, Zhang, H: Algorithm 851: CG_DESCENT: a conjugate gradient method with guaranteed descent. ACM Trans. Math. Softw. 32, 113-137 (2006)
 37. Iiduka, H: Iterative algorithm for triple-hierarchical constrained nonconvex optimization problem and its application to network bandwidth allocation. SIAM J. Optim. 22, 862-878 (2012)
 38. Iiduka, H: Acceleration method for convex optimization over the fixed point set of a nonexpansive mapping. Math. Program. 149, 131-165 (2015)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.