Abstract
This article reviews the gradient sampling methodology for solving nonsmooth, nonconvex optimization problems. We state an intuitively straightforward gradient sampling algorithm and summarize its convergence properties. Throughout this discussion, we emphasize the simplicity of gradient sampling as an extension of the steepest descent method for minimizing smooth objectives. We provide an overview of various enhancements that have been proposed to improve practical performance, as well as of several extensions that have been proposed in the literature, such as for solving constrained problems. We also clarify certain technical aspects of the analysis of gradient sampling algorithms, most notably related to the assumptions one needs to make about the set of points at which the objective is continuously differentiable. Finally, we discuss possible future research directions.
Dedicated to Krzysztof Kiwiel, in recognition of his fundamental work on algorithms for nonsmooth optimization
Notes
1. Although this fact has been known for decades, most of the examples that appear in the literature are rather artificial since they were designed with exact line searches in mind. Analyses showing that the steepest descent method with inexact line searches converges to nonstationary points of some simple convex nonsmooth functions have appeared recently in [1, 22].
2. This oversight went unnoticed for 12 years until J. Portegies and T. Mitchell brought it to our attention recently.
References
Asl, A., Overton, M.L.: Analysis of the gradient method with an Armijo–Wolfe line search on a class of nonsmooth convex functions. Optim. Methods Softw. (2017). https://doi.org/10.1080/10556788.2019.1673388
Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8(1), 141–148 (1988)
Birgin, E., Martinez, J., Raydan, M.: Spectral projected gradient methods: review and perspectives. J. Stat. Softw. 60(3), 1–21 (2014)
Burke, J.V., Lin, Q.: The gradient sampling algorithm for directionally Lipschitzian functions (in preparation)
Burke, J.V., Overton, M.L.: Variational analysis of non-Lipschitz spectral functions. Math. Program. 90(2, Ser. A), 317–351 (2001)
Burke, J.V., Lewis, A.S., Overton, M.L.: Approximating subdifferentials by random sampling of gradients. Math. Oper. Res. 27(3), 567–584 (2002)
Burke, J.V., Lewis, A.S., Overton, M.L.: Two numerical methods for optimizing matrix stability. Linear Algebra Appl. 351/352, 117–145 (2002)
Burke, J.V., Lewis, A.S., Overton, M.L.: A robust gradient sampling algorithm for nonsmooth, nonconvex optimization. SIAM J. Optim. 15(3), 751–779 (2005)
Burke, J.V., Henrion, D., Lewis, A.S., Overton, M.L.: HIFOO—a MATLAB package for fixed-order controller design and \(H_\infty\) optimization. In: Fifth IFAC Symposium on Robust Control Design, Toulouse (2006)
Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley, New York (1983). Reprinted by SIAM, Philadelphia, 1990
Crema, A., Loreto, M., Raydan, M.: Spectral projected subgradient with a momentum term for the Lagrangean dual approach. Comput. Oper. Res. 34(10), 3174–3186 (2007)
Curtis, F.E., Overton, M.L.: A sequential quadratic programming algorithm for nonconvex, nonsmooth constrained optimization. SIAM J. Optim. 22(2), 474–500 (2012)
Curtis, F.E., Que, X.: An adaptive gradient sampling algorithm for nonsmooth optimization. Optim. Methods Softw. 28(6), 1302–1324 (2013)
Curtis, F.E., Que, X.: A quasi-Newton algorithm for nonconvex, nonsmooth optimization with global convergence guarantees. Math. Program. Comput. 7(4), 399–428 (2015)
Curtis, F.E., Mitchell, T., Overton, M.L.: A BFGS-SQP method for nonsmooth, nonconvex, constrained optimization and its evaluation using relative minimization profiles. Optim. Methods Softw. 32(1), 148–181 (2017)
Curtis, F.E., Robinson, D.P., Zhou, B.: A self-correcting variable-metric algorithm framework for nonsmooth optimization. IMA J. Numer. Anal. (2019). https://doi.org/10.1093/imanum/drz008
Davis, D., Drusvyatskiy, D.: Stochastic model-based minimization of weakly convex functions. SIAM J. Optim. 29(1), 207–239 (2019). https://doi.org/10.1137/18M1178244
Davis, D., Drusvyatskiy, D., Kakade, S., Lee, J.D.: Stochastic subgradient method converges on tame functions. Found. Comput. Math. (2019). https://doi.org/10.1007/s10208-018-09409-5
Estrada, A., Mitchell, I.M.: Control synthesis and classification for unicycle dynamics using the gradient and value sampling particle filters. In: Proceedings of the IFAC Conference on Analysis and Design of Hybrid Systems, pp. 108–114 (2018)
Fletcher, R.: Practical Methods of Optimization, 2nd edn. Wiley, New York (1987)
Fletcher, R.: On the Barzilai-Borwein method. In: Qi, L., Teo, K., Yang, X. (eds.) Optimization and Control with Applications, pp. 235–256. Springer, Boston (2005)
Guo, J., Lewis, A.S.: Nonsmooth variants of Powell’s BFGS convergence theorem. SIAM J. Optim. 28(2), 1301–1311 (2018). https://doi.org/10.1137/17M1121883
Hare, W., Nutini, J.: A derivative-free approximate gradient sampling algorithm for finite minimax problems. Comput. Optim. Appl. 56(1), 1–38 (2013). https://doi.org/10.1007/s10589-013-9547-6
Helou, E.S., Santos, S.A., Simões, L.E.A.: On the differentiability check in gradient sampling methods. Optim. Methods Softw. 31(5), 983–1007 (2016)
Helou, E.S., Santos, S.A., Simões, L.E.A.: On the local convergence analysis of the gradient sampling method for finite max-functions. J. Optim. Theory Appl. 175(1), 137–157 (2017)
Hosseini, S., Uschmajew, A.: A Riemannian gradient sampling algorithm for nonsmooth optimization on manifolds. SIAM J. Optim. 27(1), 173–189 (2017). https://doi.org/10.1137/16M1069298
Kiwiel, K.C.: A method for solving certain quadratic programming problems arising in nonsmooth optimization. IMA J. Numer. Anal. 6(2), 137–152 (1986)
Kiwiel, K.C.: Convergence of the gradient sampling algorithm for nonsmooth nonconvex optimization. SIAM J. Optim. 18(2), 379–388 (2007)
Kiwiel, K.C.: A nonderivative version of the gradient sampling algorithm for nonsmooth nonconvex optimization. SIAM J. Optim. 20(4), 1983–1994 (2010). https://doi.org/10.1137/090748408
Larson, J., Menickelly, M., Wild, S.M.: Manifold sampling for ℓ1 nonconvex optimization. SIAM J. Optim. 26(4), 2540–2563 (2016). https://doi.org/10.1137/15M1042097
Lemaréchal, C., Oustry, F., Sagastizábal, C.: The U-Lagrangian of a convex function. Trans. Am. Math. Soc. 352(2), 711–729 (2000)
Lewis, A.S.: Active sets, nonsmoothness, and sensitivity. SIAM J. Optim. 13(3), 702–725 (2002)
Lewis, A.S., Overton, M.L.: Nonsmooth optimization via quasi-Newton methods. Math. Program. 141(1–2, Ser. A), 135–163 (2013). https://doi.org/10.1007/s10107-012-0514-2
Lin, Q.: Sparsity and nonconvex nonsmooth optimization. Ph.D. thesis, Department of Mathematics, University of Washington (2009)
Loreto, M., Aponte, H., Cores, D., Raydan, M.: Nonsmooth spectral gradient methods for unconstrained optimization. EURO J. Comput. Optim. 5(4), 529–553 (2017)
Mifflin, R., Sagastizábal, C.: A VU-algorithm for convex minimization. Math. Program. 104(2–3), 583–608 (2005)
Nesterov, Y., Spokoiny, V.: Random gradient-free minimization of convex functions. Found. Comput. Math. 17(2), 527–566 (2017). https://doi.org/10.1007/s10208-015-9296-2
Raydan, M.: On the Barzilai and Borwein choice of steplength for the gradient method. IMA J. Numer. Anal. 13(3), 321–326 (1993)
Raydan, M.: The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM J. Optim. 7(1), 26–33 (1997)
Rockafellar, R.T.: Lagrange multipliers and subderivatives of optimal value functions in nonlinear programming. In: Sorensen, D.C., Wets, R.J.B. (eds.) Mathematical Programming Studies, Chap. 3, pp. 28–66. North-Holland, Amsterdam (1982). http://www.springerlink.com/index/g03582565267714p.pdf
Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 317. Springer, Berlin (1998). https://doi.org/10.1007/978-3-642-02431-3
Tang, C.M., Liu, S., Jian, J.B., Li, J.L.: A feasible SQP-GS algorithm for nonconvex, nonsmooth constrained optimization. Numer. Algorithms 65(1), 1–22 (2014). https://doi.org/10.1007/s11075-012-9692-5
Traft, N., Mitchell, I.M.: Improved action and path synthesis using gradient sampling. In: Proceedings of the IEEE Conference on Decision and Control, pp. 6016–6023 (2016)
Appendices
Appendix 1
This appendix is devoted to justifying the requirement that D, the set of points on which the locally Lipschitz function f is continuously differentiable, must be an open full-measure subset of \(\mathbb {R}^n\), instead of the original assumption in [8] that D should be an open and dense set in \(\mathbb {R}^n\).
There are two ways in which the analyses in [8, 28] actually depend on D having full measure:
1. The most obvious is that both papers require that the points sampled in each iteration should lie in D, and a statement is made in both papers that this occurs with probability one, but this is not the case if D is assumed only to be an open dense subset of \(\mathbb {R}^n\). However, as already noted earlier and justified in Appendix 2, this requirement can be relaxed, as in Algorithm GS given in Sect. 6.2, to require only that f be differentiable at the sampled points.
2. The set D must have full measure for Property 6.1, stated below, to hold. The proofs in [8, 28] depend critically on this property, which follows from [6, Eq. (1.2)] (where it was stated without proof). For completeness we give a proof here, followed by an example that demonstrates the necessity of the full measure assumption.
Property 6.1
Assume that D has full measure and, for 𝜖 > 0, let
$$\displaystyle \begin{aligned} G_\epsilon({\boldsymbol x}) := \operatorname{cl}\operatorname{conv} \nabla f\left(\bar{B}({\boldsymbol x};\epsilon)\cap D\right). \end{aligned}$$
For all 𝜖 > 0 and all \({\boldsymbol x}\in \mathbb {R}^n\), one has \(\partial f({\boldsymbol x}) \subseteq G_\epsilon({\boldsymbol x})\), where \(\partial f\) is the Clarke subdifferential set presented in Definition 1.8.
Proof of Property 6.1
Let \({\boldsymbol x}\in \mathbb {R}^n\) and \(v \in \partial f({\boldsymbol x})\). We have from [10, Theorem 2.5.1] that Theorem 1.2 can be stated in a more general manner. Indeed, for any set S with zero measure, and considering \(\Omega_f\) to be the set of points at which f fails to be differentiable, the following holds:
$$\displaystyle \begin{aligned} \partial f({\boldsymbol x}) = \operatorname{conv}\left\{ \lim_{j\to\infty} \nabla f({\boldsymbol y}^j) \,:\, {\boldsymbol y}^j \to {\boldsymbol x},\ {\boldsymbol y}^j \notin S \cup \Omega_f \right\}. \end{aligned}$$
In particular, since D has full measure and f is differentiable on D, one may take \(S = \mathbb{R}^n\setminus D\) (so that \(\Omega_f \subseteq S\)), and it follows that
$$\displaystyle \begin{aligned} \partial f({\boldsymbol x}) = \operatorname{conv}\left\{ \lim_{j\to\infty} \nabla f({\boldsymbol y}^j) \,:\, {\boldsymbol y}^j \to {\boldsymbol x},\ {\boldsymbol y}^j \in D \right\}. \end{aligned}$$
Considering this last relation and Carathéodory’s theorem, it follows that \(v = \sum_{i=1}^{n+1} \lambda_i \boldsymbol{\xi}^i\) for some \(\lambda_i \geq 0\) with \(\sum_{i=1}^{n+1} \lambda_i = 1\), where, for all \(i \in \{1,\dots,n+1\}\), one has \(\boldsymbol {\xi }^i = \lim \limits _j \nabla f({\boldsymbol y}^{j,i})\) for some sequence \(\{{\boldsymbol y}^{j,i}\}_{j\in \mathbb {N}} \subset D\) converging to x. Hence, there must exist a sufficiently large \(j_i \in \mathbb {N}\) such that, for all \(j \geq j_i\), one obtains
$$\displaystyle \begin{aligned} {\boldsymbol y}^{j,i} \in \bar{B}({\boldsymbol x};\epsilon) \cap D. \end{aligned}$$
Recalling that \(G_\epsilon({\boldsymbol x})\) is the closure of \( \operatorname {\mathrm {conv}}\nabla f\left (\bar {B}({\boldsymbol x}; \epsilon )\cap D\right )\), it follows that \(\boldsymbol{\xi}^i \in G_\epsilon({\boldsymbol x})\) for all \(i \in \{1,\dots,n+1\}\). Moreover, since \(G_\epsilon({\boldsymbol x})\) is convex, we have \(v \in G_\epsilon({\boldsymbol x})\). The result follows since \({\boldsymbol x} \in \mathbb {R}^n\) and \(v \in \partial f({\boldsymbol x})\) were arbitrarily chosen. \(\square \)
With the assumption that D has full measure, Property 6.1 holds and hence the proofs of the results in [8, 28] are all valid. In particular, the proof of (ii) in [28, Lemma 3.2], which borrows from [8, Lemma 3.2], depends on Property 6.1. See also [8, the top of p. 762].
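Before turning to the counterexample, a minimal numerical sketch (ours, not part of the original analysis) may help fix ideas: for f(x) = |x| one has \(D = \mathbb{R}\setminus\{0\}\) and ∂f(0) = [−1, 1], and gradients sampled from a ball around 0 generate an interval containing the subdifferential, exactly as Property 6.1 asserts. In one dimension the convex hull of the sampled gradients is simply an interval, so it can be inspected without solving a quadratic program.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_abs(y):
    """Gradient of f(x) = |x| at any nonzero y, i.e., at any y in D."""
    return np.sign(y)

x, eps, m = 0.0, 0.1, 50
samples = x + eps * rng.uniform(-1.0, 1.0, size=m)  # points in B(x; eps)
samples = samples[samples != 0.0]                   # lie in D with probability one
grads = grad_abs(samples)

# In one dimension, conv of the sampled gradients is the interval
# [min, max]; as the sample grows it fills G_eps(0) = [-1, 1], which
# contains the Clarke subdifferential of |.| at 0, as Property 6.1 asserts.
print("sampled approximation of G_eps(0): [%g, %g]" % (grads.min(), grads.max()))

# The minimum-norm element of this interval is 0: the stationarity
# measure that gradient sampling methods drive toward zero.
lo, hi = grads.min(), grads.max()
min_norm = 0.0 if lo <= 0.0 <= hi else min(abs(lo), abs(hi))
print("min-norm element:", min_norm)
```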
The following example shows that Property 6.1 might not hold if D is assumed only to be an open dense set, not necessarily of full measure.
Example 6.2
Let δ ∈ (0, 1) and \(\{q_k\}_{k\in \mathbb {N}}\) be an enumeration of the rational numbers in (0, 1). Define
$$\displaystyle \begin{aligned} \mathcal{Q}_k := \left(q_k - \frac{\delta}{2^{k+1}},\, q_k + \frac{\delta}{2^{k+1}}\right)\cap(0,1)\ \ \text{for all}\ k\in\mathbb{N}, \qquad D := \bigcup_{k\in\mathbb{N}} \mathcal{Q}_k. \end{aligned}$$
Clearly, its Lebesgue measure satisfies 0 < λ(D) ≤ δ. Moreover, the set D is an open dense subset of [0, 1]. Now, let \(i_{D}:[0,1]\to \mathbb {R}\) be the indicator function of the set D,
$$\displaystyle \begin{aligned} i_D(x) := \begin{cases} 1 & \text{if}\ x \in D, \\ 0 & \text{if}\ x \notin D. \end{cases} \end{aligned}$$
Then, considering the Lebesgue integral, we define the function \(f:[0,1]\to \mathbb {R}\),
$$\displaystyle \begin{aligned} f(x) := \int_0^x i_D(s)\,\mathrm{d}s. \end{aligned}$$
Let us prove that f is a Lipschitz continuous function on (0, 1). To see this, note that given any a, b ∈ (0, 1) with b > a, it follows that
$$\displaystyle \begin{aligned} |f(b) - f(a)| = \int_a^b i_D(s)\,\mathrm{d}s \leq b - a, \end{aligned}$$
which ensures that f is a Lipschitz continuous function on (0, 1). Consequently, the Clarke subdifferential set of f at any point in (0, 1) is well defined. Moreover, we claim that, for all \(k\in \mathbb {N}\), f is continuously differentiable at any point \(q\in \mathcal {Q}_k\) and the following holds:
$$\displaystyle \begin{aligned} f'(q) = i_D(q) = 1. \end{aligned} \tag{6.7}$$
Indeed, given any \(q\in \mathcal {Q}_k\), we have
$$\displaystyle \begin{aligned} \frac{f(q+t)-f(q)}{t} = \frac{1}{t}\int_q^{q+t} i_D(s)\,\mathrm{d}s \quad\text{for any}\ t > 0\ \text{with}\ q + t < 1. \end{aligned}$$
Since \(\mathcal {Q}_k\) is an open set, we can find \(\overline t>0\) such that \([q,q+t]\subset \mathcal {Q}_k\subset D\), for all \(t\leq \overline t\). Hence, given any \(t\in (0,\overline {t}]\), it follows that
$$\displaystyle \begin{aligned} \frac{f(q+t)-f(q)}{t} = \frac{1}{t}\int_q^{q+t} 1\,\mathrm{d}s = 1 = i_D(q). \end{aligned}$$
The same reasoning can be used to see that the left derivative of f at q exists and is equal to i D(q). Consequently, we have f′(q) = i D(q) = 1 for all \(q\in \mathcal {Q}_k\), which yields that f is continuously differentiable on D.
By the Lebesgue differentiation theorem, we know that f′(x) = i D(x) almost everywhere. Since the set [0, 1] ∖ D does not have measure zero, this means that there must exist z ∈ [0, 1] ∖ D such that f′(z) = i D(z) = 0. Defining \(\epsilon := \min \{z,1-z\}/2\), we see, by (6.7), that the set
$$\displaystyle \begin{aligned} G_\epsilon(z) = \operatorname{cl}\operatorname{conv} f'\left(\bar{B}(z;\epsilon)\cap D\right) \end{aligned}$$
is the singleton \(G_\epsilon(z) = \{1\}\). However, since f′(z) = 0, it follows that \(0 \in \partial f(z)\), which implies \(\partial f(z) \not\subseteq G_\epsilon(z)\).
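The smallness of D in this construction can also be checked numerically. The sketch below, which is illustrative only and uses the interval choice displayed above, enumerates the first K rationals in (0, 1), builds the corresponding intervals \(\mathcal{Q}_k\), and bounds the Lebesgue measure of their union by merging overlaps; any enumeration of the rationals would do, and the one used here is chosen purely for convenience.

```python
from fractions import Fraction

delta, K = 0.1, 2000

# Enumerate K distinct rationals q_k in (0, 1) by increasing denominator.
rationals, seen = [], set()
den = 2
while len(rationals) < K:
    for num in range(1, den):
        q = Fraction(num, den)
        if q not in seen:
            seen.add(q)
            rationals.append(float(q))
    den += 1
rationals = rationals[:K]

# Q_k has half-width delta / 2**(k+1) for k = 1, 2, ...; the Python index
# starts at 0, hence the exponent k + 2.  (Clipping to (0, 1) is skipped,
# which can only enlarge the computed bound.)
intervals = sorted(
    (q - delta / 2 ** (k + 2), q + delta / 2 ** (k + 2))
    for k, q in enumerate(rationals)
)

# Merge overlapping intervals and total their lengths.
measure = 0.0
cur_lo, cur_hi = intervals[0]
for lo, hi in intervals[1:]:
    if lo > cur_hi:
        measure += cur_hi - cur_lo
        cur_lo, cur_hi = lo, hi
    else:
        cur_hi = max(cur_hi, hi)
measure += cur_hi - cur_lo

print("measure of the first %d intervals of D: %.6f (<= delta = %.1f)"
      % (K, measure, delta))
```

So D contains every rational in (0, 1), hence is dense, yet its measure never exceeds δ, which is precisely why the full measure assumption cannot be dropped.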
Note that it is stated on [8, p. 754] and [28, p. 381] that the following holds: for all \(0 \leq \epsilon_1 < \epsilon_2\) and all \({\boldsymbol x}\in \mathbb {R}^n\), one has \(\bar \partial _{\epsilon _1} f({\boldsymbol x}) \subseteq G_{\epsilon _2}({\boldsymbol x})\). Property 6.1 is a special case of this statement with \(\epsilon_1 = 0\), and hence this statement too holds only under the full measure assumption.
Finally, it is worth mentioning that in practice, the full measure assumption on D usually holds. In particular, whenever a real-valued function is semi-algebraic (or, more generally, “tame”)—in other words, for all practical purposes virtually always—it is continuously differentiable on an open set of full measure. Hence, the original proofs hold in such contexts.
Appendix 2
In this appendix, we summarize why it is not necessary that the iterates and sampled points of the algorithm lie in the set D on which f is continuously differentiable: rather, it is sufficient to ensure that f is differentiable at these points, as in Algorithm GS. We do this by outlining how to modify the proofs in [28] to extend to this case.
1. That the gradients at the sampled points \(\{{\boldsymbol x}^{k,j}\}\) exist follows with probability one from Rademacher’s theorem, while the existence of the gradients at the iterates \(\{{\boldsymbol x}^k\}\) is ensured by the statement of Algorithm GS. Notice that the proof of part (ii) of [28, Theorem 3.3] still holds in our setting with the statement that the components of the sampled points are “sampled independently and uniformly from \(\bar {B}({\boldsymbol x}^k;\epsilon )\cap D\)” replaced with “sampled independently and uniformly from \(\bar {B}({\boldsymbol x}^k;\epsilon )\)”.
2. One needs to verify that f being differentiable at \({\boldsymbol x}^k\) is enough to ensure that the line search procedure presented in (6.3) terminates finitely. This is straightforward. Since \(\nabla f({\boldsymbol x}^k)\) exists, the directional derivative along any vector \({\boldsymbol d}\in \mathbb {R}^n\setminus \{0\}\) is given by \(f'({\boldsymbol x}^k;{\boldsymbol d}) = \nabla f({\boldsymbol x}^k)^T{\boldsymbol d}\). Furthermore, since \(-\nabla f({\boldsymbol x}^k)^T{\boldsymbol g}^k \leq -\|{\boldsymbol g}^k\|^2\) (see [8, p. 756]), it follows, for any β ∈ (0, 1), that there exists \(\overline t>0\) such that
$$\displaystyle \begin{aligned} f({\boldsymbol x}^k-t{\boldsymbol g}^k) < f({\boldsymbol x}^k) - t\beta\|{\boldsymbol g}^k\|^2\ \ \text{for any}\ \ t \in (0,\overline t). \end{aligned}$$
This shows that the line search is well defined; a sketch of such a backtracking procedure is given at the end of this appendix.
3. The only place where we actually need to modify the proof in [28] concerns item (ii) in Lemma 3.2, where it is stated that \(\nabla f({\boldsymbol x}^k) \in G_\epsilon (\bar {\boldsymbol x})\) (for a particular point \(\bar {\boldsymbol x}\)) because \({\boldsymbol x}^k \in \bar {B}(\bar {\boldsymbol x};\epsilon /3) \cap D\); the latter is not true if \({\boldsymbol x}^k\notin D\). However, using Property 6.1, we have
$$\displaystyle \begin{aligned} \nabla f({\boldsymbol x}^k)\in {\partial} f({\boldsymbol x}^k)\subset G_{\epsilon/3}({\boldsymbol x}^k) \subset G_{\epsilon}(\bar {\boldsymbol x}) \text{ when } {\boldsymbol x}^k\in \bar{B}(\bar {\boldsymbol x};\epsilon/3), \end{aligned}$$
and therefore \(\nabla f({\boldsymbol x}^k) \in G_\epsilon (\bar {\boldsymbol x})\) even when \({\boldsymbol x}^k\notin D\).
Finally, although it was convenient in Appendix 1 to state Property 6.1 in terms of D, it actually holds if D is replaced by any full measure set on which f is differentiable. Nonetheless, it is important to note that the proofs of the results in [8, 28] do require that f be continuously differentiable on D. This assumption is used in the proof of (i) in [28, Lemma 3.2].
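To complement item 2 above, the following is a minimal sketch of a backtracking realization of the line search condition in (6.3). The parameter names beta (sufficient decrease constant), gamma (backtracking factor), and t0 (initial step) are placeholders of ours, not notation taken from the chapter; as argued in item 2, differentiability of f at the current iterate suffices for the loop to terminate.

```python
import numpy as np

def line_search(f, x, g, beta=1e-4, gamma=0.5, t0=1.0, max_iter=100):
    """Backtrack until f(x - t*g) < f(x) - t*beta*||g||^2 and return t."""
    fx, gnorm2 = f(x), float(np.dot(g, g))
    t = t0
    for _ in range(max_iter):
        if f(x - t * g) < fx - t * beta * gnorm2:
            return t
        t *= gamma  # shrink the trial step
    return 0.0  # safeguard; not reached when g is a genuine descent direction

# Usage on f(x) = ||x||_1 at a point with no zero components, where f is
# differentiable and the gradient is the sign vector:
f = lambda x: np.abs(x).sum()
x = np.array([1.0, -2.0])
g = np.sign(x)
print(line_search(f, x, g))  # prints 1.0: the full step already decreases f
```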
Acknowledgements
The authors would like to acknowledge the following financial support. J.V. Burke was supported in part by the U.S. National Science Foundation grant DMS-1514559. F.E. Curtis was supported in part by the U.S. Department of Energy grant DE-SC0010615. A.S. Lewis was supported in part by the U.S. National Science Foundation grant DMS-1613996. M.L. Overton was supported in part by the U.S. National Science Foundation grant DMS-1620083. L.E.A. Simões was supported in part by the São Paulo Research Foundation (FAPESP), Brazil, under grants 2016/22989-2 and 2017/07265-0.