Gradient Sampling Methods for Nonsmooth Optimization


Abstract

This article reviews the gradient sampling methodology for solving nonsmooth, nonconvex optimization problems. We state an intuitively straightforward gradient sampling algorithm and summarize its convergence properties. Throughout this discussion, we emphasize the simplicity of gradient sampling as an extension of the steepest descent method for minimizing smooth objectives. We provide an overview of various enhancements that have been proposed to improve practical performance, as well as of several extensions that have appeared in the literature, such as those for solving constrained problems. We also clarify certain technical aspects of the analysis of gradient sampling algorithms, most notably related to the assumptions one needs to make about the set of points at which the objective is continuously differentiable. Finally, we discuss possible future research directions.

Dedicated to Krzysztof Kiwiel, in recognition of his fundamental work on algorithms for nonsmooth optimization


Notes

  1. Although this fact has been known for decades, most of the examples that appear in the literature are rather artificial since they were designed with exact line searches in mind. Analyses showing that the steepest descent method with inexact line searches converges to nonstationary points of some simple convex nonsmooth functions have appeared recently in [1, 22].

  2. This oversight went unnoticed for 12 years until J. Portegies and T. Mitchell brought it to our attention recently.

  3. www.cs.nyu.edu/overton/software/hanso/.

  4. www.cs.nyu.edu/overton/software/hifoo/.

References

  1. Asl, A., Overton, M.L.: Analysis of the gradient method with an Armijo–Wolfe line search on a class of nonsmooth convex functions. Optim. Methods Softw. (2017). https://doi.org/10.1080/10556788.2019.1673388
  2. Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8(1), 141–148 (1988)
  3. Birgin, E., Martinez, J., Raydan, M.: Spectral projected gradient methods: review and perspectives. J. Stat. Softw. 60(3), 1–21 (2014)
  4. Burke, J.V., Lin, Q.: The gradient sampling algorithm for directionally Lipschitzian functions (in preparation)
  5. Burke, J.V., Overton, M.L.: Variational analysis of non-Lipschitz spectral functions. Math. Program. 90(2, Ser. A), 317–351 (2001)
  6. Burke, J.V., Lewis, A.S., Overton, M.L.: Approximating subdifferentials by random sampling of gradients. Math. Oper. Res. 27(3), 567–584 (2002)
  7. Burke, J.V., Lewis, A.S., Overton, M.L.: Two numerical methods for optimizing matrix stability. Linear Algebra Appl. 351/352, 117–145 (2002)
  8. Burke, J.V., Lewis, A.S., Overton, M.L.: A robust gradient sampling algorithm for nonsmooth, nonconvex optimization. SIAM J. Optim. 15(3), 751–779 (2005)
  9. Burke, J.V., Henrion, D., Lewis, A.S., Overton, M.L.: HIFOO—a MATLAB package for fixed-order controller design and H∞ optimization. In: Fifth IFAC Symposium on Robust Control Design, Toulouse (2006)
  10. Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley, New York (1983). Reprinted by SIAM, Philadelphia, 1990
  11. Crema, A., Loreto, M., Raydan, M.: Spectral projected subgradient with a momentum term for the Lagrangean dual approach. Comput. Oper. Res. 34(10), 3174–3186 (2007)
  12. Curtis, F.E., Overton, M.L.: A sequential quadratic programming algorithm for nonconvex, nonsmooth constrained optimization. SIAM J. Optim. 22(2), 474–500 (2012)
  13. Curtis, F.E., Que, X.: An adaptive gradient sampling algorithm for nonsmooth optimization. Optim. Methods Softw. 28(6), 1302–1324 (2013)
  14. Curtis, F.E., Que, X.: A quasi-Newton algorithm for nonconvex, nonsmooth optimization with global convergence guarantees. Math. Program. Comput. 7(4), 399–428 (2015)
  15. Curtis, F.E., Mitchell, T., Overton, M.L.: A BFGS-SQP method for nonsmooth, nonconvex, constrained optimization and its evaluation using relative minimization profiles. Optim. Methods Softw. 32(1), 148–181 (2017)
  16. Curtis, F.E., Robinson, D.P., Zhou, B.: A self-correcting variable-metric algorithm framework for nonsmooth optimization. IMA J. Numer. Anal. (2019). https://doi.org/10.1093/imanum/drz008
  17. Davis, D., Drusvyatskiy, D.: Stochastic model-based minimization of weakly convex functions. SIAM J. Optim. 29(1), 207–239 (2019). https://doi.org/10.1137/18M1178244
  18. Davis, D., Drusvyatskiy, D., Kakade, S., Lee, J.D.: Stochastic subgradient method converges on tame functions. Found. Comput. Math. (2019). https://doi.org/10.1007/s10208-018-09409-5
  19. Estrada, A., Mitchell, I.M.: Control synthesis and classification for unicycle dynamics using the gradient and value sampling particle filters. In: Proceedings of the IFAC Conference on Analysis and Design of Hybrid Systems, pp. 108–114 (2018)
  20. Fletcher, R.: Practical Methods of Optimization, 2nd edn. Wiley, New York (1987)
  21. Fletcher, R.: On the Barzilai–Borwein method. In: Qi, L., Teo, K., Yang, X. (eds.) Optimization and Control with Applications, pp. 235–256. Springer, Boston (2005)
  22. Guo, J., Lewis, A.S.: Nonsmooth variants of Powell’s BFGS convergence theorem. SIAM J. Optim. 28(2), 1301–1311 (2018). https://doi.org/10.1137/17M1121883
  23. Hare, W., Nutini, J.: A derivative-free approximate gradient sampling algorithm for finite minimax problems. Comput. Optim. Appl. 56(1), 1–38 (2013). https://doi.org/10.1007/s10589-013-9547-6
  24. Helou, E.S., Santos, S.A., Simões, L.E.A.: On the differentiability check in gradient sampling methods. Optim. Methods Softw. 31(5), 983–1007 (2016)
  25. Helou, E.S., Santos, S.A., Simões, L.E.A.: On the local convergence analysis of the gradient sampling method for finite max-functions. J. Optim. Theory Appl. 175(1), 137–157 (2017)
  26. Hosseini, S., Uschmajew, A.: A Riemannian gradient sampling algorithm for nonsmooth optimization on manifolds. SIAM J. Optim. 27(1), 173–189 (2017). https://doi.org/10.1137/16M1069298
  27. Kiwiel, K.C.: A method for solving certain quadratic programming problems arising in nonsmooth optimization. IMA J. Numer. Anal. 6(2), 137–152 (1986)
  28. Kiwiel, K.C.: Convergence of the gradient sampling algorithm for nonsmooth nonconvex optimization. SIAM J. Optim. 18(2), 379–388 (2007)
  29. Kiwiel, K.C.: A nonderivative version of the gradient sampling algorithm for nonsmooth nonconvex optimization. SIAM J. Optim. 20(4), 1983–1994 (2010). https://doi.org/10.1137/090748408
  30. Larson, J., Menickelly, M., Wild, S.M.: Manifold sampling for ℓ1 nonconvex optimization. SIAM J. Optim. 26(4), 2540–2563 (2016). https://doi.org/10.1137/15M1042097
  31. Lemaréchal, C., Oustry, F., Sagastizábal, C.: The U-Lagrangian of a convex function. Trans. Am. Math. Soc. 352(2), 711–729 (2000)
  32. Lewis, A.S.: Active sets, nonsmoothness, and sensitivity. SIAM J. Optim. 13(3), 702–725 (2002)
  33. Lewis, A.S., Overton, M.L.: Nonsmooth optimization via quasi-Newton methods. Math. Program. 141(1–2, Ser. A), 135–163 (2013). https://doi.org/10.1007/s10107-012-0514-2
  34. Lin, Q.: Sparsity and nonconvex nonsmooth optimization. Ph.D. thesis, Department of Mathematics, University of Washington (2009)
  35. Loreto, M., Aponte, H., Cores, D., Raydan, M.: Nonsmooth spectral gradient methods for unconstrained optimization. EURO J. Comput. Optim. 5(4), 529–553 (2017)
  36. Mifflin, R., Sagastizábal, C.: A VU-algorithm for convex minimization. Math. Program. 104(2–3), 583–608 (2005)
  37. Nesterov, Y., Spokoiny, V.: Random gradient-free minimization of convex functions. Found. Comput. Math. 17(2), 527–566 (2017). https://doi.org/10.1007/s10208-015-9296-2
  38. Raydan, M.: On the Barzilai and Borwein choice of steplength for the gradient method. IMA J. Numer. Anal. 13(3), 321–326 (1993)
  39. Raydan, M.: The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM J. Optim. 7(1), 26–33 (1997)
  40. Rockafellar, R.T.: Lagrange multipliers and subderivatives of optimal value functions in nonlinear programming. In: Sorensen, D.C., Wets, R.J.B. (eds.) Mathematical Programming Study, Mathematical Programming Studies, Chap. 3, pp. 28–66. North-Holland, Amsterdam (1982). http://www.springerlink.com/index/g03582565267714p.pdf
  41. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 317. Springer, Berlin (1998). https://doi.org/10.1007/978-3-642-02431-3
  42. Tang, C.M., Liu, S., Jian, J.B., Li, J.L.: A feasible SQP-GS algorithm for nonconvex, nonsmooth constrained optimization. Numer. Algorithms 65(1), 1–22 (2014). https://doi.org/10.1007/s11075-012-9692-5
  43. Traft, N., Mitchell, I.M.: Improved action and path synthesis using gradient sampling. In: Proceedings of the IEEE Conference on Decision and Control, pp. 6016–6023 (2016)


Appendices

Appendix 1

This appendix is devoted to justifying the requirement that D, the set of points on which the locally Lipschitz function f is continuously differentiable, must be an open full-measure subset of \(\mathbb {R}^n\), instead of the original assumption in [8] that D should be an open and dense set in \(\mathbb {R}^n\).

There are two ways in which the analyses in [8, 28] actually depend on D having full measure:

  1. The most obvious is that both papers require that the points sampled in each iteration should lie in D, and a statement is made in both papers that this occurs with probability one, but this is not the case if D is assumed only to be an open dense subset of \(\mathbb {R}^n\). However, as already noted earlier and justified in Appendix 2, this requirement can be relaxed, as in Algorithm GS given in Sect. 6.2, to require only that f be differentiable at the sampled points.

  2. The set D must have full measure for Property 6.1, stated below, to hold. The proofs in [8, 28] depend critically on this property, which follows from [6, Eq. (1.2)] (where it was stated without proof). For completeness we give a proof here, followed by an example that demonstrates the necessity of the full measure assumption.

Property 6.1

Assume that D has full measure and let

$$\displaystyle \begin{aligned} G_\epsilon({\boldsymbol x}):= \operatorname{\mathrm{cl}}\operatorname{\mathrm{conv}}\nabla f\left( \bar{B}({\boldsymbol x};\epsilon)\cap D \right). \end{aligned}$$

For all \(\epsilon > 0\) and all \({\boldsymbol x}\in \mathbb {R}^n\), one has \(\partial f({\boldsymbol x}) \subseteq G_\epsilon({\boldsymbol x})\), where \(\partial f\) is the Clarke subdifferential set presented in Definition 1.8.

Proof of Property 6.1

Let \({\boldsymbol x}\in \mathbb {R}^n\) and \({\boldsymbol v} \in \partial f({\boldsymbol x})\). We have from [10, Theorem 2.5.1] that Theorem 1.2 can be stated in a more general manner. Indeed, for any set S with zero measure, and considering \(\varOmega_f\) to be the set of points at which f fails to be differentiable, the following holds:

$$\displaystyle \begin{aligned} {\partial} f({\boldsymbol x}) = \operatorname{\mathrm{conv}}\left\{ \lim_j \nabla f({\boldsymbol y}^j) : {\boldsymbol y}^j\to {\boldsymbol x}\ \text{where}\ {\boldsymbol y}^j\notin S \cup\varOmega_f\ \text{for all}\ j \in \mathbb{N}\right\}. \end{aligned}$$

In particular, since D has full measure and f is differentiable on D, it follows that

$$\displaystyle \begin{aligned} {\partial} f({\boldsymbol x}) = \operatorname{\mathrm{conv}}\left\{ \lim_j \nabla f({\boldsymbol y}^j) : {\boldsymbol y}^j\to {\boldsymbol x}\ \text{with}\ {\boldsymbol y}^j\in D\ \text{for all}\ j \in \mathbb{N}\right\}\text. \end{aligned}$$

Considering this last relation and Carathéodory’s theorem, it follows that \({\boldsymbol v} = \sum_{i=1}^{n+1} \lambda_i \boldsymbol{\xi}^i\) for some scalars \(\lambda_i \geq 0\) with \(\sum_{i=1}^{n+1} \lambda_i = 1\), where, for all \(i \in \{1,\dots,n+1\}\), one has \(\boldsymbol {\xi }^i = \lim \limits _j \nabla f({\boldsymbol y}^{j,i})\) for some sequence \(\{{\boldsymbol y}^{j,i}\}_{j\in \mathbb {N}} \subset D\) converging to \({\boldsymbol x}\). Hence, there must exist a sufficiently large \(j_i \in \mathbb {N}\) such that, for all \(j \geq j_i\), one obtains

$$\displaystyle \begin{aligned} {\boldsymbol y}^{j,i}\in \bar{B}({\boldsymbol x};\epsilon)\cap D \implies \nabla f({\boldsymbol y}^{j,i})\in \nabla f\left(\bar{B}({\boldsymbol x}; \epsilon)\cap D\right) \subseteq \operatorname{\mathrm{conv}}\nabla f\left(\bar{B}({\boldsymbol x}; \epsilon)\cap D\right)\text. \end{aligned}$$

Recalling that \(G_\epsilon({\boldsymbol x})\) is the closure of \( \operatorname {\mathrm {conv}}\nabla f\left (\bar {B}({\boldsymbol x}; \epsilon )\cap D\right )\), it follows that \(\boldsymbol{\xi}^i \in G_\epsilon({\boldsymbol x})\) for all \(i \in \{1,\dots,n+1\}\). Moreover, since \(G_\epsilon({\boldsymbol x})\) is convex, we have \({\boldsymbol v} \in G_\epsilon({\boldsymbol x})\). The result follows since \({\boldsymbol x} \in \mathbb {R}^n\) and \({\boldsymbol v} \in \partial f({\boldsymbol x})\) were arbitrarily chosen. \(\square \)
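As a simple illustration of Property 6.1, consider \(f(x) = |x|\) on \(\mathbb{R}\) with \(D = \mathbb{R}\setminus\{0\}\), an open set of full measure. For any \(\epsilon > 0\),

$$\displaystyle \begin{aligned} \nabla f\left(\bar{B}(0;\epsilon)\cap D\right) = \{-1,+1\} \quad\text{and hence}\quad G_\epsilon(0) = \operatorname{\mathrm{cl}}\operatorname{\mathrm{conv}}\{-1,+1\} = [-1,1], \end{aligned}$$

which contains \(\partial f(0) = [-1,1]\), as the property asserts.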

With the assumption that D has full measure, Property 6.1 holds and hence the proofs of the results in [8, 28] are all valid. In particular, the proof of (ii) in [28, Lemma 3.2], which borrows from [8, Lemma 3.2], depends on Property 6.1. See also [8, the top of p. 762].

The following example shows that Property 6.1 might not hold if D is assumed only to be an open dense set, not necessarily of full measure.

Example 6.2

Let \(\delta \in (0, 1)\) and let \(\{q_k\}_{k\in \mathbb {N}}\) be an enumeration of the rational numbers in (0, 1). Define

$$\displaystyle \begin{aligned} D:= \bigcup_{k=1}^\infty \mathcal{Q}_k\text{, where }\mathcal{Q}_k:=\left(q_k - \frac{\delta}{2^{k+1}} , q_k + \frac{\delta}{2^{k+1}}\right)\text. \end{aligned}$$

Clearly, its Lebesgue measure satisfies 0 < λ(D) ≤ δ, since \(\lambda(D) \leq \sum_{k=1}^\infty \lambda(\mathcal{Q}_k) = \sum_{k=1}^\infty \delta/2^{k} = \delta\). Moreover, the set D is an open dense subset of [0, 1], as it contains every rational number in (0, 1). Now, let \(i_{D}:[0,1]\to \mathbb {R}\) be the indicator function of the set D,

$$\displaystyle \begin{aligned} i_{D}(x) = \left\{ \begin{array}{cl} 1\text, & \text{if}\ x\in D, \\ 0\text, & \text{if}\ x\notin D\text. \end{array} \right. \end{aligned}$$

Then, considering the Lebesgue integral, we define the function \(f:[0,1]\to \mathbb {R}\),

$$\displaystyle \begin{aligned} f(x) = \int_{[0,x]} i_{D}d\lambda\text. \end{aligned}$$

Let us prove that f is a Lipschitz continuous function on (0, 1). To see this, note that given any a, b ∈ (0, 1) with b > a, it follows that

$$\displaystyle \begin{aligned} |f(b) - f(a)| = \left| \int_{[0,b]} i_{D}d\lambda - \int_{[0,a]} i_{D}d\lambda \right| = \left| \int_{(a,b]} i_{D}d\lambda \right| \leq \int_{(a,b]} 1d\lambda = b-a\text, \end{aligned}$$

which ensures that f is a Lipschitz continuous function on (0, 1). Consequently, the Clarke subdifferential set of f at any point in (0, 1) is well defined. Moreover, we claim that, for all \(k\in \mathbb {N}\), f is continuously differentiable at any point \(q\in \mathcal {Q}_k\) and the following holds

$$\displaystyle \begin{aligned} f'(q) = i_{D}(q) = 1\text. \end{aligned} $$
(6.7)

Indeed, given any \(q\in \mathcal {Q}_k\), we have

$$\displaystyle \begin{aligned} f(q+t) - f(q) = \int_{[0,q+t]}i_{D}d\lambda - \int_{[0,q]}i_{D}d\lambda = \int_{(q,q+t]}i_{D}d\lambda\text{, for }t>0\text. \end{aligned}$$

Since \(\mathcal {Q}_k\) is an open set, we can find \(\overline t>0\) such that \([q,q+t]\subset \mathcal {Q}_k\subset D\), for all \(t\leq \overline t\). Hence, given any \(t\in (0,\overline {t}]\), it follows that

$$\displaystyle \begin{aligned} f(q+t) - f(q) = \int_{(q,q+t]} 1d\lambda = t\ \Longrightarrow \lim_{t\downarrow 0}\frac{f(q+t) - f(q)}{t} = 1 = i_{D}(q)\text. \end{aligned}$$

The same reasoning can be used to see that the left derivative of f at q exists and equals \(i_{D}(q)\). Consequently, we have \(f'(q) = i_{D}(q) = 1\) for all \(q\in \mathcal {Q}_k\), which shows that f is continuously differentiable on D.

By the Lebesgue differentiation theorem, we know that \(f'(x) = i_{D}(x)\) almost everywhere. Since the set \([0, 1] \setminus D\) does not have measure zero, this means that there must exist \(z \in [0, 1] \setminus D\) such that \(f'(z) = i_{D}(z) = 0\). Defining \(\epsilon := \min \{z,1-z\}/2\), we see, by (6.7), that the set

$$\displaystyle \begin{aligned} G_\epsilon(z):= \operatorname{\mathrm{cl}}\operatorname{\mathrm{conv}}\nabla f([z-\epsilon,z+\epsilon]\cap D) \end{aligned}$$

is a singleton, \(G_\epsilon(z) = \{1\}\). However, since \(f'(z) = 0\), it follows that \(0 \in \partial f(z)\), which implies \(\partial f(z) \not\subseteq G_\epsilon(z)\).
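To get a numerical feel for this construction, the short Python sketch below truncates D to the first N intervals, merges overlaps, and reports the measure of the truncated union (which stays below δ) together with how close every point of [0, 1] is to that union. The particular enumeration of the rationals, the truncation level, and the helper names (rationals_in_unit_interval, truncated_D, union_measure) are our own illustrative choices; the example itself, of course, concerns the full infinite union.

```python
from fractions import Fraction
import numpy as np

def rationals_in_unit_interval(n):
    """First n distinct rationals in (0, 1), listed by increasing denominator.
    This particular enumeration is an arbitrary illustrative choice."""
    seen, out, q = set(), [], 2
    while len(out) < n:
        for p in range(1, q):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(float(r))
                if len(out) == n:
                    break
        q += 1
    return out

def truncated_D(delta, n):
    """First n intervals Q_k = (q_k - delta/2**(k+1), q_k + delta/2**(k+1))."""
    qs = rationals_in_unit_interval(n)
    return [(q - delta / 2 ** (k + 1), q + delta / 2 ** (k + 1))
            for k, q in enumerate(qs, start=1)]

def union_measure(intervals):
    """Total length of a finite union of intervals (merge overlaps, then sum)."""
    ivs = sorted(intervals)
    total, (lo, hi) = 0.0, ivs[0]
    for a, b in ivs[1:]:
        if a > hi:
            total, lo, hi = total + (hi - lo), a, b
        else:
            hi = max(hi, b)
    return total + (hi - lo)

delta, n = 0.1, 2000
D_n = truncated_D(delta, n)

# Measure of the truncated union: bounded by sum_k delta/2**k < delta.
print(f"measure of the first {n} intervals: {union_measure(D_n):.4f} (bound: delta = {delta})")

# Density check: distance from each point of a fine grid to the truncated union.
grid = np.linspace(0.0, 1.0, 10001)
dist = np.full_like(grid, np.inf)
for lo, hi in D_n:
    dist = np.minimum(dist, np.maximum(np.maximum(lo - grid, grid - hi), 0.0))
print(f"max distance from a grid point to the union: {dist.max():.4f}")
# Small measure, yet nearly covering [0, 1]: D itself is dense, so this
# maximal distance shrinks to zero as n grows.
```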

Note that it is stated on [8, p. 754] and [28, p. 381] that the following holds: for all \(0 \leq \epsilon_1 < \epsilon_2\) and all \({\boldsymbol x}\in \mathbb {R}^n\), one has \(\bar\partial_{\epsilon_1} f({\boldsymbol x}) \subseteq G_{\epsilon_2}({\boldsymbol x})\). Property 6.1 is a special case of this statement with \(\epsilon_1 = 0\), and hence this statement too holds only under the full measure assumption.

Finally, it is worth mentioning that in practice, the full measure assumption on D usually holds. In particular, whenever a real-valued function is semi-algebraic (or, more generally, “tame”)—in other words, for all practical purposes virtually always—it is continuously differentiable on an open set of full measure. Hence, the original proofs hold in such contexts.

Appendix 2

In this appendix, we summarize why it is not necessary that the iterates and sampled points of the algorithm lie in the set D on which f is continuously differentiable; rather, it is sufficient to ensure that f is differentiable at these points, as in Algorithm GS. We do this by outlining how to modify the proofs in [28] to extend to this case.

  1. That the gradients at the sampled points \(\{{\boldsymbol x}^{k,j}\}\) exist follows with probability one from Rademacher’s theorem, while the existence of the gradients at the iterates \(\{{\boldsymbol x}^k\}\) is ensured by the statement of Algorithm GS. Notice that the proof of part (ii) of [28, Theorem 3.3] still holds in our setting with the statement that the components of the sampled points are “sampled independently and uniformly from \(\bar {B}({\boldsymbol x}^k;\epsilon )\cap D\)” replaced with “sampled independently and uniformly from \(\bar {B}({\boldsymbol x}^k;\epsilon )\)”.

  2. One needs to verify that f being differentiable at \({\boldsymbol x}^k\) is enough to ensure that the line search procedure presented in (6.3) terminates finitely. This is straightforward. Since \(\nabla f({\boldsymbol x}^k)\) exists, it follows that the directional derivative along any vector \({\boldsymbol d}\in \mathbb {R}^n\setminus \{0\}\) is given by \(f'({\boldsymbol x}^k;{\boldsymbol d}) = \nabla f({\boldsymbol x}^k)^T{\boldsymbol d}\). Furthermore, since \(-\nabla f({\boldsymbol x}^k)^T{\boldsymbol g}^k \leq -\|{\boldsymbol g}^k\|^2\) (see [8, p. 756]), it follows, for any \(\beta \in (0, 1)\), that there exists \(\overline t>0\) such that

    $$\displaystyle \begin{aligned} f({\boldsymbol x}^k-t{\boldsymbol g}^k) < f({\boldsymbol x}^k) - t\beta\|{\boldsymbol g}^k\|{}^2\ \ \text{for any}\ \ t \in (0,\overline t). \end{aligned}$$

    This shows that the line search is well defined. A minimal backtracking sketch illustrating this condition, together with the uniform sampling from \(\bar{B}({\boldsymbol x}^k;\epsilon)\) discussed in item 1, is given after this list.

  3. The only place where we actually need to modify the proof in [28] concerns item (ii) in Lemma 3.2, where it is stated that \(\nabla f({\boldsymbol x}^k) \in G_\epsilon (\bar {\boldsymbol x})\) (for a particular point \(\bar {\boldsymbol x}\)) because \({\boldsymbol x}^k \in \bar {B}(\bar {\boldsymbol x};\epsilon /3) \cap D\); the latter is not true if \({\boldsymbol x}^k \notin D\). However, using Property 6.1, we have

    $$\displaystyle \begin{aligned} \nabla f({\boldsymbol x}^k)\in {\partial} f({\boldsymbol x}^k)\subset G_{\epsilon/3}({\boldsymbol x}^k) \subset G_{\epsilon}(\bar {\boldsymbol x}) \text{ when } {\boldsymbol x}^k\in \bar{B}(\bar {\boldsymbol x};\epsilon/3), \end{aligned}$$

    and therefore, \(\nabla f({\boldsymbol x}^k) \in G_\epsilon (\bar {\boldsymbol x})\) even when \({\boldsymbol x}^k \notin D\).
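To make items 1 and 2 above concrete, the following Python sketch performs a few simplified gradient-sampling-style iterations: it samples points uniformly from \(\bar{B}({\boldsymbol x}^k;\epsilon)\), takes as search direction the negative of the minimum-norm element of the convex hull of the gradients at the iterate and at the sampled points, and backtracks until the sufficient-decrease condition displayed in item 2 holds. This is only an illustrative sketch under our own choices, not Algorithm GS as stated in Sect. 6.2: the test function, the halving backtracking factor, the helper names (min_norm_in_hull, gs_step), and the use of SciPy’s SLSQP solver for the small simplex QP are all assumptions made here for brevity.

```python
import numpy as np
from scipy.optimize import minimize

def min_norm_in_hull(G):
    """Minimum-norm element of conv{rows of G}: solve min ||G^T lam||^2 over the simplex."""
    m = G.shape[0]
    obj = lambda lam: float(np.dot(G.T @ lam, G.T @ lam))
    cons = ({'type': 'eq', 'fun': lambda lam: np.sum(lam) - 1.0},)
    res = minimize(obj, np.full(m, 1.0 / m), method='SLSQP',
                   bounds=[(0.0, 1.0)] * m, constraints=cons)
    return G.T @ res.x

def gs_step(f, grad, x, eps, m, beta=1e-4, rng=np.random.default_rng(0)):
    """One simplified gradient-sampling-style step (illustrative only)."""
    n = x.size
    # Sample m points uniformly from the closed ball B(x; eps):
    # uniform random direction times radius eps * U**(1/n).
    dirs = rng.standard_normal((m, n))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    pts = x + (eps * rng.random(m) ** (1.0 / n))[:, None] * dirs
    # Gradients exist at the sampled points with probability one (Rademacher);
    # no membership test for D is needed, as discussed in item 1.
    G = np.vstack([grad(x)] + [grad(y) for y in pts])
    g = min_norm_in_hull(G)
    # Backtrack until f(x - t g) < f(x) - t * beta * ||g||^2, as in item 2.
    t = 1.0
    while f(x - t * g) >= f(x) - t * beta * np.dot(g, g) and t > 1e-12:
        t *= 0.5
    return x - t * g, g

# Demo on the nonsmooth convex function f(x) = ||x||_1 + 0.5*||x||^2, minimized at 0.
f = lambda x: np.abs(x).sum() + 0.5 * np.dot(x, x)
grad = lambda x: np.sign(x) + x  # valid wherever no component of x is zero
x = np.array([1.0, -2.0])
for _ in range(25):
    x, g = gs_step(f, grad, x, eps=0.1, m=6)
print(x, np.linalg.norm(g))  # iterates settle near the origin (at the sampling-radius scale)
```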

Finally, although it was convenient in Appendix 1 to state Property 6.1 in terms of D, it actually holds if D is replaced by any full measure set on which f is differentiable. Nonetheless, it is important to note that the proofs of the results in [8, 28] do require that f be continuously differentiable on D. This assumption is used in the proof of (i) in [28, Lemma 3.2].

Acknowledgements

The authors would like to acknowledge the following financial support. J.V. Burke was supported in part by the U.S. National Science Foundation grant DMS-1514559. F.E. Curtis was supported in part by the U.S. Department of Energy grant DE-SC0010615. A.S. Lewis was supported in part by the U.S. National Science Foundation grant DMS-1613996. M.L. Overton was supported in part by the U.S. National Science Foundation grant DMS-1620083. L.E.A. Simões was supported in part by the São Paulo Research Foundation (FAPESP), Brazil, under grants 2016/22989-2 and 2017/07265-0.


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Burke, J.V., Curtis, F.E., Lewis, A.S., Overton, M.L., Simões, L.E.A. (2020). Gradient Sampling Methods for Nonsmooth Optimization. In: Bagirov, A., Gaudioso, M., Karmitsa, N., Mäkelä, M., Taheri, S. (eds) Numerical Nonsmooth Optimization. Springer, Cham. https://doi.org/10.1007/978-3-030-34910-3_6
