Abstract
Convex optimization problems arising in applications often have favorable objective functions and complicated constraints, thereby precluding first-order methods from being immediately applicable. We describe an approach that exchanges the roles of the objective with one of the constraint functions, and instead approximately solves a sequence of parametric level-set problems. Two Newton-like zero-finding procedures for nonsmooth convex functions, based on inexact evaluations and sensitivity information, are introduced. It is shown that they lead to efficient solution schemes for the original problem. We describe the theoretical and practical properties of this approach for a broad range of problems, including low-rank semidefinite optimization, sparse optimization, and gauge optimization.
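To make the zero-finding viewpoint concrete, the following is a minimal sketch (in Python, not the authors' implementation) of one common instantiation of the level-set strategy: to solve \(\min \{f(x) : g(x)\le \sigma \}\), approximately evaluate the value function \(v(\tau )=\min \{g(x) : f(x)\le \tau \}\) of the flipped problem and drive \(v(\tau )-\sigma \) to zero with a secant-type update. The oracle name value_fn and the stopping rule are illustrative assumptions.

def level_set_secant(value_fn, sigma, tau0, tau1, tol=1e-6, max_iter=50):
    # value_fn(tau) returns (an approximation of) v(tau), assumed convex and nonincreasing.
    v0 = value_fn(tau0) - sigma
    v1 = value_fn(tau1) - sigma
    for _ in range(max_iter):
        if abs(v1) <= tol:
            break
        slope = (v1 - v0) / (tau1 - tau0)   # secant slope of the nonincreasing value function
        tau0, v0 = tau1, v1
        tau1 = tau1 - v1 / slope            # Newton-like update based on the secant model
        v1 = value_fn(tau1) - sigma
    return tau1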
References
Aravkin, A.Y., Burke, J., Friedlander, M.P.: Variational properties of value functions. SIAM J. Optim. 23(3), 1689–1717 (2013)
Bach, F.: Duality between subgradient and conditional gradient methods. SIAM J. Optim. 25(1), 115–129 (2015)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3), 167–175 (2003)
Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization. MPS/SIAM Series on Optimization. SIAM, Philadelphia (2001)
Bertsekas, D.P.: Convex Optimization Algorithms. Athena Scientific, Belmont (2015)
Biswas, P., Ye, Y.: Semidefinite programming for ad hoc wireless sensor network localization. In: Proceedings of the 3rd International Symposium on Information Processing in Sensor Networks, pp. 46–54. ACM (2004)
Biswas, P., Liang, T.C., Toh, K.C., Ye, Y., Wang, T.C.: Semidefinite programming approaches for sensor network localization with noisy distance measurements. IEEE Trans. Autom. Sci. Eng. 3(4), 360–371 (2006)
Biswas, P., Lian, T.C., Wang, T.C., Ye, Y.: Semidefinite programming based algorithms for sensor network localization. ACM Trans. Sens. Netw. (TOSN) 2(2), 188–220 (2006)
Borwein, J.M., Lewis, A.S.: Convex Analysis and Nonlinear Optimization. Theory and Examples. CMS Books in Mathematics/ouvrages de Mathématiques de la SMC, 3. Springer, New York (2000)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Brucker, P.: An \(O(n)\) algorithm for quadratic knapsack problems. Oper. Res. Lett. 3(3), 163–166 (1984)
Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. Assoc. Comput. Mach. 58(3), 1–37 (2011)
Candès, E.J., Tao, T.: The power of convex relaxation: near-optimal matrix completion. IEEE Trans. Inf. Theory 56(5), 2053–2080 (2010)
Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM 58(3), 11:1–11:37 (2011)
Candès, E.J., Strohmer, T., Voroninski, V.: PhaseLift: exact and stable signal recovery from magnitude measurements via convex programming. Commun. Pure Appl. Math. 66, 1241–1274 (2012)
Chandrasekaran, V., Parrilo, P.A., Willsky, A.S.: Latent variable graphical model selection via convex optimization. Ann. Stat. 40(4), 1935–1967 (2012)
Chen, S.S., Donoho, D.L., Saunders, M.A.: Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20(1), 33–61 (1999)
Cox, B., Juditsky, A., Nemirovski, A.: Dual subgradient algorithms for large-scale nonsmooth learning problems. Math. Program. 148(1–2), 143–180 (2014)
Drusvyatskiy, D., Pataki, G., Wolkowicz, H.: Coordinate shadows of semidefinite and Euclidean distance matrices. SIAM J. Optim. 25(2), 1160–1178 (2015)
Drusvyatskiy, D., Krislock, N., Voronin, Y.L., Wolkowicz, H.: Noisy Euclidean distance realization: robust facial reduction and the Pareto frontier. Preprint arXiv:1410.6852 (2014)
Ennis, R.H., McGuire, G.C.: Computer Algebra Recipes: A Gourmet’s Guide to the Mathematical Models of Science. Springer, Berlin (2001)
Fazel, M.: Matrix rank minimization with applications. PhD thesis, Department of Electrical Engineering, Stanford University (2002)
Frank, M., Wolfe, P.: An algorithm for quadratic programming. Naval Res. Logist. Q. 3, 95–110 (1956)
Friedlander, M.P., Macêdo, I., Pong, T.K.: Gauge optimization and duality. SIAM J. Optim. 24(4), 1999–2022 (2014). https://doi.org/10.1137/130940785
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximations. Comput. Math. Appl. 2(1), 17–40 (1976)
Hager, W.W., Huang, S.J., Pardalos, P.M., Prokopyev, O.A. (eds.): Multiscale Optimization Methods and Applications, Nonconvex Optimization and Its Applications, vol. 82. Springer (2006)
Harchaoui, Z., Juditsky, A., Nemirovski, A.: Conditional gradient algorithms for norm-regularized smooth convex optimization. Math. Program. 152(1–2, Ser. A), 75–112 (2015)
Jaggi, M.: Revisiting Frank-Wolfe: projection-free sparse convex optimization. In: Proc. 30th Intern. Conf. Machine Learning (ICML-13), pp. 427–435 (2013)
Krislock, N., Wolkowicz, H.: Explicit sensor network localization using semidefinite representations and facial reductions. SIAM J. Optim. 20(5), 2679–2708 (2010)
Lehoucq, R.B., Sorensen, D.C., Yang, C.: ARPACK users’ guide: solution of large-scale eigenvalue problems with implicitly restarted Arnoldi methods, vol. 6. SIAM, Philadelphia (1998)
Lemaréchal, C.: An extension of Davidon methods to nondifferentiable problems. Math. Program. Stud. 3, 95–109 (1975)
Lemaréchal, C., Nemirovskii, A., Nesterov, Y.: New variants of bundle methods. Math. Program. 69(1, Ser. B), 111–147 (1995). (Nondifferentiable and large-scale optimization (Geneva, 1992))
Liberti, L., Lavor, C., Maculan, N., Mucherino, A.: Euclidean distance geometry and applications. SIAM Rev. 56(1), 3–69 (2014)
Ling, S., Strohmer, T.: Self-calibration and biconvex compressive sensing. CoRR arXiv:1501.06864 (2015)
Luke, R., Burke, J., Lyons, R.: Optical wavefront reconstruction: theory and numerical methods. SIAM Rev. 44, 169–224 (2002)
Markowitz, H.M.: Mean-Variance Analysis in Portfolio Choice and Capital Markets. Frank J. Fabozzi Associates, New Hope (1987)
Marquardt, D.: An algorithm for least-squares estimation of nonlinear parameters. SIAM J. Appl. Math. 11, 431–441 (1963)
Miettinen, K.: Nonlinear Multi-objective Optimization. Springer, Berlin (1999)
Morrison, D.D.: Methods for nonlinear least squares problems and convergence proofs. In: Lorell, J., Yagi, F. (eds.) Proceedings of the Seminar on Tracking Programs and Orbit Determination, pp. 1–9. Jet Propulsion Laboratory, Pasadena (1960)
Nemirovski, A.: Prox-method with rate of convergence \(O(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)
Nesterov, Y.: Introductory Lectures on Convex Optimization. Kluwer Academic, Dordrecht (2004)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
Osborne, M.R., Presnell, B., Turlach, B.A.: A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20(3), 389–403 (2000)
Peng, Y., Ganesh, A., Wright, J., Xu, W., Ma, Y.: RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2233–2246 (2012)
Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum rank solutions to linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)
Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)
Renegar, J.: Linear programming, complexity theory and elementary functional analysis. Math. Program. 70(3, Ser. A), 279–351 (1995)
Renegar, J.: A framework for applying subgradient methods to conic optimization problems. Preprint arXiv:1503.02611 (2015)
Rockafellar, R.T.: Convex Analysis. Princeton Landmarks in Mathematics. Princeton University Press, Princeton (1970)
Tseng, P.: Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program. 125(2), 263–295 (2010)
van den Berg, E., Friedlander, M.P.: Sparse optimization with least-squares constraints. SIAM J. Optim. 21(4), 1201–1229 (2011)
van den Berg, E., Friedlander, M.P.: Probing the Pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput. 31(2), 890–912 (2008)
Waldspurger, I., d’Aspremont, A., Mallat, S.: Phase recovery, MaxCut and complex semidefinite programming. Math. Program. 149(1–2), 47–81 (2015)
Weinberger, K.Q., Sha, F., Saul, L.K.: Learning a kernel matrix for nonlinear dimensionality reduction. In: Proceedings of the Twenty-first International Conference on Machine Learning. ICML ’04, p. 106. ACM, New York (2004)
Wolfe, P.: A method of conjugate subgradients for minimizing nondifferentiable functions. Math. Program. Stud. 3, 145–173 (1975)
Wright, J., Ganesh, A., Rao, S., Ma, Y.: Robust principal component analysis: exact recovery of corrupted low-rank matrices by convex optimization. In: Neural Information Processing Systems (NIPS) (2009)
Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57(7), 2479–2493 (2009)
Yin, W.: Analysis and generalizations of the linearized Bregman method. SIAM J. Imaging Sci. 3(4), 856–877 (2010)
Zhang, Z., Liang, X., Ganesh, A., Ma, Y.: TILT: transform invariant low-rank textures. In: Kimmel, R., Klette, R., Sugimoto, A. (eds.) Computer Vision—ACCV 2010. Vol. 6494 of Lecture Notes in Computer Science, pp. 314–328. Springer, Berlin (2011)
Zheng, P., Aravkin, A.Y., Ramamurthy, K.N., Thiagarajan, J.J.: Learning robust representations for computer vision. In: The IEEE International Conference on Computer Vision (ICCV) Workshops (2017)
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 67, 301–320 (2005)
Acknowledgements
The authors extend their sincere thanks to three anonymous referees who provided an extensive list of corrections and suggestions that helped us to streamline our presentation.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Aleksandr Y. Aravkin: Research supported by the Washington Research Foundation Data Science Professorship. James V. Burke: Research supported in part by the NSF award DMS-1514559. Dmitry Drusvyatskiy: Research supported by the AFOSR YIP award FA9550-15-1-0237. Michael P. Friedlander: Research supported by the ONR award N00014-16-1-2242. Scott Roy: Research supported in part by the AFOSR YIP award FA9550-15-1-0237.
A Proofs
Proof
(Proof of Theorem 2.1) Since f is convex, the subdifferential \(\partial f(\tau )\) is nonempty for all \(\tau \in (a,b)\). The claim concerning finite termination is easy to deduce from convexity; we leave the details to the reader. Suppose neither sequence terminates finitely at \(\tau _*\). Let us first consider the Newton iteration. Convexity of f immediately implies that the sequence \(\tau _i\) is well-defined and satisfies \(\tau _0<\tau _1<\tau _2<\dots < \tau _*\). Monotonicity of the subdifferential then implies \(g_0\le g_1\le g_2\le \dots \le g_*<0\). Due to the inequalities \(f(\tau _*)+\bar{g}(\tau _k-\tau _*)\le f(\tau _k)\) and \(g_k<0\), we have
and so
Upper semi-continuity of \(\partial f\) on its domain implies \(g_k\uparrow g_*\). Hence \(\tau _k\) converge q-superlinearly to \(\tau _*\).
Now consider the secant iteration. As in the Newton iteration, it is immediate from convexity that the sequence \(\tau _i\) is well-defined and satisfies \(\tau _0<\tau _1<\tau _2<\dots <\tau _*\). Monotonicity of the subdifferential then implies \(g_0\le g_1\le g_2\le \dots \le g_*<0\). We have
and \(f(\tau _{k-1})+g_{k-1}(\tau _k-\tau _{k-1})\le f(\tau _k)\), and hence
Combining the two inequalities yields
Consequently, we deduce
The result follows. \(\square \)
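For illustration only (this is not part of the proof), the Newton iteration analyzed above can be sketched as follows, assuming a subgradient oracle for the convex, nonincreasing function f is available.

def newton_root(oracle, tau, tol=1e-10, max_iter=100):
    # oracle(tau) returns (f(tau), g) with g in the subdifferential of f at tau;
    # below the root we have f(tau) > 0 and g < 0, so the iterates increase.
    for _ in range(max_iter):
        val, g = oracle(tau)
        if abs(val) <= tol:
            break
        tau = tau - val / g   # Newton step; Theorem 2.1 gives q-superlinear convergence
    return tau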
Proof
(Proof of Theorem 2.2) It is easy to see by convexity that the iterates \(\tau _k\) are strictly increasing and satisfy \(f(\tau _k) >0\). For each index \(j \ge 2\) for which the algorithm has not terminated, define the following quantities:
Note that using the equation \(\tau _{j-1}-\tau _j = \frac{\ell _{j-1}}{s_{j-1}}\), we can write \(\theta _j=\frac{u_{j-1} - \ell _j}{\ell _{j-1}}\). The bound \(0\le \theta _j\le \alpha -\gamma _j\) is then clearly valid. Now define constants \(\beta _j\in [0,1]\) by the equation \(\gamma _j = \beta _j \alpha \). Suppose \(k \ge 2\) is an index at which the algorithm has not terminated, i.e., \(u_k > \epsilon \). Taking into account the inequality \(\ell _k \ge \frac{u_k}{\alpha }> \frac{\epsilon }{\alpha }\), we deduce
The defining equation for \(\tau _{k+1}\) and the definition of \(\theta _j\) yield the equality
The bounds \(\tau _* - \tau _1 \ge h_{k+1}\), \(\ell _k \ge \frac{\epsilon }{\alpha }\), and \(\theta _j \le \alpha - \gamma _j\) imply
and rearranging gives
Combining (A.1) and (A.2), we get
On the other hand, observe
and hence
Combining Eqs. (A.4) and (A.3), the claimed estimate \( k-1 \le \log _{2/\alpha } \left( \frac{ \alpha C }{ \epsilon } \right) \) follows. \(\square \)
Proof
(Proof of Theorem 2.3) The proof is identical to the proof of Theorem 2.2, except for some minor modifications. The only nontrivial change is how we arrive at the bound \(\theta _j \le \alpha - \gamma _j\). For this, observe \(\tau _{j-1} - \tau _j = \ell _{j-1}/s_{j-1}\), and because the function \(\tau \mapsto \ell _j + s_j ( \tau - \tau _j)\) minorizes f, we see
After rearranging, we get the desired upper bound on \(\theta _j\):
Finally, we remark that with the approximate Newton method, we can start indexing at \(j = 0\) instead of \(j = 1\). This explains the different constants in the convergence result. \(\square \)
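For illustration (under the assumptions spelled out in the statements of Theorems 2.2 and 2.3, with an oracle name chosen here only for exposition), the approximate Newton step analyzed above can be sketched as follows: the oracle returns bounds \(\ell \le f(\tau )\le u\le \alpha \ell \) together with a slope \(s<0\) such that \(t\mapsto \ell + s(t-\tau )\) minorizes f, and the next iterate is the root of this affine minorant.

def approximate_newton(oracle, tau, eps, max_iter=1000):
    # oracle(tau) returns (l, u, s) with l <= f(tau) <= u <= alpha * l and a
    # slope s < 0 of an affine minorant of f passing through (tau, l).
    for _ in range(max_iter):
        l, u, s = oracle(tau)
        if u <= eps:          # certified near-root: 0 <= f(tau) <= u <= eps
            break
        tau = tau - l / s     # i.e., tau_k - tau_{k+1} = l_k / s_k, as used in the proofs
    return tau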
Lemma A.1
(Concavity of the parametric support function) For any convex function \(f:\mathbb {R}^n\rightarrow \overline{\mathbb {R}}\) and vector \(z\in \mathbb {R}^n\), the univariate function \(t\mapsto \delta ^*_{[f\le t]}(z)\) is concave.
Proof
It follows from convexity of f that
\[
\lambda\, [f\le t_1] + (1-\lambda)\, [f\le t_2] \;\subseteq\; [f\le \lambda t_1 + (1-\lambda) t_2] \qquad \text{for all } t_1, t_2\in \mathbb {R} \text{ and } \lambda \in [0,1],
\]
where \([f\le \alpha ]\) denotes the \(\alpha \)-level set of f, and the sum of the level sets is their Minkowski (i.e., direct) sum. Moreover, for any convex sets \(\mathcal C\) and \(\mathcal D\) with \(\mathcal C\subseteq \mathcal D\), we have \(\delta ^*_{\mathcal C}\le \delta ^*_{\mathcal D}\). Thus,
\[
\lambda\, \delta ^*_{[f\le t_1]}(z) + (1-\lambda)\, \delta ^*_{[f\le t_2]}(z) \;=\; \delta ^*_{\lambda [f\le t_1] + (1-\lambda) [f\le t_2]}(z) \;\le\; \delta ^*_{[f\le \lambda t_1 + (1-\lambda) t_2]}(z),
\]
which implies concavity of the function at hand. \(\square \)
Proof
(Proof of Proposition 3.1) For this proof only, let \(\Vert \cdot \Vert \) denote the 2-norm. Note the inclusion \(s/\Vert y\Vert \in \partial _{\tau }\varPhi _1\left( y/\Vert y\Vert ,\,\tau \right) \). Use the same computation from (2.2) to deduce that the affine function
minorizes \(f_1\).
From the definition of \(\hat{\ell }\), \(\varPhi _1\), and \(\varPhi _2\), it follows that
Taking into account the equivalence
we deduce
where the rightmost inequality follows from the computation
Because the right-hand side of (A.5) is non-negative, we can deduce that \(\hat{\ell }\ge \sigma \). Finally, the required inequality \((u-\sigma )/(\hat{\ell }-\sigma )\le \alpha \) also follows from (A.5).
Lemma A.2
\((-\lambda _{\min })^{\star }(y) = \delta _\mathcal {S}(-y)\), where \(\mathcal {S}= \mathcal {K}^{*} \cap \left\{ x \mid \langle e,x\rangle = 1 \right\} \).
Proof
The following formula is established in [49]:
or equivalently
Here the symbol \(N_{\mathcal {K}}\) denotes the normal cone to \(\mathcal {K}\). Now for any \(y\in \partial (-\lambda _{\min }) ( x )\), we have \(\langle x,y\rangle =-\lambda _{\min }(x)\). Observe \(\mathrm {range}\,\partial (-\lambda _{\min })=-\mathcal {S}\). Hence, by the equality case in the Fenchel-Young inequality, for any \(y\in -\mathcal {S}\), we have \((-\lambda _{\min })^{\star }(y)=0\). On the other hand, for any y with \(\langle y,e\rangle \ne -1\), we have \((-\lambda _{\min })^{\star }(y)\ge \langle te,y\rangle -(-\lambda _{\min })(te)=t(\langle y,e\rangle +1)\) for all \(t \in \mathbb {R}\). Letting \(t\rightarrow \pm \infty \), according to the sign of \(\langle y,e\rangle +1\), we deduce \((-\lambda _{\min })^{\star }(y)=+\infty \). Similarly, consider \(y\notin -\mathcal {K}^*\). Then we may find some \(x\in \mathcal {K}\) satisfying \(\langle x,y\rangle >0\). We deduce \((-\lambda _{\min })^{\star }(y)\ge \langle tx,y\rangle -(-\lambda _{\min })(tx)=t(\langle y,x\rangle -(-\lambda _{\min })(x))\) for any \(t \ge 0\). Letting \(t\rightarrow \infty \), we deduce \((-\lambda _{\min })^{\star }(y)=+\infty \). We deduce that \((-\lambda _{\min })^{\star }\) is the indicator function of \(-\mathcal {S}\), as claimed. \(\square \)
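As a concrete illustration of Lemma A.2 (a standard specialization, not part of the proof), take \(\mathcal {K}=\mathcal {K}^*=\mathcal {S}^n_+\) and \(e=I\), so that \(\lambda _{\min }\) is the usual minimum eigenvalue. Then \(\mathcal {S}=\{W\succeq 0 \mid \mathrm {tr}\, W=1\}\) is the spectahedron and
\[
(-\lambda _{\min })^{\star }(Y)=\delta _{\mathcal {S}}(-Y),
\]
which is consistent with the variational formula \(-\lambda _{\min }(X)=\max \{\langle -X,W\rangle \mid W\succeq 0,\ \mathrm {tr}\, W=1\}=\delta ^*_{\mathcal {S}}(-X)\), since conjugating a support function returns the indicator of the underlying closed convex set.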
Remark A.1
(Projection onto a conic slice) This remark is standard. Fix a proper convex cone \({\mathcal {K}}\) and consider the projection problem
Equivalently, we can consider the univariate concave maximization problem
We can solve this problem for example by bisection, provided projections onto \({\mathcal {K}}\) are available.
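For instance, assuming the slice takes the form \(\{x\in {\mathcal {K}} \mid \langle e,x\rangle = 1\}\) (the displayed formulation is omitted above, so this normalization is an assumption made only for illustration), dualizing the linear constraint yields a concave function of a single multiplier \(\mu \) whose derivative is \(\langle e, P_{{\mathcal {K}}}(z-\mu e)\rangle -1\), a nonincreasing function of \(\mu \). A minimal Python sketch of the resulting bisection, given a user-supplied projection routine proj_K onto \({\mathcal {K}}\), might look as follows.

import numpy as np

def project_onto_slice(z, e, proj_K, tol=1e-8, max_iter=100):
    # Project z onto {x in K : <e, x> = 1} by bisection on the dual multiplier mu.
    # Assumes the slice is nonempty and e lies in the interior of K
    # (e.g., the identity matrix for the PSD cone), so that a bracket exists.
    h = lambda mu: np.vdot(e, proj_K(z - mu * e)) - 1.0   # nonincreasing in mu
    mu_lo, mu_hi = -1.0, 1.0
    while h(mu_lo) < 0:        # expand left until h(mu_lo) >= 0
        mu_lo *= 2.0
    while h(mu_hi) > 0:        # expand right until h(mu_hi) <= 0
        mu_hi *= 2.0
    for _ in range(max_iter):
        mu = 0.5 * (mu_lo + mu_hi)
        if h(mu) > 0:
            mu_lo = mu
        else:
            mu_hi = mu
        if mu_hi - mu_lo <= tol:
            break
    return proj_K(z - 0.5 * (mu_lo + mu_hi) * e)

For example, with \({\mathcal {K}}\) the positive semidefinite cone and \(e=I\), proj_K is eigenvalue thresholding and the routine projects onto the spectahedron.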