
Level-set methods for convex optimization

  • Full Length Paper
  • Mathematical Programming, Series B

Abstract

Convex optimization problems arising in applications often have favorable objective functions and complicated constraints, thereby precluding first-order methods from being immediately applicable. We describe an approach that exchanges the roles of the objective with one of the constraint functions, and instead approximately solves a sequence of parametric level-set problems. Two Newton-like zero-finding procedures for nonsmooth convex functions, based on inexact evaluations and sensitivity information, are introduced. It is shown that they lead to efficient solution schemes for the original problem. We describe the theoretical and practical properties of this approach for a broad range of problems, including low-rank semidefinite optimization, sparse optimization, and gauge optimization.
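
In generic notation (ours, chosen for illustration; the paper's own formulation may differ in details), the exchange can be pictured as follows. To solve

$$\begin{aligned} \min _{x}\ \varphi (x) \quad \text {subject to}\quad \rho (x)\le \sigma , \end{aligned}$$

one instead works with the value function of the flipped, parametric level-set problem

$$\begin{aligned} v(\tau ) := \min _{x}\ \rho (x) \quad \text {subject to}\quad \varphi (x)\le \tau , \end{aligned}$$

and searches for the smallest \(\tau \) with \(v(\tau )\le \sigma \). The univariate function \(\tau \mapsto v(\tau )-\sigma \) is convex and nonincreasing, and in practice can only be evaluated approximately; this is the setting of the Newton-like zero-finding procedures analyzed in the appendix.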


References

  1. Aravkin, A.Y., Burke, J., Friedlander, M.P.: Variational properties of value functions. SIAM J. Optim. 23(3), 1689–1717 (2013)

  2. Bach, F.: Duality between subgradient and conditional gradient methods. SIAM J. Optim. 25(1), 115–129 (2015)

  3. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  4. Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3), 167–175 (2003)

  5. Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization. MPS/SIAM Series on Optimization. SIAM, Philadelphia (2001)

  6. Bertsekas, D.P.: Convex Optimization Algorithms. Athena Scientific, Belmont (2015)

  7. Biswas, P., Ye, Y.: Semidefinite programming for ad hoc wireless sensor network localization. In: Proceedings of the 3rd International Symposium on Information Processing in Sensor Networks, pp. 46–54. ACM (2004)

  8. Biswas, P., Liang, T.C., Toh, K.C., Ye, Y., Wang, T.C.: Semidefinite programming approaches for sensor network localization with noisy distance measurements. IEEE Trans. Autom. Sci. Eng. 3(4), 360–371 (2006)

  9. Biswas, P., Lian, T.C., Wang, T.C., Ye, Y.: Semidefinite programming based algorithms for sensor network localization. ACM Trans. Sens. Netw. (TOSN) 2(2), 188–220 (2006)

  10. Borwein, J.M., Lewis, A.S.: Convex Analysis and Nonlinear Optimization. Theory and Examples. CMS Books in Mathematics/ouvrages de Mathématiques de la SMC, 3. Springer, New York (2000)

  11. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

  12. Brucker, P.: An \(O(n)\) algorithm for quadratic knapsack problems. Oper. Res. Lett. 3(3), 163–166 (1984)

  13. Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. Assoc. Comput. Mach. 58(3), 1–37 (2011)

  14. Candès, E.J., Tao, T.: The power of convex relaxation: near-optimal matrix completion. IEEE Trans. Inf. Theory 56(5), 2053–2080 (2010)

  15. Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM 58(3), 11:1–11:37 (2011)

  16. Candès, E.J., Strohmer, T., Voroninski, V.: PhaseLift: exact and stable signal recovery from magnitude measurements via convex programming. Commun. Pure Appl. Math. 66, 1241–1274 (2012)

  17. Chandrasekaran, V., Parrilo, P.A., Willsky, A.S.: Latent variable graphical model selection via convex optimization. Ann. Stat. 40(4), 1935–2357 (2012)

  18. Chen, S.S., Donoho, D.L., Saunders, M.A.: Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20(1), 33–61 (1999)

  19. Cox, B., Juditsky, A., Nemirovski, A.: Dual subgradient algorithms for large-scale nonsmooth learning problems. Math. Program. 148(1–2), 143–180 (2014)

  20. Drusvyatskiy, D., Pataki, G., Wolkowicz, H.: Coordinate shadows of semidefinite and Euclidean distance matrices. SIAM J. Optim. 25(2), 1160–1178 (2015)

  21. Drusvyatskiy, D., Krislock, N., Voronin, Y.L., Wolkowicz, H.: Noisy Euclidean distance realization: robust facial reduction and the Pareto frontier. Preprint arXiv:1410.6852 (2014)

  22. Ennis, R.H., McGuire, G.C.: Computer Algebra Recipes: A Gourmet’s Guide to the Mathematical Models of Science. Springer, Berlin (2001)

  23. Fazel, M.: Matrix rank minimization with applications. PhD thesis, Department of Electrical Engineering, Stanford University (2002)

  24. Frank, M., Wolfe, P.: An algorithm for quadratic programming. Naval Res. Logist. Q. 3, 95–110 (1956)

  25. Friedlander, M.P., Macêdo, I., Pong, T.K.: Gauge optimization and duality. SIAM J. Optim. 24(4), 1999–2022 (2014). https://doi.org/10.1137/130940785

  26. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximations. Comput. Math. Appl. 2(1), 17–40 (1976)

  27. Hager, W.W., Huang, S.J., Pardalos, P.M., Prokopyev, O.A. (eds.): Multiscale Optimization Methods and Applications, Nonconvex Optimization and Its Applications, vol. 82. Springer (2006)

  28. Harchaoui, Z., Juditsky, A., Nemirovski, A.: Conditional gradient algorithms for norm-regularized smooth convex optimization. Math. Program. 152(1–2, Ser. A), 75–112 (2015)

  29. Jaggi, M.: Revisiting Frank-Wolfe: projection-free sparse convex optimization. In: Proc. 30th Intern. Conf. Machine Learning (ICML-13), pp. 427–435 (2013)

  30. Krislock, N., Wolkowicz, H.: Explicit sensor network localization using semidefinite representations and facial reductions. SIAM J. Optim. 20(5), 2679–2708 (2010)

  31. Lehoucq, R.B., Sorensen, D.C., Yang, C.: ARPACK users’ guide: solution of large-scale eigenvalue problems with implicitly restarted Arnoldi methods, vol. 6. SIAM, Philadelphia (1998)

  32. Lemaréchal, C.: An extension of Davidon methods to nondifferentiable problems. Math. Program. Stud. 3, 95–109 (1975)

  33. Lemaréchal, C., Nemirovskii, A., Nesterov, Y.: New variants of bundle methods. Math. Program. 69(1, Ser. B), 111–147 (1995). (Nondifferentiable and large-scale optimization (Geneva, 1992))

  34. Liberti, L., Lavor, C., Maculan, N., Mucherino, A.: Euclidean distance geometry and applications. SIAM Rev. 56(1), 3–69 (2014)

  35. Ling, S., Strohmer, T.: Self-calibration and biconvex compressive sensing. CoRR arXiv:1501.06864 (2015)

  36. Luke, R., Burke, J., Lyons, R.: Optical wavefront reconstruction: theory and numerical methods. SIAM Rev. 44, 169–224 (2002)

  37. Markowitz, H.M.: Mean-Variance Analysis in Portfolio Choice and Capital Markets. Frank J. Fabozzi Associates, New Hope (1987)

  38. Marquardt, D.: An algorithm for least-squares estimation of nonlinear parameters. SIAM J. Appl. Math. 11, 431–441 (1963)

  39. Miettinen, K.: Nonlinear Multi-objective Optimization. Springer, Berlin (1999)

  40. Morrison, D.D.: Methods for nonlinear least squares problems and convergence proofs. In: Lorell, J., Yagi, F. (eds.) Proceedings of the Seminar on Tracking Programs and Orbit Determination, pp. 1–9. Jet Propulsion Laboratory, Pasadena (1960)

  41. Nemirovski, A.: Prox-method with rate of convergence \(O(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)

  42. Nesterov, Y.: Introductory Lectures on Convex Optimization. Kluwer Academic, Dordrecht (2004)

  43. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)

  44. Osborne, M.R., Presnell, B., Turlach, B.A.: A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20(3), 389–403 (2000)

  45. Peng, Y., Ganesh, A., Wright, J., Xu, W., Ma, Y.: RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2233–2246 (2012)

  46. Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum rank solutions to linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)

  47. Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)

  48. Renegar, J.: Linear programming, complexity theory and elementary functional analysis. Math. Program. 70(3, Ser. A), 279–351 (1995)

  49. Renegar, J.: A framework for applying subgradient methods to conic optimization problems. Preprint arXiv:1503.02611 (2015)

  50. Rockafellar, R.T.: Convex Analysis. Princeton Landmarks in Mathematics. Princeton University Press, Princeton (1970)

  51. Tseng, P.: Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program. 125(2), 263–295 (2010)

  52. van den Berg, E., Friedlander, M.P.: Sparse optimization with least-squares constraints. SIAM J. Optim. 21(4), 1201–1229 (2011)

  53. van den Berg, E., Friedlander, M.P.: Probing the Pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput. 31(2), 890–912 (2008)

  54. Waldspurger, I., d’Aspremont, A., Mallat, S.: Phase recovery, MaxCut and complex semidefinite programming. Math. Program. 149(1–2), 47–81 (2015)

  55. Weinberger, K.Q., Sha, F., Saul, L.K.: Learning a kernel matrix for nonlinear dimensionality reduction. In: Proceedings of the Twenty-first International Conference on Machine Learning. ICML ’04, p. 106. ACM, New York (2004)

  56. Wolfe, P.: A method of conjugate subgradients for minimizing nondifferentiable functions. Math. Program. Stud. 3, 145–173 (1975)

  57. Wright, J., Ganesh, A., Rao, S., Ma, Y.: Robust principal component analysis: exact recovery of corrupted low-rank matrices by convex optimization. In: Neural Information Processing Systems (NIPS) (2009)

  58. Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57(7), 2479–2493 (2009)

  59. Yin, W.: Analysis and generalizations of the linearized Bregman method. SIAM J. Imaging Sci. 3(4), 856–877 (2010)

  60. Zhang, Z., Liang, X., Ganesh, A., Ma, Y.: TILT: transform invariant low-rank textures. In: Kimmel, R., Klette, R., Sugimoto, A. (eds.) Computer Vision—ACCV 2010. Vol. 6494 of Lecture Notes in Computer Science, pp. 314–328. Springer, Berlin (2011)

  61. Zheng, P., Aravkin, A.Y., Ramamurthy, K.N., Thiagarajan, J.J.: Learning robust representations for computer vision. In: The IEEE International Conference on Computer Vision (ICCV) Workshops (2017)

  62. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 67, 301–320 (2005)

Acknowledgements

The authors extend their sincere thanks to three anonymous referees who provided an extensive list of corrections and suggestions that helped us to streamline our presentation.

Author information

Correspondence to Aleksandr Y. Aravkin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Aleksandr Y. Aravkin: Research supported by the Washington Research Foundation Data Science Professorship. James V. Burke: Research supported in part by the NSF award DMS-1514559. Dmitry Drusvyatskiy: Research supported by the AFOSR YIP award FA9550-15-1-0237. Michael P. Friedlander: Research supported by the ONR award N00014-16-1-2242. Scott Roy: Research supported in part by the AFOSR YIP award FA9550-15-1-0237.

A Proofs

Proof

(Proof of Theorem 2.1) Since f is convex, the subdifferential \(\partial f(\tau )\) is nonempty for all \(\tau \in (a,b)\). The claim concerning finite termination is easy to deduce from convexity; we leave the details to the reader. Suppose neither sequence terminates finitely at \(\tau _*\). Let us first consider the Newton iteration. Convexity of f immediately implies that the sequence \(\tau _i\) is well-defined and satisfies \(\tau _0<\tau _1<\tau _2<\dots < \tau _*\). Monotonicity of the subdifferential then implies \(g_0\le g_1\le g_2\le \dots \le g_*<0\). Due to the inequalities \(f(\tau _*)+g_*(\tau _k-\tau _*)\le f(\tau _k)\) and \(g_k<0\), we have

$$\begin{aligned} \frac{f(\tau _k)-f(\tau _*)}{g_k}\le -\frac{g_*}{g_k}(\tau _*-\tau _k), \end{aligned}$$

and so

$$\begin{aligned} 0<\tau _*-\tau _{k+1}= \tau _*-\tau _k+\frac{f(\tau _k)-f(\tau _*)}{g_k} \le \left( 1-\frac{g_*}{g_k}\right) (\tau _*-\tau _k). \end{aligned}$$

Upper semi-continuity of \(\partial f\) on its domain implies \(g_k\uparrow g_*\). Hence the \(\tau _k\) converge q-superlinearly to \(\tau _*\).

Now consider the secant iteration. As in the Newton iteration, it is immediate from convexity that the sequence \(\tau _i\) is well-defined and satisfies \(\tau _0<\tau _1<\tau _2<\dots <\tau _*\). Monotonicity of the subdifferential then implies \(g_0\le g_1\le g_2\le \dots \le g_*<0\). We have

$$\begin{aligned}0< g_*(\tau _k- \tau _*)\le f(\tau _k)-f(\tau _*),\end{aligned}$$

and \(f(\tau _{k-1})+g_{k-1}(\tau _k-\tau _{k-1})\le f(\tau _k)\), and hence

$$\begin{aligned} \frac{\tau _k-\tau _{k-1}}{f(\tau _k)-f(\tau _{k-1})}(f(\tau _k)-f(\tau _*)) \le \frac{f(\tau _k)-f(\tau _*)}{g_{k-1}}<0. \end{aligned}$$

Combining the two inequalities yields

$$\begin{aligned} \frac{f(\tau _k)-f(\tau _*)}{f(\tau _k)-f(\tau _{k-1})}(\tau _k-\tau _{k-1}) \le \frac{f(\tau _k)-f(\tau _*)}{g_{k-1}}\le \frac{g_*}{g_{k-1}}(\tau _k-\tau _*)<0. \end{aligned}$$

Consequently, we deduce

$$\begin{aligned} 0<\tau _* -\tau _{k+1}=\tau _*-\tau _k+ \frac{f(\tau _k)-f(\tau _*)}{f(\tau _k)-f(\tau _{k-1})}(\tau _k-\tau _{k-1}) \le \left( 1-\frac{g_*}{g_{k-1}}\right) (\tau _*-\tau _k). \end{aligned}$$

The result follows. \(\square \)
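
To make the two updates concrete, here is a minimal Python sketch that runs both iterations on a toy convex, decreasing function with a kink (our own example, not the paper's implementation; in the level-set setting, f would instead be an inexactly evaluated value function, which this sketch does not model).

```python
# Newton and secant iterations from Theorem 2.1 on a toy convex, decreasing
# function f(tau) = max(exp(-tau) - 0.2, 1 - 0.9*tau), whose root is ln(5).
import math

def f_and_subgrad(tau):
    """Return f(tau) and one subgradient of f at tau."""
    a, ga = math.exp(-tau) - 0.2, -math.exp(-tau)  # smooth piece
    b, gb = 1.0 - 0.9 * tau, -0.9                  # linear piece
    return (a, ga) if a >= b else (b, gb)

def newton(tau, tol=1e-12, maxit=50):
    for _ in range(maxit):
        val, g = f_and_subgrad(tau)   # g < 0 to the left of the root
        if val <= tol:
            break
        tau -= val / g                # iterates increase monotonically to tau_*
    return tau

def secant(tau_prev, tau, tol=1e-12, maxit=50):
    val_prev, _ = f_and_subgrad(tau_prev)
    for _ in range(maxit):
        val, _ = f_and_subgrad(tau)
        if val <= tol:
            break
        step = val * (tau - tau_prev) / (val - val_prev)
        tau_prev, val_prev, tau = tau, val, tau - step
    return tau

print(newton(0.0), secant(0.0, 0.5), math.log(5))
```

Both runs start to the left of the root, as in the proof, and the iterates increase monotonically toward \(\tau _*=\ln 5\).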

Proof

(Proof of Theorem 2.2) It is easy to see by convexity that the iterates \(\tau _k\) are strictly increasing and satisfy \(f(\tau _k) >0\). For each index \(j \ge 2\) for which the algorithm has not terminated, define the following quantities:

$$\begin{aligned} h_{j}:=\tau _j-\tau _{j-1},\quad \quad \theta _j:=\frac{s_{j}}{s_{j-1}},\quad \text { and } \quad \gamma _j:=\frac{\ell _j}{\ell _{j-1}}. \end{aligned}$$

Note that, using the equation \(\tau _{j-1}-\tau _j = \frac{\ell _{j-1}}{s_{j-1}}\), we can write \(\theta _j=\frac{u_{j-1} - \ell _j}{\ell _{j-1}}\). The bound \(0\le \theta _j\le \alpha -\gamma _j\) is then immediate. Now define constants \(\beta _j\in [0,1]\) by the equation \(\gamma _j = \beta _j \alpha \). Suppose \(k \ge 2\) is an index at which the algorithm has not terminated, i.e., \(u_k > \epsilon \). Taking into account the inequality \(\ell _k \ge \frac{u_k}{\alpha }> \frac{\epsilon }{\alpha }\), we deduce

$$\begin{aligned} \frac{\epsilon }{\alpha } \le \ell _k = \ell _1 \prod _{j=2}^k \gamma _j \le C \alpha ^{k-1} \prod _{j=2}^k \beta _j .\ \end{aligned}$$
(A.1)

The defining equation for \(\tau _{k+1}\) and the definition of \(\theta _j\) yield the equality

$$\begin{aligned} h_{k+1} = \frac{\ell _k}{|s_k|} = \frac{\ell _k}{|s_1|} \cdot \prod _{j=2}^k \theta _j^{-1}. \end{aligned}$$

The bounds \(\tau _* - \tau _1 \ge h_{k+1}\), \(\ell _k \ge \frac{\epsilon }{\alpha }\), and \(\theta _j \le \alpha - \gamma _j\) imply

$$\begin{aligned} \tau _*-\tau _1 \ge \frac{\ell _k}{|s_1|} \cdot \prod _{j=2}^k \theta _j^{-1} \ge \frac{\epsilon }{\alpha |s_1|} (\alpha ^{-1})^{k-1} \prod _{j=2}^k (1-\beta _j)^{-1}, \end{aligned}$$

and rearranging gives

$$\begin{aligned} \epsilon \le (\tau _*-\tau _1) | s_1 | \alpha ^k \prod _{j=2}^k (1 - \beta _j) \le C \alpha ^k \prod _{j=2}^k (1 - \beta _j). \end{aligned}$$
(A.2)

Combining (A.1) and (A.2), we get

$$\begin{aligned} \epsilon \le C \alpha ^k \min \left\{ \prod _{j=2}^k \beta _j, \ \prod _{j=2}^k (1 - \beta _j) \right\} . \end{aligned}$$
(A.3)

On the other hand, observe

$$\begin{aligned} \left( \prod _{j=2}^k \beta _j\right) \left( \prod _{j=2}^k ( 1- \beta _j)\right) = \prod _{j=2}^k \beta _j( 1- \beta _j) \le 0.5^{2(k-1)}, \end{aligned}$$

and hence

$$\begin{aligned} \min \left\{ \prod _{j=2}^k \beta _j, \ \prod _{j=2}^k (1 - \beta _j) \right\} \le 0.5^{k-1}. \end{aligned}$$
(A.4)

Combining (A.3) and (A.4), the claimed estimate \( k-1 \le \log _{2/\alpha } \left( \frac{ \alpha C }{ \epsilon } \right) \) follows. \(\square \)
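
Schematically, the loop whose iteration count is bounded above can be written as follows. The oracle interface here is our own abstraction, not the paper's code: at each \(\tau \) it is assumed to return a lower bound \(\ell \), an upper bound \(u\le \alpha \ell \), and a slope \(s<0\) of an affine minorant of f at \(\tau \), which are exactly the quantities used in the proof.

```python
# Schematic approximate-Newton loop built from the quantities in the proof of
# Theorem 2.2 (illustration only; the oracle interface is an assumption).
# oracle(tau) is assumed to return (ell, u, s) with
#   ell <= f(tau) <= u <= alpha * ell   and   s < 0,
# where the affine function ell + s*(t - tau) minorizes f.  In the level-set
# setting these would come from an approximate solve of the subproblem at tau.
def approximate_newton(oracle, tau, eps, maxit=1000):
    for _ in range(maxit):
        ell, u, s = oracle(tau)
        if u <= eps:          # certified near-root: 0 < f(tau) <= u <= eps
            return tau
        tau += ell / abs(s)   # the step h_{k+1} = ell_k / |s_k| from the proof
    return tau
```

The test \(u\le \epsilon \) is the termination criterion used in the proof, and the step \(\ell _k/|s_k|\) is the quantity \(h_{k+1}\) bounded there.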

Proof

(Proof of Theorem 2.3) The proof is identical to the proof of Theorem 2.2, except for some minor modifications. The only nontrivial change is how we arrive at the bound \(\theta _j \le \alpha - \gamma _j\). For this, observe \(\tau _{j-1} - \tau _j = \ell _{j-1}/s_{j-1}\), and because the function \(\tau \mapsto \ell _j + s_j ( \tau - \tau _j)\) minorizes f, we see

$$\begin{aligned} u_{j-1}&\ge \ell _j + s_j( \tau _{j-1} - \tau _j ) = \ell _j + s_j \left( \frac{\ell _{j-1}}{s_{j-1}} \right) = \ell _j + \theta _j \ell _{j-1}. \end{aligned}$$

After rearranging, we get the desired upper bound on \(\theta _j\):

$$\begin{aligned} \theta _j&\le \frac{u_{j-1} - \ell _j}{\ell _{j-1}} \le \alpha - \gamma _j. \end{aligned}$$

Finally, we remark that with the approximate Newton method, we can start indexing at \(j = 0\) instead of \(j = 1\). This explains the different constants in the convergence result. \(\square \)

Lemma A.1

(Concavity of the parametric support function) For any convex function \(f:\mathbb {R}^n\rightarrow \overline{\mathbb {R}}\) and vector \(z\in \mathbb {R}^n\), the univariate function \(t\mapsto \delta ^*_{[f\le t]}(z)\) is concave.

Proof

It follows from convexity of f that

$$\begin{aligned} \lambda \cdot [f\le a]+(1-\lambda ) \cdot [f\le b]\subseteq [f\le \lambda a+(1-\lambda ) b] \qquad \forall a,b\in \mathbb {R}\text { and } \lambda \in [0,1], \end{aligned}$$

where \([f\le \alpha ]\) denotes the \(\alpha \)-level set of f, and the sum of the level sets is their Minkowski sum. Moreover, for any convex sets \(\mathcal C\) and \(\mathcal D\) such that \(\mathcal C\subseteq \mathcal D\), we have \(\delta ^*_{\mathcal C}\le \delta ^*_{\mathcal D}\). Thus,

$$\begin{aligned} \lambda \cdot \delta ^*_{[f\le a]}(z)+(1-\lambda ) \cdot \delta ^*_{[f\le b]}(z)=\delta ^*_{\lambda \cdot [f\le a]+(1-\lambda ) \cdot [f\le b]}(z)\le \delta ^*_{[f\le \lambda a+(1-\lambda ) b]}(z), \end{aligned}$$

which implies concavity of the function at hand. \(\square \)
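
A concrete instance (standard, and not part of the original text): if f is a norm, then for \(\tau \ge 0\) the level set \([f\le \tau ]\) is \(\tau \) times the unit ball, so

$$\begin{aligned} \delta ^*_{[f\le \tau ]}(z)=\sup _{f(x)\le \tau }\,\langle z,x\rangle =\tau \, f^{\circ }(z), \end{aligned}$$

where \(f^{\circ }\) denotes the dual norm; this is linear, and hence concave, in \(\tau \), as the lemma asserts.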

Proof

(Proof of Proposition 3.1) For this proof only, let \(\Vert \cdot \Vert \) denote the 2-norm. Note the inclusion \(s/\Vert y\Vert \in \partial _{\tau }\varPhi _1\left( y/\Vert y\Vert ,\,\tau \right) \). Use the same computation from (2.2) to deduce that the affine function

$$\begin{aligned} \tau '\mapsto (\hat{\ell }-\sigma )-\frac{s}{\Vert y\Vert }(\tau '-\tau ) \end{aligned}$$

minorizes \(f_1\).

From the definition of \(\hat{\ell }\), \(\varPhi _1\), and \(\varPhi _2\), it follows that

$$\begin{aligned} \begin{aligned} \frac{u-\sigma }{\hat{\ell }-\sigma } = \frac{(u-\sigma )\Vert y\Vert }{\varPhi _2(y,\tau ) + \frac{1}{2}\Vert y\Vert ^2 - \sigma \Vert y\Vert } = \frac{2(u-\sigma )\Vert y\Vert }{\ell ^2 + \Vert y\Vert ^2 - 2\sigma \Vert y\Vert }. \end{aligned} \end{aligned}$$
(A.5)

Taking into account the equivalence

$$\begin{aligned} \frac{u-\sigma }{\ell -\sigma } \le \alpha \quad \iff \quad \frac{u + (\alpha -1)\sigma }{\alpha } \le \ell , \end{aligned}$$

we deduce

$$\begin{aligned} \ell ^2+\Vert y\Vert ^2-2\sigma \Vert y\Vert\ge & {} \alpha ^{-2} \Big ( (u+(\alpha -1)\sigma )^2+\Vert \alpha y\Vert ^2-2\sigma \alpha \Vert \alpha y\Vert \Big )\\\ge & {} 2\alpha ^{-1}(u-\sigma )\Vert y\Vert , \end{aligned}$$

where the rightmost inequality follows from the computation

$$\begin{aligned} \begin{aligned} (u + [\alpha - 1]\sigma )^2&+ \Vert \alpha y\Vert ^2 - 2\alpha \sigma \Vert \alpha y\Vert - 2(u-\sigma )\Vert \alpha y\Vert \\ {}&= \left( u + [\alpha -1]\sigma \right) ^2 + \Vert \alpha y\Vert ^2 - 2\Vert \alpha y\Vert (u + [\alpha -1]\sigma ) \\ {}&= \left( u + [\alpha -1]\sigma - \Vert \alpha y\Vert \right) ^2 \ge 0. \end{aligned} \end{aligned}$$

Because the right-hand side of (A.5) is non-negative, we can deduce that \(\hat{\ell }\ge \sigma \). Finally, the required inequality \((u-\sigma )/(\hat{\ell }-\sigma )\le \alpha \) also follows from (A.5). \(\square \)

Lemma A.2

\((-\lambda _{\min })^{\star }(y) = \delta _\mathcal {S}(-y)\), where \(\mathcal {S}= {\mathcal {K}}^{*} \cap \left\{ x \mid \langle e,x\rangle = 1 \right\} \).

Proof

The following formula is established in [49]:

$$\begin{aligned} \partial (-\lambda _{\min }) ( x ) = \left\{ -y \mid \langle y,e\rangle = 1, \ \langle y, z - (x - \lambda _{\min }(x) e ) \rangle \ge 0 \text { for all } z \in \mathcal {K} \right\} \end{aligned}$$

or equivalently

$$\begin{aligned} \partial (-\lambda _{\min }) ( x )&= \left\{ -y \,\left| \ \langle y,e\rangle = 1, \ -y\in N_{\mathcal {K}}\left( x - \lambda _{\min }(x) e\right) \right. \right\} \\&= \left\{ -y\,\left| \ \langle y,e\rangle = 1, \ y \in \mathcal {K}^*, \ 0 = \lambda _{\min }(x) - \langle y,x\rangle \right. \right\} . \end{aligned}$$

Here the symbol \(N_{\mathcal {K}}\) denotes the normal cone to \(\mathcal {K}\). Now for any \(y\in \partial (-\lambda _{\min }) ( x )\), we have \(\langle x,y\rangle =-\lambda _{\min }(x)\). Observe \(\mathrm {range}\,\partial (-\lambda _{\min })=-\mathcal {S}\). Hence, by the equality case in the Fenchel-Young inequality, for any \(y\in -\mathcal {S}\) we have \((-\lambda _{\min })^{\star }(y)=0\). On the other hand, for any y with \(\langle y,e\rangle \ne -1\), we have \((-\lambda _{\min })^{\star }(y)\ge \langle te,y\rangle -(-\lambda _{\min })(te)=t(\langle y,e\rangle +1)\) for any \(t \in \mathbb {R}\), since \(\lambda _{\min }(te)=t\). Letting \(t\rightarrow +\infty \) or \(t\rightarrow -\infty \), according to the sign of \(\langle y,e\rangle +1\), we deduce \((-\lambda _{\min })^{\star }(y)=+\infty \). Similarly, consider \(y\notin -\mathcal {K}^*\). Then we may find some \(x\in \mathcal {K}\) satisfying \(\langle x,y\rangle >0\), and since \(\lambda _{\min }(x)\ge 0\) for \(x\in \mathcal {K}\), we deduce \((-\lambda _{\min })^{\star }(y)\ge \langle tx,y\rangle -(-\lambda _{\min })(tx)=t(\langle y,x\rangle -(-\lambda _{\min })(x))\rightarrow +\infty \) as \(t \rightarrow \infty \). We conclude that \((-\lambda _{\min })^{\star }\) is the indicator function of \(-\mathcal {S}\), as claimed. \(\square \)
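
A standard special case, included here for concreteness (it is not stated in the text): take \({\mathcal {K}}\) to be the cone of positive semidefinite matrices with the trace inner product and e the identity matrix, so that \(\lambda _{\min }\) is the smallest eigenvalue and \({\mathcal {K}}^*={\mathcal {K}}\). Then \(\mathcal {S}\) is the spectraplex and the lemma reads

$$\begin{aligned} (-\lambda _{\min })^{\star }(Y)=\delta _{\mathcal {S}}(-Y), \qquad \mathcal {S}=\left\{ Y\succeq 0 \mid \mathrm {tr}\, Y=1\right\} . \end{aligned}$$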

Remark A.1

(Projection onto a conic slice) This remark is standard. Fix a proper convex cone \({\mathcal {K}}\) and consider the projection problem

$$\begin{aligned} \min _{x}\ \left\{ \tfrac{1}{2}\Vert x-z\Vert ^2\,\left| \ \langle c,x\rangle =1,\, x\in \mathcal {K}\right. \right\} . \end{aligned}$$

Equivalently, we can consider the univariate concave maximization problem

$$\begin{aligned} \max _{\beta }\min _{x\in {\mathcal {K}}}\, L(x,\beta )&= \max _{\beta }\min _{x\in {\mathcal {K}}}\, \tfrac{1}{2}\Vert x-z\Vert ^2+\beta (\langle c,x\rangle -1)\\&=\max _{\beta }\min _{x\in {\mathcal {K}}}\, \tfrac{1}{2}\Vert x-(z-\beta c)\Vert ^2+\beta (\langle c,z\rangle -1)-\tfrac{1}{2}\beta ^2\Vert c\Vert ^2\\&=\max _{\beta }~ \tfrac{1}{2}\hbox {dist}^2_{\mathcal {K}}(z-\beta c)+\beta (\langle c,z\rangle -1)-\tfrac{1}{2}\beta ^2\Vert c\Vert ^2. \end{aligned}$$

We can solve this problem, for example, by bisection, provided that projections onto \({\mathcal {K}}\) are available.
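
Below is a minimal sketch of this bisection with \({\mathcal {K}}\) taken to be the nonnegative orthant, so that the projection onto \({\mathcal {K}}\) is componentwise clamping (the function names and interface are ours). The optimal multiplier \(\beta \) is characterized by feasibility of the inner minimizer, \(\langle c,P_{\mathcal {K}}(z-\beta c)\rangle =1\), and the left-hand side is nonincreasing in \(\beta \) since, up to a constant, it is the derivative of the concave dual objective above.

```python
# Bisection sketch for Remark A.1 (illustration only), with K = nonnegative
# orthant so that proj_K is componentwise clamping.
import numpy as np

def project_onto_slice(z, c, proj_K, tol=1e-10, max_expand=60, max_bisect=200):
    """Approximately solve  min 1/2 ||x - z||^2  s.t.  <c, x> = 1,  x in K."""
    # h(beta) = <c, proj_K(z - beta*c)> - 1 is nonincreasing; we seek its root.
    h = lambda beta: float(c @ proj_K(z - beta * c)) - 1.0

    lo, hi = -1.0, 1.0                  # expand until h(lo) >= 0 >= h(hi)
    for _ in range(max_expand):
        if h(lo) < 0.0:
            lo *= 2.0
        elif h(hi) > 0.0:
            hi *= 2.0
        else:
            break

    for _ in range(max_bisect):         # plain bisection on the multiplier
        mid = 0.5 * (lo + hi)
        if h(mid) >= 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol:
            break

    return proj_K(z - 0.5 * (lo + hi) * c)

# Example: project a random point onto {x >= 0 : <1, x> = 1} (the unit simplex).
rng = np.random.default_rng(0)
z, c = rng.standard_normal(5), np.ones(5)
x = project_onto_slice(z, c, lambda w: np.maximum(w, 0.0))
print(x, c @ x)   # <c, x> should be approximately 1
```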

About this article

Cite this article

Aravkin, A.Y., Burke, J.V., Drusvyatskiy, D. et al. Level-set methods for convex optimization. Math. Program. 174, 359–390 (2019). https://doi.org/10.1007/s10107-018-1351-8
