Abstract
Convex optimization problems arising in applications often have favorable objective functions and complicated constraints, thereby precluding first-order methods from being immediately applicable. We describe an approach that exchanges the roles of the objective with one of the constraint functions, and instead approximately solves a sequence of parametric level-set problems. Two Newton-like zero-finding procedures for nonsmooth convex functions, based on inexact evaluations and sensitivity information, are introduced. It is shown that they lead to efficient solution schemes for the original problem. We describe the theoretical and practical properties of this approach for a broad range of problems, including low-rank semidefinite optimization, sparse optimization, and gauge optimization.
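To make the zero-finding viewpoint concrete, the following is a minimal sketch (in Python, not the authors' implementation) of one common instantiation of the level-set strategy: to solve \(\min \{f(x) : g(x)\le \sigma \}\), approximately evaluate the value function \(v(\tau )=\min \{g(x) : f(x)\le \tau \}\) of the flipped problem and drive \(v(\tau )-\sigma \) to zero with a secant-type update. The oracle name value_fn and the stopping rule are illustrative assumptions.

def level_set_secant(value_fn, sigma, tau0, tau1, tol=1e-6, max_iter=50):
    # value_fn(tau) returns (an approximation of) v(tau), assumed convex and nonincreasing.
    v0 = value_fn(tau0) - sigma
    v1 = value_fn(tau1) - sigma
    for _ in range(max_iter):
        if abs(v1) <= tol:
            break
        slope = (v1 - v0) / (tau1 - tau0)   # secant slope of the nonincreasing value function
        tau0, v0 = tau1, v1
        tau1 = tau1 - v1 / slope            # Newton-like update based on the secant model
        v1 = value_fn(tau1) - sigma
    return tau1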
References
Aravkin, A.Y., Burke, J., Friedlander, M.P.: Variational properties of value functions. SIAM J. Optim. 23(3), 1689–1717 (2013)
Bach, F.: Duality between subgradient and conditional gradient methods. SIAM J. Optim. 25(1), 115–129 (2015)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3), 167–175 (2003)
Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization. MPS/SIAM Series on Optimization. SIAM, Philadelphia (2001)
Bertsekas, D.P.: Convex Optimization Algorithms. Athena Scientific, Belmont (2015)
Biswas, P., Ye, Y.: Semidefinite programming for ad hoc wireless sensor network localization. In: Proceedings of the 3rd International Symposium on Information Processing in Sensor Networks, pp. 46–54. ACM (2004)
Biswas, P., Liang, T.C., Toh, K.C., Ye, Y., Wang, T.C.: Semidefinite programming approaches for sensor network localization with noisy distance measurements. IEEE Trans. Autom. Sci. Eng. 3(4), 360–371 (2006)
Biswas, P., Lian, T.C., Wang, T.C., Ye, Y.: Semidefinite programming based algorithms for sensor network localization. ACM Trans. Sens. Netw. (TOSN) 2(2), 188–220 (2006)
Borwein, J.M., Lewis, A.S.: Convex Analysis and Nonlinear Optimization. Theory and Examples. CMS Books in Mathematics/ouvrages de Mathématiques de la SMC, 3. Springer, New York (2000)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Brucker, P.: An \(O(n)\) algorithm for quadratic knapsack problems. Oper. Res. Lett. 3(3), 163–166 (1984)
Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. Assoc. Comput. Mach. 58(3), 1–37 (2011)
Candès, E.J., Tao, T.: The power of convex relaxation: near-optimal matrix completion. IEEE Trans. Inf. Theory 56(5), 2053–2080 (2010)
Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM 58(3), 11:1–11:37 (2011)
Candès, E.J., Strohmer, T., Voroninski, V.: PhaseLift: exact and stable signal recovery from magnitude measurements via convex programming. Commun. Pure Appl. Math. 66, 1241–1274 (2012)
Chandrasekaran, V., Parrilo, P.A., Willsky, A.S.: Latent variable graphical model selection via convex optimization. Ann. Stat. 40(4), 1935–1967 (2012)
Chen, S.S., Donoho, D.L., Saunders, M.A.: Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20(1), 33–61 (1999)
Cox, B., Juditsky, A., Nemirovski, A.: Dual subgradient algorithms for large-scale nonsmooth learning problems. Math. Program. 148(1–2), 143–180 (2014)
Drusvyatskiy, D., Pataki, G., Wolkowicz, H.: Coordinate shadows of semidefinite and Euclidean distance matrices. SIAM J. Optim. 25(2), 1160–1178 (2015)
Drusvyatskiy, D., Krislock, N., Voronin, Y.L., Wolkowicz, H.: Noisy Euclidean distance realization: robust facial reduction and the Pareto frontier. Preprint arXiv:1410.6852 (2014)
Ennis, R.H., McGuire, G.C.: Computer Algebra Recipes: A Gourmet’s Guide to the Mathematical Models of Science. Springer, Berlin (2001)
Fazel, M.: Matrix rank minimization with applications. PhD thesis, Department of Electrical Engineering, Stanford University (2002)
Frank, M., Wolfe, P.: An algorithm for quadratic programming. Naval Res. Logist. Q. 3, 95–110 (1956)
Friedlander, M.P., Macêdo, I., Pong, T.K.: Gauge optimization and duality. SIAM J. Optim. 24(4), 1999–2022 (2014). https://doi.org/10.1137/130940785
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximations. Comput. Math. Appl. 2(1), 17–40 (1976)
Hager, W.W., Huang, S.J., Pardalos, P.M., Prokopyev, O.A. (eds.): Multiscale Optimization Methods and Applications, Nonconvex Optimization and Its Applications, vol. 82. Springer (2006)
Harchaoui, Z., Juditsky, A., Nemirovski, A.: Conditional gradient algorithms for norm-regularized smooth convex optimization. Math. Program. 152(1–2, Ser. A), 75–112 (2015)
Jaggi, M.: Revisiting Frank-Wolfe: projection-free sparse convex optimization. In: Proc. 30th Intern. Conf. Machine Learning (ICML-13), pp. 427–435 (2013)
Krislock, N., Wolkowicz, H.: Explicit sensor network localization using semidefinite representations and facial reductions. SIAM J. Optim. 20(5), 2679–2708 (2010)
Lehoucq, R.B., Sorensen, D.C., Yang, C.: ARPACK users’ guide: solution of large-scale eigenvalue problems with implicitly restarted Arnoldi methods, vol. 6. SIAM, Philadelphia (1998)
Lemaréchal, C.: An extension of Davidon methods to nondifferentiable problems. Math. Program. Stud. 3, 95–109 (1975)
Lemaréchal, C., Nemirovskii, A., Nesterov, Y.: New variants of bundle methods. Math. Program. 69(1, Ser. B), 111–147 (1995). (Nondifferentiable and large-scale optimization (Geneva, 1992))
Liberti, L., Lavor, C., Maculan, N., Mucherino, A.: Euclidean distance geometry and applications. SIAM Rev. 56(1), 3–69 (2014)
Ling, S., Strohmer, T.: Self-calibration and biconvex compressive sensing. CoRR arXiv:1501.06864 (2015)
Luke, R., Burke, J., Lyons, R.: Optical wavefront reconstruction: theory and numerical methods. SIAM Rev. 44, 169–224 (2002)
Markowitz, H.M.: Mean-Variance Analysis in Portfolio Choice and Capital Markets. Frank J. Fabozzi Associates, New Hope (1987)
Marquardt, D.: An algorithm for least-squares estimation of nonlinear parameters. SIAM J. Appl. Math. 11, 431–441 (1963)
Miettinen, K.: Nonlinear Multi-objective Optimization. Springer, Berlin (1999)
Morrison, D.D.: Methods for nonlinear least squares problems and convergence proofs. In: Lorell, J., Yagi, F. (eds.) Proceedings of the Seminar on Tracking Programs and Orbit Determination, pp. 1–9. Jet Propulsion Laboratory, Pasadena (1960)
Nemirovski, A.: Prox-method with rate of convergence \(O(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)
Nesterov, Y.: Introductory Lectures on Convex Optimization. Kluwer Academic, Dordrecht (2004)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
Osborne, M.R., Presnell, B., Turlach, B.A.: A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20(3), 389–403 (2000)
Peng, Y., Ganesh, A., Wright, J., Xu, W., Ma, Y.: RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2233–2246 (2012)
Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum rank solutions to linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)
Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)
Renegar, J.: Linear programming, complexity theory and elementary functional analysis. Math. Program. 70(3, Ser. A), 279–351 (1995)
Renegar, J.: A framework for applying subgradient methods to conic optimization problems. Preprint arXiv:1503.02611 (2015)
Rockafellar, R.T.: Convex Analysis. Princeton Landmarks in Mathematics. Princeton University Press, Princeton (1970)
Tseng, P.: Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program. 125(2), 263–295 (2010)
van den Berg, E., Friedlander, M.P.: Sparse optimization with least-squares constraints. SIAM J. Optim. 21(4), 1201–1229 (2011)
van den Berg, E., Friedlander, M.P.: Probing the Pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput. 31(2), 890–912 (2008)
Waldspurger, I., d’Aspremont, A., Mallat, S.: Phase recovery, MaxCut and complex semidefinite programming. Math. Program. 149(1–2), 47–81 (2015)
Weinberger, K.Q., Sha, F., Saul, L.K.: Learning a kernel matrix for nonlinear dimensionality reduction. In: Proceedings of the Twenty-first International Conference on Machine Learning. ICML ’04, p. 106. ACM, New York (2004)
Wolfe, P.: A method of conjugate subgradients for minimizing nondifferentiable functions. Math. Program. Stud. 3, 145–173 (1975)
Wright, J., Ganesh, A., Rao, S., Ma, Y.: Robust principal component analysis: exact recovery of corrupted low-rank matrices by convex optimization. In: Neural Information Processing Systems (NIPS) (2009)
Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57(7), 2479–2493 (2009)
Yin, W.: Analysis and generalizations of the linearized Bregman method. SIAM J. Imaging Sci. 3(4), 856–877 (2010)
Zhang, Z., Liang, X., Ganesh, A., Ma, Y.: TILT: transform invariant low-rank textures. In: Kimmel, R., Klette, R., Sugimoto, A. (eds.) Computer Vision—ACCV 2010. Vol. 6494 of Lecture Notes in Computer Science, pp. 314–328. Springer, Berlin (2011)
Zheng, P., Aravkin, A.Y., Ramamurthy, K.N., Thiagarajan, J.J.: Learning robust representations for computer vision. In: The IEEE International Conference on Computer Vision (ICCV) Workshops (2017)
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 67, 301–320 (2005)
Acknowledgements
The authors extend their sincere thanks to three anonymous referees who provided an extensive list of corrections and suggestions that helped us to streamline our presentation.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Aleksandr Y. Aravkin: Research supported by the Washington Research Foundation Data Science Professorship. James V. Burke: Research supported in part by the NSF award DMS-1514559. Dmitry Drusvyatskiy: Research supported by the AFOSR YIP award FA9550-15-1-0237. Michael P. Friedlander: Research supported by the ONR award N00014-16-1-2242. Scott Roy: Research supported in part by the AFOSR YIP award FA9550-15-1-0237.
A Proofs
Proof
(Proof of Theorem 2.1) Since f is convex, the subdifferential \(\partial f(\tau )\) is nonempty for all \(\tau \in (a,b)\). The claim concerning finite termination is easy to deduce from convexity; we leave the details to the reader. Suppose neither sequence terminates finitely at \(\tau _*\). Let us first consider the Newton iteration. Convexity of f immediately implies that the sequence \(\tau _i\) is well-defined and satisfies \(\tau _0<\tau _1<\tau _2<\dots < \tau _*\). Monotonicity of the subdifferential then implies \(g_0\le g_1\le g_2\le \dots \le g_*<0\). Due to the inequalities \(f(\tau _*)+\bar{g}(\tau _k-\tau _*)\le f(\tau _k)\) and \(g_k<0\), we have
and so
Upper semi-continuity of \(\partial f\) on its domain implies \(g_k\uparrow g_*\). Hence \(\tau _k\) converge q-superlinearly to \(\tau _*\).
Now consider the secant iteration. As in the Newton iteration, it is immediate from convexity that the sequence \(\tau _i\) is well-defined and satisfies \(\tau _0<\tau _1<\tau _2<\dots <\tau _*\). Monotonicity of the subdifferential then implies \(g_0\le g_1\le g_2\le \dots \le g_*<0\). We have
and \(f(\tau _{k-1})+g_{k-1}(\tau _k-\tau _{k-1})\le f(\tau _k)\), and hence
Combining the two inequalities yields
Consequently, we deduce
The result follows. \(\square \)
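For illustration only (this is not part of the proof), the Newton iteration analyzed above can be sketched as follows, assuming a subgradient oracle for the convex, nonincreasing function f is available.

def newton_root(oracle, tau, tol=1e-10, max_iter=100):
    # oracle(tau) returns (f(tau), g) with g in the subdifferential of f at tau;
    # below the root we have f(tau) > 0 and g < 0, so the iterates increase.
    for _ in range(max_iter):
        val, g = oracle(tau)
        if abs(val) <= tol:
            break
        tau = tau - val / g   # Newton step; Theorem 2.1 gives q-superlinear convergence
    return tau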
Proof
(Proof of Theorem 2.2) It is easy to see by convexity that the iterates \(\tau _k\) are strictly increasing and satisfy \(f(\tau _k) >0\). For each index \(j \ge 2\) for which the algorithm has not terminated, define the following quantities:
Note that using the equation \(\tau _{j-1}-\tau _j = \frac{\ell _{j-1}}{s_{j-1}}\), we can write \(\theta _j=\frac{u_{j-1} - \ell _j}{\ell _{j-1}}\). The bound \(0\le \theta _j\le \alpha -\gamma _j\) is then clearly valid. Now define constants \(\beta _j\in [0,1]\) by the equation \(\gamma _j = \beta _j \alpha \). Suppose \(k \ge 2\) is an index at which the algorithm has not terminated, i.e., \(u_k > \epsilon \). Taking into account the inequality \(\ell _k \ge \frac{u_k}{\alpha }> \frac{\epsilon }{\alpha }\), we deduce
The defining equation for \(\tau _{k+1}\) and the definition of \(\theta _j\) yield the equality
The bounds \(\tau _* - \tau _1 \ge h_{k+1}\), \(\ell _k \ge \frac{\epsilon }{\alpha }\), and \(\theta _j \le \alpha - \gamma _j\) imply
and rearranging gives
Combining (A.1) and (A.2), we get
On the other hand, observe
and hence
Combining Eqs. (A.4) and (A.3), the claimed estimate \( k-1 \le \log _{2/\alpha } \left( \frac{ \alpha C }{ \epsilon } \right) \) follows. \(\square \)
Proof
(Proof of Theorem 2.3) The proof is identical to the proof of Theorem 2.2, except for some minor modifications. The only nontrivial change is how we arrive at the bound \(\theta _j \le \alpha - \gamma _j\). For this, observe \(\tau _{j-1} - \tau _j = \ell _{j-1}/s_{j-1}\), and because the function \(\tau \mapsto \ell _j + s_j ( \tau - \tau _j)\) minorizes f, we see
After rearranging, we get the desired upper bound on \(\theta _j\):
Finally, we remark that with the approximate Newton method, we can start indexing at \(j = 0\) instead of \(j = 1\). This explains the different constants in the convergence result. \(\square \)
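For illustration (under the assumptions spelled out in the statements of Theorems 2.2 and 2.3, with an oracle name chosen here only for exposition), the approximate Newton step analyzed above can be sketched as follows: the oracle returns bounds \(\ell \le f(\tau )\le u\le \alpha \ell \) together with a slope \(s<0\) such that \(t\mapsto \ell + s(t-\tau )\) minorizes f, and the next iterate is the root of this affine minorant.

def approximate_newton(oracle, tau, eps, max_iter=1000):
    # oracle(tau) returns (l, u, s) with l <= f(tau) <= u <= alpha * l and a
    # slope s < 0 of an affine minorant of f passing through (tau, l).
    for _ in range(max_iter):
        l, u, s = oracle(tau)
        if u <= eps:          # certified near-root: 0 <= f(tau) <= u <= eps
            break
        tau = tau - l / s     # i.e., tau_k - tau_{k+1} = l_k / s_k, as used in the proofs
    return tau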
Lemma A.1
(Concavity of the parametric support function) For any convex function \(f:\mathbb {R}^n\rightarrow \overline{\mathbb {R}}\) and vector \(z\in \mathbb {R}^n\), the univariate function \(t\mapsto \delta ^*_{[f\le t]}(z)\) is concave.
Proof
It follows from convexity of f that
\[
\lambda\, [f\le t_1] + (1-\lambda)\, [f\le t_2] \;\subseteq\; [f\le \lambda t_1 + (1-\lambda) t_2] \qquad \text{for all } t_1, t_2\in \mathbb {R} \text{ and } \lambda \in [0,1],
\]
where \([f\le \alpha ]\) denotes the \(\alpha \)-level set of f, and the sum of the level sets is their Minkowski (i.e., direct) sum. Moreover, for any convex sets \(\mathcal C\) and \(\mathcal D\) with \(\mathcal C\subseteq \mathcal D\), we have \(\delta ^*_{\mathcal C}\le \delta ^*_{\mathcal D}\). Thus,
\[
\lambda\, \delta ^*_{[f\le t_1]}(z) + (1-\lambda)\, \delta ^*_{[f\le t_2]}(z) \;=\; \delta ^*_{\lambda [f\le t_1] + (1-\lambda) [f\le t_2]}(z) \;\le\; \delta ^*_{[f\le \lambda t_1 + (1-\lambda) t_2]}(z),
\]
which implies concavity of the function at hand. \(\square \)
Proof
(Proof of Proposition 3.1) For this proof only, let \(\Vert \cdot \Vert \) denote the 2-norm. Note the inclusion \(s/\Vert y\Vert \in \partial _{\tau }\varPhi _1\left( y/\Vert y\Vert ,\,\tau \right) \). Use the same computation from (2.2) to deduce that the affine function
minorizes \(f_1\).
From the definition of \(\hat{\ell }\), \(\varPhi _1\), and \(\varPhi _2\), it follows that
Taking into account the equivalence
we deduce
where the rightmost inequality follows from the computation
Because the right-hand side of (A.5) is non-negative, we can deduce that \(\hat{\ell }\ge \sigma \). Finally, the required inequality \((u-\sigma )/(\hat{\ell }-\sigma )\le \alpha \) also follows from (A.5).
Lemma A.2
\((-\lambda _{\min })^{\star }(y) = \delta _\mathcal {S}(-y)\), where \(\mathcal {S}= \mathcal {K}^{*} \cap \left\{ x \mid \langle e,x\rangle = 1 \right\} \).
Proof
The following formula is established in [49]:
or equivalently
Here the symbol \(N_{\mathcal {K}}\) denotes the normal cone to \(\mathcal {K}\). Now for any \(y\in \partial (-\lambda _{\min }) ( x )\), we have \(\langle x,y\rangle =-\lambda _{\min }(x)\). Observe \(\mathrm {range}\,\partial (-\lambda _{\min })=-\mathcal {S}\). Hence, by the equality case in the Fenchel-Young inequality, for any \(y\in -\mathcal {S}\), we have \((-\lambda _{\min })^{\star }(y)=0\). On the other hand, for any y with \(\langle y,e\rangle \ne -1\), we have \((-\lambda _{\min })^{\star }(y)\ge \langle te,y\rangle -(-\lambda _{\min })(te)=t(\langle y,e\rangle +1)\) for all \(t \in \mathbb {R}\). Letting \(t\rightarrow \pm \infty \), according to the sign of \(\langle y,e\rangle +1\), we deduce \((-\lambda _{\min })^{\star }(y)=+\infty \). Similarly, consider \(y\notin -\mathcal {K}^*\). Then we may find some \(x\in \mathcal {K}\) satisfying \(\langle x,y\rangle >0\). We deduce \((-\lambda _{\min })^{\star }(y)\ge \langle tx,y\rangle -(-\lambda _{\min })(tx)=t(\langle y,x\rangle -(-\lambda _{\min })(x))\) for any \(t \ge 0\). Letting \(t\rightarrow \infty \), we deduce \((-\lambda _{\min })^{\star }(y)=+\infty \). We deduce that \((-\lambda _{\min })^{\star }\) is the indicator function of \(-\mathcal {S}\), as claimed. \(\square \)
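As a concrete illustration of Lemma A.2 (a standard specialization, not part of the proof), take \(\mathcal {K}=\mathcal {K}^*=\mathcal {S}^n_+\) and \(e=I\), so that \(\lambda _{\min }\) is the usual minimum eigenvalue. Then \(\mathcal {S}=\{W\succeq 0 \mid \mathrm {tr}\, W=1\}\) is the spectahedron and
\[
(-\lambda _{\min })^{\star }(Y)=\delta _{\mathcal {S}}(-Y),
\]
which is consistent with the variational formula \(-\lambda _{\min }(X)=\max \{\langle -X,W\rangle \mid W\succeq 0,\ \mathrm {tr}\, W=1\}=\delta ^*_{\mathcal {S}}(-X)\), since conjugating a support function returns the indicator of the underlying closed convex set.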
Remark A.1
(Projection onto a conic slice) This remark is standard. Fix a proper convex cone \({\mathcal {K}}\) and consider the projection problem
Equivalently, we can consider the univariate concave maximization problem
We can solve this problem for example by bisection, provided projections onto \({\mathcal {K}}\) are available.
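For instance, assuming the slice takes the form \(\{x\in {\mathcal {K}} \mid \langle e,x\rangle = 1\}\) (the displayed formulation is omitted above, so this normalization is an assumption made only for illustration), dualizing the linear constraint yields a concave function of a single multiplier \(\mu \) whose derivative is \(\langle e, P_{{\mathcal {K}}}(z-\mu e)\rangle -1\), a nonincreasing function of \(\mu \). A minimal Python sketch of the resulting bisection, given a user-supplied projection routine proj_K onto \({\mathcal {K}}\), might look as follows.

import numpy as np

def project_onto_slice(z, e, proj_K, tol=1e-8, max_iter=100):
    # Project z onto {x in K : <e, x> = 1} by bisection on the dual multiplier mu.
    # Assumes the slice is nonempty and e lies in the interior of K
    # (e.g., the identity matrix for the PSD cone), so that a bracket exists.
    h = lambda mu: np.vdot(e, proj_K(z - mu * e)) - 1.0   # nonincreasing in mu
    mu_lo, mu_hi = -1.0, 1.0
    while h(mu_lo) < 0:        # expand left until h(mu_lo) >= 0
        mu_lo *= 2.0
    while h(mu_hi) > 0:        # expand right until h(mu_hi) <= 0
        mu_hi *= 2.0
    for _ in range(max_iter):
        mu = 0.5 * (mu_lo + mu_hi)
        if h(mu) > 0:
            mu_lo = mu
        else:
            mu_hi = mu
        if mu_hi - mu_lo <= tol:
            break
    return proj_K(z - 0.5 * (mu_lo + mu_hi) * e)

For example, with \({\mathcal {K}}\) the positive semidefinite cone and \(e=I\), proj_K is eigenvalue thresholding and the routine projects onto the spectahedron.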