Abstract
In this chapter, we present conditional gradient type methods, which have recently attracted much attention in both the machine learning and optimization communities. These methods call a linear optimization (LO) oracle to minimize a series of linear functions over the feasible set. We will introduce the classic conditional gradient method (a.k.a. the Frank–Wolfe method) and a few of its variants. We will also discuss the conditional gradient sliding (CGS) algorithm, which can skip the computation of gradients from time to time and, as a result, achieves the optimal complexity bounds in terms of not only the number of calls to the LO oracle but also the number of gradient evaluations. Extensions of these methods for solving nonconvex optimization problems will also be discussed.
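To make the LO-oracle mechanism concrete, the following is a minimal sketch of the classic Frank–Wolfe iteration on the probability simplex, where the LO oracle has a closed form. The quadratic objective, the open-loop step size gamma_k = 2/(k+2), and all names and data below are illustrative assumptions, not taken from the chapter itself.

```python
import numpy as np

def frank_wolfe(grad, lo_oracle, x0, num_iters=200):
    """Sketch of the classic Frank-Wolfe (conditional gradient) method."""
    x = x0.copy()
    for k in range(num_iters):
        g = grad(x)                      # gradient evaluation at the current point
        v = lo_oracle(g)                 # LO oracle: argmin over the feasible set of <g, v>
        gamma = 2.0 / (k + 2.0)          # standard open-loop step size
        x = (1 - gamma) * x + gamma * v  # convex combination keeps x feasible
    return x

# Illustrative example: least squares over the probability simplex.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)

grad = lambda x: A.T @ (A @ x - b)       # gradient of f(x) = 0.5 * ||A x - b||^2

def simplex_lo_oracle(g):
    # Minimizing <g, v> over the simplex returns the vertex e_i with i = argmin_i g_i.
    v = np.zeros_like(g)
    v[np.argmin(g)] = 1.0
    return v

x0 = np.full(10, 0.1)                    # start at the simplex barycenter
x_star = frank_wolfe(grad, simplex_lo_oracle, x0)
print(0.5 * np.linalg.norm(A @ x_star - b) ** 2)
```

Note that each iteration requires one gradient evaluation and one LO oracle call but no projection; the CGS algorithm discussed in the chapter goes further by reusing gradients across several LO oracle calls.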