Projection-Free Methods

Part of the book series: Springer Series in the Data Sciences (SSDS)

Abstract

In this chapter, we present conditional gradient type methods, which have recently attracted much attention in both the machine learning and optimization communities. Rather than projecting onto the feasible set, these methods call a linear optimization (LO) oracle to minimize a sequence of linear functions over that set. We will introduce the classic conditional gradient method (a.k.a. the Frank–Wolfe method) and a few of its variants. We will also discuss the conditional gradient sliding (CGS) algorithm, which can skip gradient computations from time to time and, as a result, achieves the optimal complexity bounds in terms of not only the number of calls to the LO oracle but also the number of gradient evaluations. Extensions of these methods to nonconvex optimization problems will also be discussed.
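
To make the mechanism concrete, the following is a minimal sketch, in Python, of the classic conditional gradient (Frank–Wolfe) iteration: at each step the method calls the LO oracle at the current gradient and moves toward the returned point by a convex combination, so every iterate stays feasible without any projection. The quadratic objective, the probability-simplex feasible set, and the open-loop step size 2/(k+1) below are illustrative assumptions only, not choices made in this chapter.

import numpy as np

def frank_wolfe(grad, lo_oracle, x0, num_iters=100):
    # Classic conditional gradient (Frank-Wolfe) method.
    #   grad(x)      : gradient of the smooth objective at x
    #   lo_oracle(g) : LO oracle returning argmin_{y in X} <g, y>
    #   x0           : a feasible starting point in X
    x = x0
    for k in range(1, num_iters + 1):
        g = grad(x)                        # one gradient evaluation
        y = lo_oracle(g)                   # one call to the LO oracle
        gamma = 2.0 / (k + 1)              # standard open-loop step size
        x = (1.0 - gamma) * x + gamma * y  # convex combination stays feasible
    return x

# Illustrative use (hypothetical data): minimize ||A x - b||^2 over the
# probability simplex, whose LO oracle simply returns the vertex with the
# smallest gradient entry.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
grad = lambda x: 2.0 * A.T @ (A @ x - b)

def simplex_lo(g):
    y = np.zeros_like(g)
    y[np.argmin(g)] = 1.0
    return y

x_approx = frank_wolfe(grad, simplex_lo, np.ones(5) / 5.0)

Roughly speaking, the conditional gradient sliding method discussed in this chapter keeps the same LO oracle but reuses each gradient across several inner LO steps, which is how it can evaluate gradients less often than it calls the oracle.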

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this chapter

Lan, G. (2020). Projection-Free Methods. In: First-order and Stochastic Optimization Methods for Machine Learning. Springer Series in the Data Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-39568-1_7
