
Frank-Wolfe Style Algorithms for Large Scale Optimization

Large-Scale and Distributed Optimization

Part of the book series: Lecture Notes in Mathematics (LNM, volume 2227)

Abstract

We introduce several variants of Frank-Wolfe style algorithms suitable for large scale optimization. We show how to modify the standard Frank-Wolfe algorithm using stochastic gradients, approximate subproblem solutions, and sketched decision variables in order to scale to enormous problems while preserving (up to constants) the optimal convergence rate \(\mathcal {O}\left (\frac {1}{k}\right )\).
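To make the setting concrete, here is a minimal sketch of the classical Frank-Wolfe iteration that these variants build on, written in Python for a least-squares objective over an \(\ell_1\)-ball. The objective, the constraint set, the radius, and the names lmo_l1_ball and frank_wolfe are illustrative choices for this sketch, not the chapter's own code; the stochastic variant would, roughly, replace the exact gradient with a minibatch estimate, and the sketched variant would replace the explicitly stored iterate with a compressed representation.

```python
import numpy as np

def lmo_l1_ball(grad, radius):
    """Linear minimization oracle for the l1 ball of the given radius:
    returns argmin_{||s||_1 <= radius} <grad, s>, a signed, scaled
    coordinate vector at the largest-magnitude gradient entry."""
    i = np.argmax(np.abs(grad))
    s = np.zeros_like(grad)
    s[i] = -radius * np.sign(grad[i])
    return s

def frank_wolfe(grad_f, x0, radius, num_iters=200):
    """Classical Frank-Wolfe with the standard 2/(k+2) step size."""
    x = x0.copy()
    for k in range(num_iters):
        g = grad_f(x)                    # exact gradient (stochastic variants use an estimate)
        s = lmo_l1_ball(g, radius)       # solve the linear subproblem over the constraint set
        gamma = 2.0 / (k + 2.0)          # step size yielding the O(1/k) rate
        x = (1 - gamma) * x + gamma * s  # convex combination keeps the iterate feasible
    return x

# Example: least squares 0.5 * ||Ax - b||^2 constrained to an l1 ball.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
grad_f = lambda x: A.T @ (A @ x - b)
x_hat = frank_wolfe(grad_f, np.zeros(20), radius=5.0)
```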



Acknowledgements

This work was supported by DARPA Award FA8750-17-2-0101. The authors are grateful for helpful discussions with Joel Tropp, Volkan Cevher, and Alp Yurtsever.

Author information

Corresponding author

Correspondence to Madeleine Udell.


Appendix


We prove the following simple proposition about L-smooth functions used in Sect. 9.2.

Proposition 3

If f is a real-valued differentiable convex function with domain \(\mathbb{R}^n\) and satisfies \(\|\nabla f(x) - \nabla f(y)\| \leq L\|x-y\|\), then for all \(x, y \in \mathbb{R}^n\),

$$f(x)\leq f(y) + \nabla f(y)^T (x-y)+\frac{L}{2}\|x-y\|^2.$$

Proof

The inequality follows from the following computation:

$$\begin{aligned}
f(x) - f(y) - \nabla f(y)^T(x-y) &= \int_0^1 \left(\nabla f(y+t(x-y)) - \nabla f(y)\right)^T(x-y)\,dt \\
&\leq \int_0^1 \left|\left(\nabla f(y+t(x-y)) - \nabla f(y)\right)^T(x-y)\right|\,dt \\
&\leq \int_0^1 \left\|\nabla f(y+t(x-y)) - \nabla f(y)\right\|\,\|x-y\|\,dt \\
&\leq \int_0^1 Lt\,\|x-y\|^2\,dt \\
&= \frac{L}{2}\|x-y\|^2.
\end{aligned}$$
(9.26)
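As a quick numerical illustration of Proposition 3 (not part of the chapter), one can check the quadratic upper bound at random points for a simple L-smooth function. The choice \(f(x) = \frac{1}{2}x^T A x\) with A positive semidefinite and L taken as the largest eigenvalue of A is an assumption made only for this sketch.

```python
import numpy as np

# Sanity check of f(x) <= f(y) + grad f(y)^T (x - y) + (L/2) ||x - y||^2
# for f(x) = 0.5 * x^T A x, whose gradient A x is Lipschitz with L = lambda_max(A).
rng = np.random.default_rng(1)
M = rng.standard_normal((10, 10))
A = M.T @ M                              # symmetric positive semidefinite
L = np.linalg.eigvalsh(A).max()          # Lipschitz constant of the gradient

f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

for _ in range(1000):
    x, y = rng.standard_normal(10), rng.standard_normal(10)
    bound = f(y) + grad(y) @ (x - y) + 0.5 * L * np.linalg.norm(x - y) ** 2
    assert f(x) <= bound + 1e-9          # the bound of Proposition 3 holds at these points
```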


Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Ding, L., Udell, M. (2018). Frank-Wolfe Style Algorithms for Large Scale Optimization. In: Giselsson, P., Rantzer, A. (eds) Large-Scale and Distributed Optimization. Lecture Notes in Mathematics, vol 2227. Springer, Cham. https://doi.org/10.1007/978-3-319-97478-1_9
