
Frank-Wolfe Style Algorithms for Large Scale Optimization

Large-Scale and Distributed Optimization

Part of the book series: Lecture Notes in Mathematics (LNM, volume 2227)

Abstract

We introduce several variants of Frank-Wolfe style algorithms suitable for large scale optimization. We show how to modify the standard Frank-Wolfe algorithm using stochastic gradients, approximate subproblem solutions, and sketched decision variables in order to scale to enormous problems while preserving (up to constants) the optimal convergence rate \(\mathcal {O}\left (\frac {1}{k}\right )\).
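To make the setting concrete, here is a minimal sketch of the classical Frank-Wolfe iteration that these variants build on, written in Python for a least-squares objective over an \(\ell_1\)-ball. The objective, the constraint set, the radius, and the names lmo_l1_ball and frank_wolfe are illustrative choices for this sketch, not the chapter's own code; the stochastic variant would, roughly, replace the exact gradient with a minibatch estimate, and the sketched variant would replace the explicitly stored iterate with a compressed representation.

```python
import numpy as np

def lmo_l1_ball(grad, radius):
    """Linear minimization oracle for the l1 ball of the given radius:
    returns argmin_{||s||_1 <= radius} <grad, s>, a signed, scaled
    coordinate vector at the largest-magnitude gradient entry."""
    i = np.argmax(np.abs(grad))
    s = np.zeros_like(grad)
    s[i] = -radius * np.sign(grad[i])
    return s

def frank_wolfe(grad_f, x0, radius, num_iters=200):
    """Classical Frank-Wolfe with the standard 2/(k+2) step size."""
    x = x0.copy()
    for k in range(num_iters):
        g = grad_f(x)                    # exact gradient (stochastic variants use an estimate)
        s = lmo_l1_ball(g, radius)       # solve the linear subproblem over the constraint set
        gamma = 2.0 / (k + 2.0)          # step size yielding the O(1/k) rate
        x = (1 - gamma) * x + gamma * s  # convex combination keeps the iterate feasible
    return x

# Example: least squares 0.5 * ||Ax - b||^2 constrained to an l1 ball.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
grad_f = lambda x: A.T @ (A @ x - b)
x_hat = frank_wolfe(grad_f, np.zeros(20), radius=5.0)
```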



Acknowledgements

This work was supported by DARPA Award FA8750-17-2-0101. The authors are grateful for helpful discussions with Joel Tropp, Volkan Cevher, and Alp Yurtsever.

Author information

Corresponding author

Correspondence to Madeleine Udell.


Appendix


We prove the following simple proposition about L-smooth functions used in Sect. 9.2.

Proposition 3

If f is a real-valued differentiable convex function with domain \(\mathbb{R}^n\) and satisfies \(\|\nabla f(x) - \nabla f(y)\| \leq L\|x-y\|\), then for all \(x, y \in \mathbb{R}^n\),

$$f(x)\leq f(y) + \nabla f(y)^T (x-y)+\frac{L}{2}\|x-y\|^2.$$

Proof

The inequality follows from the following computation:

$$\begin{aligned}
f(x) - f(y) - \nabla f(y)^T(x-y) &= \int_0^1 \left(\nabla f(y+t(x-y)) - \nabla f(y)\right)^T(x-y)\,dt \\
&\leq \int_0^1 \left|\left(\nabla f(y+t(x-y)) - \nabla f(y)\right)^T(x-y)\right|\,dt \\
&\leq \int_0^1 \left\|\nabla f(y+t(x-y)) - \nabla f(y)\right\|\,\|x-y\|\,dt \\
&\leq \int_0^1 Lt\,\|x-y\|^2\,dt \\
&= \frac{L}{2}\|x-y\|^2.
\end{aligned}$$
(9.26)
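As a quick numerical illustration of Proposition 3 (not part of the chapter), one can check the quadratic upper bound at random points for a simple L-smooth function. The choice \(f(x) = \frac{1}{2}x^T A x\) with A positive semidefinite and L taken as the largest eigenvalue of A is an assumption made only for this sketch.

```python
import numpy as np

# Sanity check of f(x) <= f(y) + grad f(y)^T (x - y) + (L/2) ||x - y||^2
# for f(x) = 0.5 * x^T A x, whose gradient A x is Lipschitz with L = lambda_max(A).
rng = np.random.default_rng(1)
M = rng.standard_normal((10, 10))
A = M.T @ M                              # symmetric positive semidefinite
L = np.linalg.eigvalsh(A).max()          # Lipschitz constant of the gradient

f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

for _ in range(1000):
    x, y = rng.standard_normal(10), rng.standard_normal(10)
    bound = f(y) + grad(y) @ (x - y) + 0.5 * L * np.linalg.norm(x - y) ** 2
    assert f(x) <= bound + 1e-9          # the bound of Proposition 3 holds at these points
```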


Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Ding, L., Udell, M. (2018). Frank-Wolfe Style Algorithms for Large Scale Optimization. In: Giselsson, P., Rantzer, A. (eds) Large-Scale and Distributed Optimization. Lecture Notes in Mathematics, vol 2227. Springer, Cham. https://doi.org/10.1007/978-3-319-97478-1_9
