
Gradient Methods for Problems with Inexact Model of the Objective

  • Fedor S. Stonyakin
  • Darina Dvinskikh
  • Pavel Dvurechensky
  • Alexey Kroshnin
  • Olesya Kuznetsova
  • Artem Agafonov
  • Alexander Gasnikov
  • Alexander Tyurin
  • César A. Uribe
  • Dmitry Pasechnyuk
  • Sergei Artamonov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11548)

Abstract

We consider optimization methods for convex minimization problems under inexact information on the objective function. We introduce an inexact model of the objective, which includes as particular cases the inexact oracle [16] and the relative smoothness condition [36]. We analyze a gradient method that uses this inexact model and obtain convergence rates for convex and strongly convex problems. To show potential applications of our general framework, we consider three particular problems. The first is clustering by the electoral model introduced in [41]. The second is approximating the optimal transport distance, for which we propose a Proximal Sinkhorn algorithm. The third is approximating the optimal transport barycenter, for which we propose a Proximal Iterative Bregman Projections algorithm. We also illustrate the practical performance of our algorithms by numerical experiments.
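To fix intuition for the framework: the inexact \((\delta, L)\)-oracle of [16], which the inexact model generalizes, returns at each point \(x\) a pair \((f_{\delta}(x), g_{\delta}(x))\) such that, for all \(y\),

\[ 0 \le f(y) - f_{\delta}(x) - \langle g_{\delta}(x), y - x \rangle \le \frac{L}{2} \|y - x\|^2 + \delta. \]

Roughly speaking, the inexact model of [51, 52] further replaces the linear term \(\langle g_{\delta}(x), y - x \rangle\) by a convex model function \(\psi_{\delta}(y, x)\) and the squared norm by a Bregman divergence, which in particular covers the relative smoothness setting of [36].

For the optimal transport application, the sketch below shows the classical Sinkhorn iteration [12, 49] for entropy-regularized transport between two histograms, the kind of subproblem solved inside a proximal scheme such as Proximal Sinkhorn; the function name and parameters are illustrative, not the authors' implementation.

    import numpy as np

    def sinkhorn(C, r, c, gamma=0.1, n_iter=1000):
        """Approximate the entropy-regularized OT plan between histograms r and c.

        Minimizes <C, X> - gamma * H(X) over plans X with marginals X 1 = r and
        X^T 1 = c (H is the entropy), by alternately rescaling the rows and
        columns of the Gibbs kernel.
        """
        K = np.exp(-C / gamma)               # Gibbs kernel exp(-C/gamma), elementwise
        u = np.ones_like(r)
        for _ in range(n_iter):
            v = c / (K.T @ u)                # rescale to match column marginals
            u = r / (K @ v)                  # rescale to match row marginals
        return u[:, None] * K * v[None, :]   # transport plan diag(u) K diag(v)

Roughly, each outer proximal step solves a subproblem of this type with the kernel reweighted by the current plan, so the regularization bias can be driven down without sending gamma itself to zero.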

Keywords

Gradient method · Inexact oracle · Strong convexity · Relative smoothness · Bregman divergence

Notes

Acknowledgments

The work in Sects. 4 and 5 was funded by the Russian Science Foundation (project 18-71-10108). The work in Subsect. 2.1 and Sect. 3 was supported by the Russian Foundation for Basic Research (project 18-31-20005 mol_a_ved). The work of F. Stonyakin on Algorithm 2 and Theorem 2 was supported by the Russian Science Foundation (project 18-71-00048). The work of A. Gasnikov in Sect. 2 and of A. Kroshnin in Sect. 3 was supported within the framework of the HSE University Basic Research Program and funded by the Russian Academic Excellence Project “5-100”. The work of S. Artamonov in Sect. 3 was supported by the Academic Fund Program at the National Research University Higher School of Economics (HSE) in 2019–2020 (grant No. 19-01-024) and by the Russian Academic Excellence Project “5-100”.

References

  1. Altschuler, J., Bach, F., Rudi, A., Weed, J.: Approximating the quadratic transportation metric in near-linear time. arXiv:1810.10046 (2018)
  2. Altschuler, J., Weed, J., Rigollet, P.: Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems 30, pp. 1961–1971. Curran Associates, Inc. (2017)
  3. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN. arXiv:1701.07875 (2017)
  4. Ben-Tal, A., Nemirovski, A.: Lectures on modern convex optimization (lecture notes). Personal web-page of A. Nemirovski (2015). http://www2.isye.gatech.edu/~nemirovs/Lect_ModConvOpt.pdf
  5. Benamou, J.D., Carlier, G., Cuturi, M., Nenna, L., Peyré, G.: Iterative Bregman projections for regularized transportation problems. SIAM J. Sci. Comput. 37(2), A1111–A1138 (2015)
  6. Bigot, J., Klein, T., et al.: Consistent estimation of a population barycenter in the Wasserstein space. arXiv:1212.2562 (2012)
  7. Blanchet, J., Jambulapati, A., Kent, C., Sidford, A.: Towards optimal running times for optimal transport. arXiv:1810.07717 (2018)
  8. Bogolubsky, L., et al.: Learning supervised PageRank with gradient-based and gradient-free optimization methods. In: NIPS 2016 (2016). http://papers.nips.cc/paper/6565-learning-supervised-pagerank-with-gradient-based-and-gradient-free-optimization-methods.pdf
  9. Cartis, C., Gould, N.I.M., Toint, P.L.: Improved second-order evaluation complexity for unconstrained nonlinear optimization using high-order regularized models. arXiv:1708.04044 (2018)
  10. Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3(3), 538–543 (1993)
  11. Cohen, M.B., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. arXiv:1805.12591 (2018)
  12. Cuturi, M.: Sinkhorn distances: lightspeed computation of optimal transport. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26, pp. 2292–2300. Curran Associates, Inc. (2013)
  13. Cuturi, M., Doucet, A.: Fast computation of Wasserstein barycenters. In: Xing, E.P., Jebara, T. (eds.) Proceedings of the 31st International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 32, pp. 685–693. PMLR, Bejing, China (2014). http://proceedings.mlr.press/v32/cuturi14.html
  14. d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008). https://doi.org/10.1137/060676386
  15. Del Barrio, E., Lescornel, H., Loubes, J.M.: A statistical analysis of a deformation model with Wasserstein barycenters: estimation procedure and goodness of fit test. arXiv:1508.06465 (2015)
  16. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146(1), 37–75 (2014). https://doi.org/10.1007/s10107-013-0677-5
  17. Devolder, O., Glineur, F., Nesterov, Y., et al.: First-order methods with inexact oracle: the strongly convex case. CORE Discussion Papers 2013016 (2013)
  18. Drusvyatskiy, D., Ioffe, A.D., Lewis, A.S.: Nonsmooth optimization using Taylor-like models: error bounds, convergence, and termination criteria. arXiv:1610.03446 (2016)
  19. Dvurechensky, P.: Gradient method with inexact oracle for composite non-convex optimization. arXiv:1703.09180 (2017)
  20. Dvurechensky, P., Dvinskikh, D., Gasnikov, A., Uribe, C.A., Nedić, A.: Decentralize and randomize: faster algorithm for Wasserstein barycenters. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 31, pp. 10783–10793. Curran Associates, Inc. (2018). arXiv:1802.04367
  21. Dvurechensky, P., Gasnikov, A.: Stochastic intermediate gradient method for convex problems with stochastic inexact oracle. J. Optim. Theory Appl. 171(1), 121–145 (2016). https://doi.org/10.1007/s10957-016-0999-6
  22. Dvurechensky, P., Gasnikov, A., Gorbunov, E.: An accelerated directional derivative method for smooth stochastic convex optimization. arXiv:1804.02394 (2018)
  23. Dvurechensky, P., Gasnikov, A., Gorbunov, E.: An accelerated method for derivative-free smooth stochastic convex optimization. arXiv:1802.09022 (2018)
  24. Dvurechensky, P., Gasnikov, A., Kamzolov, D.: Universal intermediate gradient method for convex problems with inexact oracle. arXiv:1712.06036 (2017)
  25. Dvurechensky, P., Gasnikov, A., Kroshnin, A.: Computational optimal transport: complexity by accelerated gradient descent is better than by Sinkhorn’s algorithm. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 1367–1376 (2018). arXiv:1802.04367
  26. Dvurechensky, P., Gasnikov, A., Stonyakin, F., Titov, A.: Generalized Mirror Prox: solving variational inequalities with monotone operator, inexact oracle, and unknown Hölder parameters (2018). https://arxiv.org/abs/1806.05140
  27. Dvurechensky, P., Gasnikov, A., Tiurin, A.: Randomized similar triangles method: a unifying framework for accelerated randomized optimization methods (coordinate descent, directional search, derivative-free method) (2017). https://arxiv.org/abs/1707.08486
  28. Ebert, J., Spokoiny, V., Suvorikova, A.: Construction of non-asymptotic confidence sets in 2-Wasserstein space (2017). https://arxiv.org/abs/1703.03658
  29. Gasnikov, A.: Universal gradient descent (2017). https://arxiv.org/abs/1711.00394
  30. Gasnikov, A., et al.: Universal method with inexact oracle and its applications for searching equilibriums in multistage transport problems (2015). https://arxiv.org/abs/1506.00292
  31. Kantorovich, L.: On the translocation of masses. Doklady Acad. Sci. USSR (N.S.) 37(7–8), 227–229 (1942)
  32. Kroshnin, A., Dvinskikh, D., Dvurechensky, P., Gasnikov, A., Tupitsa, N., Uribe, C.: On the complexity of approximating Wasserstein barycenter (2019). https://arxiv.org/abs/1901.08686
  33. Kroshnin, A., Spokoiny, V., Suvorikova, A.: Statistical inference for Bures-Wasserstein barycenters (2019). https://arxiv.org/abs/1901.00226
  34. Le Gouic, T., Loubes, J.M.: Existence and consistency of Wasserstein barycenters. Probab. Theory Relat. Fields 168(3–4), 901–917 (2017)
  35. Lee, Y.T., Sidford, A.: Path finding methods for linear programming: solving linear programs in Õ(√rank) iterations and faster algorithms for maximum flow. In: 2014 IEEE 55th Annual Symposium on Foundations of Computer Science (FOCS), pp. 424–433 (2014)
  36. Lu, H., Freund, R.M., Nesterov, Y.: Relatively smooth convex optimization by first-order methods, and applications. SIAM J. Optim. 28(1), 333–354 (2018)
  37. Mairal, J.: Optimization with first-order surrogate functions. In: International Conference on Machine Learning, pp. 783–791 (2013)
  38. Monge, G.: Mémoire sur la théorie des déblais et des remblais. Histoire de l’Académie Royale des Sciences de Paris (1781)
  39. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Massachusetts (2004)
  40. Nesterov, Y.: Implementable tensor methods in unconstrained convex optimization. CORE Discussion Papers 2018005, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE), March 2018. https://ideas.repec.org/p/cor/louvco/2018005.html
  41. Nesterov, Y.: Soft clustering by convex electoral model. CORE Discussion Papers 2018001, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE), January 2018. https://ideas.repec.org/p/cor/louvco/2018001.html
  42. Nesterov, Y., Polyak, B.: Cubic regularization of Newton method and its global performance. Math. Program. 108(1), 177–205 (2006)
  43. Ochs, P., Fadili, J., Brox, T.: Non-smooth non-convex Bregman minimization: unification and new algorithms. J. Optim. Theory Appl. 181(1), 244–278 (2019)
  44. Pele, O., Werman, M.: Fast and robust earth mover’s distances. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 460–467 (2009)
  45. Peyré, G., Cuturi, M.: Computational optimal transport. Found. Trends Mach. Learn. 11(5–6), 355–607 (2019)
  46. Polyak, B.: Introduction to Optimization. Optimization Software, New York (1987)
  47. Quanrud, K.: Approximating optimal transport with linear programs. In: 2nd Symposium on Simplicity in Algorithms (SOSA 2019), vol. 69, pp. 6:1–6:9. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (2018)
  48. Schmitzer, B.: Stabilized sparse scaling algorithms for entropy regularized transport problems (2016). https://arxiv.org/abs/1610.06519
  49. Sinkhorn, R.: Diagonal equivalence to matrices with prescribed row and column sums. II. Proc. Amer. Math. Soc. 45(2), 195–198 (1974)
  50. Solomon, J., Rustamov, R.M., Guibas, L., Butscher, A.: Wasserstein propagation for semi-supervised learning. In: Proceedings of the 31st International Conference on Machine Learning, vol. 32, pp. 306–314. PMLR (2014)
  51. Stonyakin, F., et al.: Gradient methods for problems with inexact model of the objective. arXiv:1902.09001 (2019)
  52. Stonyakin, F., et al.: Inexact model: a framework for optimization and variational inequalities (2019). https://arxiv.org/abs/1902.00990
  53. Tappenden, R., Richtárik, P., Gondzio, J.: Inexact coordinate descent: complexity and preconditioning. J. Optim. Theory Appl. 170(1), 144–176 (2016)
  54. Tyurin, A., Gasnikov, A.: Fast gradient descent method for convex optimization problems with an oracle that generates a \((\delta, L)\)-model of a function at a requested point. Comput. Math. Math. Phys. (2019, accepted). https://arxiv.org/abs/1711.02747
  55. Uribe, C.A., Dvinskikh, D., Dvurechensky, P., Gasnikov, A., Nedić, A.: Distributed computation of Wasserstein barycenters over networks. In: 2018 IEEE Conference on Decision and Control (CDC), pp. 6544–6549 (2018)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. V.I. Vernadsky Crimean Federal University, Simferopol, Russia
  2. Weierstrass Institute for Applied Analysis and Stochastics, Berlin, Germany
  3. Institute for Information Transmission Problems RAS, Moscow, Russia
  4. Moscow Institute of Physics and Technology, Moscow, Russia
  5. National Research University Higher School of Economics, Moscow, Russia
  6. Massachusetts Institute of Technology, Cambridge, USA
  7. 239th School of St. Petersburg, Saint Petersburg, Russia
