A Stable Alternative to Sinkhorn’s Algorithm for Regularized Optimal Transport

  • Pavel Dvurechensky
  • Alexander Gasnikov
  • Sergey Omelchenko
  • Alexander Tiurin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12095)


In this paper, we are motivated by two important applications: the entropy-regularized optimal transport problem and the estimation of road or IP traffic demand matrices by an entropy model. Both require solving a special type of optimization problem with linear equality constraints and an objective given as the sum of an entropy regularizer and a linear function. It is known that the state-of-the-art solvers for this problem, which are based on Sinkhorn’s method (also known as the RAS or balancing method), can fail when the entropy-regularization parameter is small. We consider the above optimization problem as a particular instance of a general strongly convex optimization problem with linear constraints, and we propose a new algorithm for this general class of problems. Our approach is based on passing to the dual problem. First, we introduce a new accelerated gradient method with an adaptive choice of the gradient’s Lipschitz constant. Then, we apply this method to the dual problem and show how to reconstruct an approximate solution to the primal problem with a provable convergence rate. We prove the rate \(O(1/k^2)\), where \(k\) is the iteration counter, both for the absolute value of the primal objective residual and for the constraint infeasibility. The per-iteration complexity of our method is similar to that of Sinkhorn’s method, but our method is faster and numerically more stable when the regularization parameter is small. We illustrate the advantage of our method with numerical experiments for the two applications mentioned above. We show that there exists a threshold such that, when the regularization parameter is smaller than this threshold, our method outperforms Sinkhorn’s method in terms of computation time.
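
To make the small-regularization difficulty concrete, the following is a minimal sketch, not the implementation used in the paper. It assumes the standard formulation of entropy-regularized optimal transport, in which one minimizes \(\langle C, X\rangle + \gamma \sum_{i,j} X_{ij}\ln X_{ij}\) over transport plans \(X\) with prescribed row sums \(p\) and column sums \(q\), and it runs the textbook Sinkhorn scaling on the kernel \(K_{ij} = \exp(-C_{ij}/\gamma)\). The function names, the toy cost matrix and the breakdown check are illustrative assumptions. The abstract does not say why Sinkhorn-based solvers fail; floating-point underflow of the kernel, shown here, is one commonly cited cause.

```python
import math


def gibbs_kernel(cost, gamma):
    """Return K with K[i][j] = exp(-cost[i][j] / gamma)."""
    return [[math.exp(-c / gamma) for c in row] for row in cost]


def sinkhorn(cost, p, q, gamma, iters=200):
    """Plain (non-stabilized) Sinkhorn scaling; illustrative helper, not from the paper.

    Returns the transport plan as a list of rows, or None if a scaling
    step would divide by zero because the kernel underflowed.
    """
    K = gibbs_kernel(cost, gamma)
    n, m = len(p), len(q)
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Row scaling: choose u so that the row sums of diag(u) K diag(v) equal p.
        Kv = [sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        if any(s == 0.0 for s in Kv):
            return None
        u = [p[i] / Kv[i] for i in range(n)]
        # Column scaling: choose v so that the column sums equal q.
        Ktu = [sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        if any(s == 0.0 for s in Ktu):
            return None
        v = [q[j] / Ktu[j] for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


if __name__ == "__main__":
    # Toy data (assumed for illustration only).
    cost = [[1.0, 2.0], [2.0, 1.0]]
    p, q = [0.3, 0.7], [0.6, 0.4]
    print(sinkhorn(cost, p, q, gamma=1.0))   # a plan whose marginals match p and q
    print(sinkhorn(cost, p, q, gamma=1e-3))  # kernel underflows to zero
```

With \(\gamma = 1\) the returned plan matches both marginals, whereas with \(\gamma = 10^{-3}\) every kernel entry underflows to zero and the function returns `None`. The sketch only illustrates this failure mode; it does not reproduce the accelerated dual method proposed in the paper.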


Smooth convex optimization; Linear constraints; First-order methods; Accelerated gradient descent; Algorithm complexity; Entropy-linear programming; Dual problem; Primal-dual method; Sinkhorn’s fixed point algorithm; Entropy-regularized optimal transport; Traffic demand matrix estimation

Supplementary material

495544_1_En_28_MOESM1_ESM.pdf (149 kb)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Weierstrass Institute for Applied Analysis and Stochastics, Berlin, Germany
  2. Moscow Institute of Physics and Technology, Moscow, Russia
  3. Institute for Information Transmission Problems RAS, Moscow, Russia
  4. National Research University Higher School of Economics, Moscow, Russia
