Communication-efficient algorithms for decentralized and stochastic optimization

  • Guanghui Lan
  • Soomin Lee
  • Yi Zhou
Full Length Paper · Series A


Abstract

We present a new class of decentralized first-order methods for nonsmooth and stochastic optimization problems defined over multiagent networks. Considering that communication is a major bottleneck in decentralized optimization, our main goal in this paper is to develop algorithmic frameworks which can significantly reduce the number of inter-node communications. Our major contribution is to present a new class of decentralized primal–dual type algorithms, namely the decentralized communication sliding (DCS) methods, which can skip the inter-node communications while agents solve the primal subproblems iteratively through linearizations of their local objective functions. By employing DCS, agents can find an \(\epsilon \)-solution both in terms of functional optimality gap and feasibility residual in \({{\mathcal {O}}}(1/\epsilon )\) (resp., \({{\mathcal {O}}}(1/\sqrt{\epsilon })\)) communication rounds for general convex functions (resp., strongly convex functions), while maintaining the \({{\mathcal {O}}}(1/\epsilon ^2)\) (resp., \(\mathcal{O}(1/\epsilon )\)) bound on the total number of intra-node subgradient evaluations. We also present a stochastic counterpart for these algorithms, denoted by SDCS, for solving stochastic optimization problems whose objective function cannot be evaluated exactly. In comparison with existing results for decentralized nonsmooth and stochastic optimization, we can reduce the total number of inter-node communication rounds by orders of magnitude while still maintaining the optimal complexity bounds on intra-node stochastic subgradient evaluations. The bounds on the (stochastic) subgradient evaluations are actually comparable to those required for centralized nonsmooth and stochastic optimization under certain conditions on the target accuracy.
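The communication-sliding idea described in the abstract — alternating occasional inter-node communication rounds with several purely local subgradient steps — can be illustrated with a small sketch. This is a simplified schematic under stated assumptions (a hand-picked doubly stochastic mixing matrix `W`, a toy nonsmooth consensus objective \(\sum_i |x - a_i|\), and diminishing step sizes), not the authors' primal–dual DCS method itself:

```python
import numpy as np

def sliding_consensus(a, W, outer_rounds=200, inner_steps=5, step=0.05):
    """Minimize sum_i |x - a_i| over a network with mixing matrix W.

    Each outer round spends ONE communication (neighbor averaging via W),
    followed by several local subgradient steps that need no communication,
    mimicking the communication-skipping pattern described in the abstract.
    """
    x = a.copy().astype(float)           # each agent starts at its own datum
    for k in range(outer_rounds):
        x = W @ x                        # one inter-node communication round
        for _ in range(inner_steps):     # intra-node subgradient steps
            g = np.sign(x - a)           # subgradient of |x_i - a_i| at agent i
            x = x - step / (k + 1) * g   # diminishing step size
    return x

a = np.array([0.0, 1.0, 2.0, 3.0])       # local data on a 4-agent ring
W = np.array([[0.50, 0.25, 0.00, 0.25],  # doubly stochastic ring mixing matrix
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

x = sliding_consensus(a, W)
print(x)  # all entries approximately equal, inside the minimizing interval [1, 2]
```

Raising `inner_steps` trades more local subgradient work for fewer communication rounds per subgradient evaluation, which is the trade-off the paper's complexity bounds quantify.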


Keywords: Decentralized optimization · Decentralized machine learning · Communication efficient · Stochastic programming · Nonsmooth functions · Primal–dual method · Complexity

Mathematics Subject Classification

90C25 · 90C06 · 90C22 · 49M37 · 93A14 · 90C15




Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society 2018

Authors and Affiliations

  1. Department of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, USA
