Regularized dual gradient distributed method for constrained convex optimization over unbalanced directed graphs


This paper investigates a distributed optimization problem over a cooperative multi-agent time-varying network, where each agent has its own decision variables that should be set so as to minimize its individual objective subject to globally coupled constraints. Based on the push-sum protocol and dual decomposition, we design a regularized dual gradient distributed algorithm to solve this problem. The algorithm is implemented over unbalanced time-varying directed graphs and only requires column stochasticity of the communication matrices. By augmenting the corresponding Lagrangian function with a quadratic regularization term, we first obtain a bound on the Lagrange multipliers, which, unlike most primal-dual based methods, does not require constructing a compact set containing the dual optimal set. Then, we show that the convergence rate of the proposed method achieves the order of \(\mathcal {O}(\ln T/T)\) for strongly convex objective functions, where T is the number of iterations. Moreover, an explicit bound on the constraint violation is given. Finally, numerical results on the network utility maximization problem are used to demonstrate the efficiency of the proposed algorithm.


In recent years, research on solving optimization problems over multi-agent networks has grown at an unprecedented pace [1,2,3,4]. Various multi-agent optimization problems have been investigated and arise in many application domains, such as the distributed finite-time optimal rendezvous problem [5], wireless and social networks [6, 7], power systems [8, 9], and robotics [10]. This class of problems has a long history in the optimization community, see [11].

Based on consensus schemes, there are mainly three categories of algorithms designed for distributed optimization in the literature: primal consensus distributed algorithms, dual consensus distributed algorithms, and primal-dual consensus distributed algorithms, see [1, 12,13,14,15,16,17]. In most previous works, the communication graphs are required to be balanced, i.e., the communication weight matrices are doubly stochastic. The paper [18] considered a fixed directed graph with the requirement of a balanced graph. The work in [19] proposed distributed subgradient-based algorithms over fixed directed topologies, in which the messages among agents are propagated by the "push-sum" protocol. However, the communication protocol is required to know the number of agents or the graph. In general, the push-sum protocol is attractive for implementations since it can easily operate over directed communication topologies, and thus avoids incidents of deadlock that may occur in practice when using undirected communication topologies [4]. Nedić et al. in [4] designed a subgradient-push distributed method for a class of unconstrained optimization problems, where the requirement of a balanced graph was removed. Their proposed method has a convergence rate of order \(\mathcal {O}(\ln T/\sqrt {T})\). Later, Nedić et al. in [20] improved the convergence rate from \(\mathcal {O}(\ln T/\sqrt {T})\) to \(\mathcal {O}(\ln T/T)\) under the condition of strong convexity. However, they only considered unconstrained optimization problems.

The methods for solving distributed optimization problems subject to equality and/or inequality constraints have received considerable attention [21,22,23]. The authors in [14] first proposed a distributed Lagrangian primal-dual subgradient method by characterizing the primal-dual optimal solutions as the saddle points of the Lagrangian function related to the problem under consideration. The work [24] developed a variant of the distributed primal-dual subgradient method by introducing a multistep consensus mechanism. Aybat et al. [25, 26] investigated consensus optimization with agent-specific private conic constraints over time-varying networks and proposed a novel ADMM-based distributed method. For a more general distributed optimization problem with inequality constraints that couple all the agents' decision variables, Chang et al. [27] designed a novel distributed primal-dual perturbed subgradient method and analyzed its convergence. The implementation of the aforementioned algorithms usually involves projections onto primal and dual constraint sets. In particular, they require constructing a compact set that contains the dual optimal set, and projecting the dual variable onto this set to guarantee the boundedness of the dual iterates, which is important in establishing convergence. However, constructing this compact set is impractical since it requires each agent to solve a general constrained convex problem [28, 29]. To ensure the boundedness of the norm of the dual variables, Yuan et al. in [28] proposed a regularized primal-dual distributed algorithm. However, their optimization problem includes only one constraint. Later, Khuzani et al. in [29] investigated distributed optimization with several inequality constraints, and established the convergence of their proposed distributed deterministic and stochastic primal-dual algorithms. Very recently, Falsone et al. [30] designed a dual decomposition-based distributed method for solving a separable convex optimization problem with coupled inequality constraints and provided a convergence analysis, but no explicit convergence rate of their algorithm was given. Most of the aforementioned works operate over undirected networks, where the use of doubly stochastic matrices is possible. However, requiring doubly stochastic matrices over directed graphs may be undesirable for a variety of reasons, see [4, 20].

In this paper, we propose a distributed regularized dual gradient method for solving convex optimization problems subject to local and coupling constraints over time-varying directed networks. The proposed method is based on the push-sum protocol and dual decomposition. Each agent is only required to know its out-degree at each time, without requiring knowledge of either the number of agents or the graph sequence. By augmenting the corresponding Lagrangian function with a quadratic regularization term, the norm of the multipliers is bounded, which, unlike most existing primal-dual methods, does not require constructing a compact set containing the dual optimal set. A convergence rate of order \(\mathcal {O}(\ln T/T)\) for strongly convex objective functions is obtained. Moreover, an explicit bound on the constraint violation is provided.

The main contributions of this paper can be summarized as follows:

  (i) For the proposed algorithm, we obtain an explicit convergence rate of the objective values. Moreover, we give an explicit bound on the constraint violation. By comparison, the work in [30] only establishes the convergence of its approach, without characterizing an explicit convergence rate.

  (ii) Our proposed algorithm removes the requirement of balanced graphs. Balanced communication graphs have been used in [28,29,30]. In contrast, more general interaction graphs, i.e., unbalanced graphs, are considered in this paper by exploiting the push-sum protocol [4].

  (iii) We provide an upper bound on the norm of the dual variables by resorting to the regularized Lagrangian function, without the requirement of constructing a compact set containing the dual optimal set, in contrast to the works [14, 27].

The remainder of this paper is organized as follows. In Section 2, we state the problem, the assumptions, and some preparatory work. In Section 3, we propose the distributed regularized dual gradient algorithm and give the main results. In Section 4, we give some lemmas and the proofs of the main results. Numerical simulations are given in Section 5. Finally, Section 6 draws some conclusions.

Notation: We use boldface to distinguish between scalars and vectors in \(\mathbb {R}^{n}\). For example, \(v_{i}[t]\) is a scalar and \(\mathbf{u}_{i}[t]\) is a vector. For a matrix W, we use \((W)_{ij}\) to denote its (i, j)-th entry. We use \(||\mathbf{x}||\) to denote the Euclidean norm of a vector \(\mathbf{x}\), and \(\mathbf{1}\) for the vector of ones. A convex function \(f: \mathbb {R}^{n}\rightarrow \mathbb {R}\) is \(\widetilde {\gamma }\)-strongly convex with \(\widetilde {\gamma }>0\) if the following relation holds, for all \(\mathbf {x}, \mathbf {y}\in \mathbb {R}^{n}\)

$$f(\mathbf{x})-f(\mathbf{y}) \geq \nabla f(\mathbf{y})^{\top}(\mathbf{x}-\mathbf{y}) + \frac{\widetilde{\gamma}}{2}||\mathbf{x}-\mathbf{y}||^{2},$$

where \(\nabla f(\mathbf{y})\) is a subgradient of f at \(\mathbf{y}\). The function \(f(\mathbf{x})\) is \(\widetilde {\gamma }\)-strongly concave if \(-f(\mathbf{x})\) is \(\widetilde {\gamma }\)-strongly convex.

Distributed optimization problem with equality constraints

Constrained multi-agent optimization

Consider the following constrained optimization problem

$$ \min\limits_{\{\mathbf{x}_{i}\in \mathbf{X}_{i}\}_{i=1}^{m}} ~~F(\mathbf{x}):=\sum\limits_{i=1}^{m} f_{i}(\mathbf{x}_{i}) ~~~~\text{s.t.}~~~~ \sum\limits_{i=1}^{m}(A_{i}\mathbf{x}_{i}-\mathbf{b}_{i})=\mathbf{0}, $$

where there are m agents associated with a time-varying network. Each agent i only knows its own objective function \(f_{i}(\mathbf{x}_{i}): \mathbb {R}^{n_{i}}\rightarrow \mathbb {R}\) and its own constraint set \(\mathbf {X}_{i}\subseteq \mathbb {R}^{n_{i}}\), and all agents are subject to the coupling equality constraints \({\sum }_{i=1}^{m}(A_{i}\mathbf {x}_{i}-\mathbf {b}_{i})=\mathbf {0}\), where \(A_{i}\in \mathbb {R}^{p\times n_{i}}\) and \(\mathbf {b}_{i}\in \mathbb {R}^{p}\). The stacked vector \(\mathbf {x}:=(\mathbf {x}_{1}^{\top },\cdots , \mathbf {x}_{m}^{\top })^{\top }\in \mathbb {R}^{n}\), with \(n={\sum }_{i=1}^{m} n_{i}\), belongs to \(\mathbf{X} := \mathbf{X}_{1} \times \cdots \times \mathbf{X}_{m}\).

Problem (1) is quite general and arises in diverse applications, for example, distributed model predictive control [22], network utility maximization [31, 32], and economic dispatch problems for smart grids [8, 9].

To decouple the coupling equality constraints, we introduce a regularized Lagrangian function \(\mathcal {L}(\mathbf {x},\lambda )\) of problem (1), given by

$$ \begin{array}{@{}rcl@{}} \mathcal{L} (\mathbf{x},\lambda) := \sum\limits_{i=1}^{m} [f_{i}(\mathbf{x}_{i}) + \lambda^{\top}(A_{i}\mathbf{x}_{i}-\mathbf{b}_{i}) - \frac{\gamma_{i}}{2}\lambda^{\top}\lambda]=\sum\limits_{i=1}^{m} \mathcal{L}_{i}(\mathbf{x}_{i},\lambda) , \end{array} $$

where \(\mathcal {L}_{i}(\mathbf {x}_{i},\lambda )=f_{i}(\mathbf {x}_{i}) + \lambda ^{\top }(A_{i}\mathbf {x}_{i}-\mathbf {b}_{i}) - \frac {\gamma _{i}}{2}\lambda ^{\top }\lambda \) is the regularized Lagrangian function associated with the i-th agent, and \(\gamma_{i} > 0\) is a regularization parameter for i = 1,…, m.

Define a regularized dual function of problem (1) as follows

$$\phi(\lambda):= \min\limits_{\mathbf{x}\in \mathbf{X}}\mathcal{L}(\mathbf{x},\lambda).$$

Note that the regularized Lagrangian function \(\mathcal {L}(\mathbf {x},\lambda )\) defined in (2) is separable with respect to \(\mathbf{x}_{i}\), i = 1,…, m. Thus, the regularized dual function \(\phi(\lambda)\) can be rewritten as

$$ \phi(\lambda)= \sum\limits_{i=1}^{m} \phi_{i}(\lambda), $$

where \(\phi _{i}(\lambda ):=\min _{\mathbf {x}_{i}\in \mathbf {X}_{i}}\mathcal {L}_{i}(\mathbf {x}_{i},\lambda )\) can be regarded as the regularized dual function of agent i, i = 1,…, m. Then, the regularized dual problem of problem (1) can be written as \(\max _{\lambda }\min _{\mathbf {x}\in \mathbf{X}} \mathcal {L}(\mathbf {x},\lambda )\), or, equivalently,

$$ \max\limits_{\lambda}\sum\limits_{i=1}^{m} \phi_{i}(\lambda). $$

The coupling among agents induced by the equality constraints is reflected in the fact that λ is a common decision vector on whose value all the agents should agree.

Related assumptions

The following assumptions on problem (1) and on the time-varying communication network are needed to establish the convergence properties of the proposed method.

Assumption 1

For each i = 1,…, m, the function \(f_{i}(\cdot): \mathbb {R}^{n_{i}}\rightarrow \mathbb {R}\) is \(\tau_{i}\)-strongly convex, and the set \(\mathbf {X}_{i}\subseteq \mathbb {R}^{n_{i}}\) is non-empty, convex, and compact.

Note that, under Assumption 1, we have:

  (i) the function \(\phi_{i}(\lambda)\) defined in (4) is \(\gamma_{i}\)-strongly concave and differentiable, and its gradient \(\nabla \phi_{i}(\lambda) = A_{i}\mathbf{x}_{i}(\lambda) - \mathbf{b}_{i} - \gamma_{i}\lambda\) is Lipschitz continuous with constant \(||A_{i}||/\tau_{i}\), where \(\mathbf {x}_{i}(\lambda ):=\arg \min _{\mathbf {x}_{i}\in \mathbf {X}_{i}} \mathcal {L}_{i}(\mathbf {x}_{i},\lambda )\) (see [23, 32] for more details);

  (ii) there is a constant \(G_{i} > 0\) such that \(||A_{i}\mathbf{x}_{i} - \mathbf{b}_{i}|| \leq G_{i}\) for any \(\mathbf{x}_{i}\in \mathbf{X}_{i}\), due to the compactness of \(\mathbf{X}_{i}\), i = 1,…, m.
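The gradient formula in item (i) can be sanity-checked numerically. The following is a minimal sketch under assumptions that are not from the paper: a hypothetical agent with \(f_{i}(\mathbf{x}) = \frac{\tau}{2}||\mathbf{x}-\mathbf{c}||^{2}\) and a box constraint set, for which the minimizer \(\mathbf{x}_{i}(\lambda)\) has a closed form. All numerical values and function names are illustrative.

```python
import numpy as np

# Hypothetical agent data (not from the paper):
# f_i(x) = (tau/2) * ||x - c||^2 and X_i = [lo, hi]^n.
tau, gamma, lo, hi = 2.0, 0.5, -1.0, 1.0
A = np.array([[1.0, -2.0, 0.5]])   # p = 1, n_i = 3
b = np.array([0.3])
c = np.array([0.8, -0.4, 0.1])

def x_of_lambda(lam):
    """Minimizer of L_i(., lam) over the box (closed form for this separable quadratic)."""
    return np.clip(c - A.T @ lam / tau, lo, hi)

def phi(lam):
    """Regularized local dual function phi_i(lam) = min_x L_i(x, lam)."""
    x = x_of_lambda(lam)
    return 0.5 * tau * np.sum((x - c) ** 2) + lam @ (A @ x - b) - 0.5 * gamma * lam @ lam

def grad_phi(lam):
    """Gradient formula from item (i): A_i x_i(lam) - b_i - gamma_i * lam."""
    return A @ x_of_lambda(lam) - b - gamma * lam

def numeric_grad(lam, h=1e-6):
    """Central finite differences, used only as an independent check."""
    g = np.zeros_like(lam)
    for k in range(lam.size):
        e = np.zeros_like(lam)
        e[k] = h
        g[k] = (phi(lam + e) - phi(lam - e)) / (2 * h)
    return g

lam = np.array([0.7])
max_gap = np.max(np.abs(grad_phi(lam) - numeric_grad(lam)))
```

Here `max_gap` should be close to zero, which is consistent with the stated gradient. For other choices of \(f_{i}\) and \(\mathbf{X}_{i}\), `x_of_lambda` would have to be replaced by a suitable local solver.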

We assume that each agent can communicate with other agents over a time-varying network. The communication topology is modeled by a directed graph \(\mathcal {G}[t]=(\mathcal {V}, \mathcal {E}[t])\) over the vertex set \(\mathcal {V}=\{1,\ldots ,m\}\) with the edge set \(\mathcal {E}[t]\subseteq \mathcal {V}\times \mathcal {V}\). Let \(\mathcal {N}_{i}^{in}[t]\) represent the collection of in-neighbors and \(\mathcal {N}_{i}^{out}[t]\) represent the collection of out-neighbors of agent i at time t, respectively. That is,

$$\mathcal{N}_{i}^{in}[t]:=\{j|(j,i)\in \mathcal{E}[t]\}\cup \{i\},$$
$$\mathcal{N}_{i}^{out}[t]:=\{j|(i,j)\in \mathcal{E}[t]\}\cup \{i\},$$

where \((j, i)\in \mathcal{E}[t]\) means that agent j may send its information to agent i at time t. Let \(d_{i}[t]\) be the out-degree of agent i at time t, i.e.,

$$d_{i}[t]:=|\mathcal{N}_{i}^{out}[t]|.$$


We introduce a time-varying communication weight matrix W[t] with elements (W[t])ij, defined by

$$ (W[t])_{ij} = \left\{ \begin{array}{ll} \frac{1}{d_{j}[t]}, & \text{when}~ j\in \mathcal{N}_{i}^{in}[t],\\ 0, & \text{otherwise}, \end{array} \right. \qquad i,~j=1,\ldots,m. $$

We need the following assumptions on the weight matrix W[t], which can be found in [2, 4].

Assumption 2

(i) Every agent i only knows its out-degree \(d_{i}[t]\) at every time t; (ii) The graph sequence \(\mathcal {G}[t]\) is B-strongly connected, namely, there exists an integer B > 0 such that the graph with vertex set \(\mathcal{V}\) and edge set \(\cup _{l=kB}^{(k+1)B-1}\mathcal {E}[l]\) is strongly connected, for all k ≥ 0.

Note that the communication weight matrix W[t] is column stochastic. In this paper, we do not require W[t] to be doubly stochastic.
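To make the construction of W[t] concrete, the following minimal sketch builds the weight matrix from out-neighbor lists and checks that its columns sum to one while its rows need not. The three-agent graph and the helper name `push_sum_weights` are hypothetical and chosen only for illustration.

```python
import numpy as np

def push_sum_weights(out_neighbors):
    """Build W with W[i, j] = 1/d_j when i receives from j (self-loops included)."""
    m = len(out_neighbors)
    W = np.zeros((m, m))
    for j, nbrs in enumerate(out_neighbors):
        receivers = set(nbrs) | {j}   # out-neighborhood of j, including j itself
        d_j = len(receivers)          # out-degree d_j
        for i in receivers:
            W[i, j] = 1.0 / d_j
    return W

# Hypothetical directed graph on 3 agents: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
out_neighbors = [[1, 2], [2], [0]]
W = push_sum_weights(out_neighbors)
col_sums = W.sum(axis=0)   # every column sums to 1 (column stochastic)
row_sums = W.sum(axis=1)   # rows need not sum to 1 (graph is unbalanced)
```

Each agent j only needs its own out-degree to fill in column j, which matches the information requirement in Assumption 2(i).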

Algorithm and main results

Distributed regularized dual gradient algorithm

In general, problem (1) could be solved in a centralized manner. However, if the number m of agents is large, this may turn out to be computationally challenging. Additionally, each agent would be required to share its own information, such as the objective \(f_{i}\), the constraint set \(\mathbf{X}_{i}\), and \((A_{i}, \mathbf{b}_{i})\), either with the other agents or with a central coordinator collecting all information, which is possibly undesirable in many cases due to privacy concerns.

To overcome both the computational and privacy issues stated above, we propose a Distributed Regularized Dual Gradient Algorithm (DRDGA, for short) that solves the regularized dual problem (4). Our proposed DRDGA is motivated by the gradient push-sum method [4] and dual decomposition [23, 30], and is described in Algorithm 1.


In Algorithm 1, each agent i broadcasts (or pushes) the quantities \(\theta_{i}[t]/d_{i}[t]\) and \(\rho_{i}[t]/d_{i}[t]\) to all of the agents in its out-neighborhood \(\mathcal {N}_{i}^{out}[t]\). Then, each agent simply sums all the received messages to obtain \(\mathbf{u}_{i}[t+1]\) in step 4 and \(\rho_{i}[t+1]\) in step 5, respectively. The update rules in steps 6 to 8 can be implemented locally. In particular, the update of the local primal vector \(\mathbf{x}_{i}[t+1]\) in step 7 is performed by minimizing \(\mathcal {L}_{i}\) with respect to \(\mathbf{x}_{i}\) evaluated at \(\lambda = \lambda_{i}[t+1]\), while the dual update in step 8 involves the maximization of \(\mathcal {L}_{i}\) with respect to \(\lambda\), via a gradient step evaluated at \(\mathbf{x}_{i} = \mathbf{x}_{i}[t+1]\). Note that the term \(A_{i}\mathbf{x}_{i}[t+1] - \mathbf{b}_{i} - \gamma_{i}\lambda_{i}[t+1]\) in step 8 is the gradient of the dual function \(\phi_{i}(\lambda)\) at \(\lambda = \lambda_{i}[t+1]\).
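The steps described above can be sketched in code. This is a reconstruction from the step descriptions and from the relations used in the proofs, not a transcription of Algorithm 1, so the exact form of each step is an assumption. The sketch also assumes hypothetical local objectives \(f_{i}(\mathbf{x}) = \frac{\tau}{2}||\mathbf{x}-\mathbf{c}_{i}||^{2}\) with box constraints (so that step 7 has a closed form), a fixed weight matrix, and the stepsize \(\beta[t]=q/t\); all data and names are illustrative.

```python
import numpy as np

def drdga_sketch(W_seq, A, b, c, tau, gamma, lo, hi, q, T):
    """Sketch of the push-sum regularized dual gradient iteration (assumed form)."""
    m, p = len(A), A[0].shape[0]
    theta = np.zeros((m, p))   # theta_i[0], chosen as zero here (an assumption)
    rho = np.ones(m)           # rho_i[0] = 1
    for t in range(T):
        W = W_seq(t)                      # column-stochastic W[t]
        u = W @ theta                     # step 4: sum received theta_j / d_j
        rho = W @ rho                     # step 5: sum received rho_j / d_j
        lam = u / rho[:, None]            # step 6: lambda_i = u_i / rho_i
        # step 7: local minimization of L_i(., lambda_i); closed form for this f_i
        x = [np.clip(c[i] - A[i].T @ lam[i] / tau, lo, hi) for i in range(m)]
        beta = q / (t + 1)                # beta[t+1] = q / (t+1)
        grad = np.stack([A[i] @ x[i] - b[i] - gamma[i] * lam[i] for i in range(m)])
        theta = u + beta * grad           # step 8: dual gradient ascent step
    return x, lam, rho

# Hypothetical 3-agent instance with a fixed unbalanced graph.
W_fixed = np.array([[1/3, 0.0, 0.5],
                    [1/3, 0.5, 0.0],
                    [1/3, 0.5, 0.5]])
A = [np.array([[1.0, 0.5]]), np.array([[-1.0, 2.0]]), np.array([[0.5, 0.5]])]
b = [np.array([0.2]), np.array([0.1]), np.array([-0.1])]
c = [np.array([0.5, -0.5]), np.array([0.0, 0.3]), np.array([-0.2, 0.4])]
gamma = [1.0, 1.0, 1.0]
x, lam, rho = drdga_sketch(lambda t: W_fixed, A, b, c, tau=1.0, gamma=gamma,
                           lo=-1.0, hi=1.0, q=4.0, T=200)
```

Multiplying by `W` is a centralized stand-in for the message passing: in a real deployment each agent would only sum the messages it receives. Because the dual problem is regularized, the iterates target the regularized problem (4) rather than the exact dual of problem (1).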

Remark 1

  (i) The algorithm proposed in [30] requires the communication weight matrices to be doubly stochastic, which corresponds to balanced graphs. Our Algorithm 1 removes this requirement and only needs column stochasticity of the communication matrices, so it applies to unbalanced graphs, which is more general and practical.

  (ii) In primal-dual methods [28, 29], all agents need global knowledge of the coupled constraints in the primal problem, and information related to the primal problem is exchanged among agents. This may raise privacy issues and often results in unnecessary computational and communication effort [30]. By comparison, our Algorithm 1 is based on dual methods, and only the local estimates of the dual variables are exchanged. This helps preserve privacy among agents and reduces the required local computation and communication.

Statement of main results

In this section, we state the main convergence results of the proposed Algorithm 1.

It is shown in [2] that the local primal vector \(\mathbf{x}_{i}[t]\) does not converge to the optimal solution \(\mathbf {x}_{i}^{*}\) of problem (1) in general. Compared to \(\mathbf{x}_{i}[t]\), however, the following recursive auxiliary primal iterates

$$\widehat{\mathbf{x}}_{i}[T] = \frac{{\sum}_{t=1}^{T} (t-1) \mathbf{x}_{i}[t]}{\frac{T(T-1)}{2}}, ~\mathrm{for~all}~T\geq 2$$

can show better convergence properties by setting \(\widehat {\mathbf {x}}_{i}[1]=\mathbf {x}_{i}[0]\), see [14, 27, 32]. Define the averaging iterates as \(\overline {\theta }[t]=\frac {{\sum }_{i=1}^{m}\mathbf {\theta }_{i}[t]}{m}\).
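The weighted average \(\widehat{\mathbf{x}}_{i}[T]\) can be maintained without storing past iterates. The sketch below assumes the recursion \(\widehat{\mathbf{x}}_{i}[T+1] = \frac{T-1}{T+1}\widehat{\mathbf{x}}_{i}[T] + \frac{2}{T+1}\mathbf{x}_{i}[T+1]\), which follows algebraically from the definition above but is not stated in the text, and checks it against the direct formula on random data. Function names are illustrative.

```python
import numpy as np

def weighted_average_direct(xs):
    """xs[t-1] holds x[t], t = 1..T; returns sum_t (t-1) x[t] / (T(T-1)/2)."""
    T = len(xs)
    num = sum((t - 1) * xs[t - 1] for t in range(1, T + 1))
    return num / (T * (T - 1) / 2)

def weighted_average_recursive(xs):
    """Same quantity via a running update, starting from x_hat[2] = x[2]."""
    x_hat = xs[1]                      # weight on x[1] is zero, so x_hat[2] = x[2]
    for T in range(2, len(xs)):
        x_next = xs[T]                 # this is x[T+1]
        x_hat = (T - 1) / (T + 1) * x_hat + 2 / (T + 1) * x_next
    return x_hat

rng = np.random.default_rng(0)
xs = [rng.normal(size=3) for _ in range(10)]
direct = weighted_average_direct(xs)
recursive = weighted_average_recursive(xs)
```

The weights grow linearly in t, so later iterates dominate the average.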

The following Theorem 1 first gives an upper bound on the norm of the dual variables. By controlling the norm of the dual variables, we in turn control the norm of the subgradients of the regularized Lagrangian function, which is instrumental in proving Theorems 2 and 3 below.

Theorem 1

Suppose that Assumptions 1 and 2 hold and the non-increasing stepsize sequence \(\{\beta[t]\}_{t>0}\) satisfies \(\lim _{t\rightarrow \infty } \beta [t] = 0\). Then, there is a finite constant D > 0 such that for all i = 1,…, m,

$$\sup_{t}||\lambda_{i}[t]|| \leq D, $$

where D depends on the parameters \(\gamma_{i}\), \(\tau_{i}\), \(||A_{i}||\), \(G_{i}\), and \(\delta\).

In what follows, Theorem 2 gives the convergence rate of the primal objective value under Assumptions 1 and 2.

Theorem 2 (Convergence rate)

Suppose Assumptions 1 and 2 hold. Let the stepsize be taken as \(\beta [t]=\frac {q}{t}, t=1,2,\ldots \), where the constant q satisfies \(\frac {q\gamma }{m}\geq 4\) with \(\gamma ={\sum }_{j=1}^{m}\gamma _{j}\). Then, for all T ≥ 1 and i = 1,…, m, we have

$$ \begin{array}{@{}rcl@{}} &&F(\widehat{\mathbf{x}}_{i}[T]) - F(\mathbf{x}^{*}) \\ &\leq & \frac{32}{T\delta}\sum\limits_{i=1}^{m}(G_{i}+\gamma_{i}D)\left( \frac{\eta}{ 1 - \eta}\sum\limits_{i=1}^{m}||\theta_{i}[0]||_{1}+\frac{ q m B}{1-\eta}(1+\ln T)\right)+\frac{q}{T}\sum\limits_{i=1}^{m}(G_{i} + \gamma_{i}D)^{2}. \end{array} $$

where the constant \(B=\max _{1\leq i\leq m} \sqrt {p}(G_{i} + \gamma _{i}D)\), and the scalars \(\eta \in (0,1)\) and \(\delta > 0\) satisfy \(\delta \geq \frac {1}{m^{mB}},~~ \eta \leq (1-\frac {1}{m^{mB}})^{\frac {1}{mB}}\).

Theorem 2 shows that the sequence of primal objective values \(\{F(\widehat {\mathbf {x}}[T])\}\) converges to the optimal value \(F(\mathbf{x}^{*})\) at a rate of \(O(\ln T/T)\), i.e.,

$$F(\widehat{\mathbf{x}}[T])-F(\mathbf{x}^{*})=O\left( \frac{\ln T}{T}\right)$$

with the constant depending on the regularization parameters \(\gamma_{i}\), i = 1,…, m, the bound D on the dual variables, the bounds \(G_{i}\), i = 1,…, m, on the coupling constraints, the initial values \(\theta_{i}[0]\) at the agents, and on both the speed η of the network information diffusion and the imbalance δ of influences among the agents.

In the next theorem, we show the upper bound on the constraint violation.

Theorem 3 (Constraint violation bound)

Under the conditions of Theorem 1, we have, for all T ≥ 1,

$$ \begin{array}{@{}rcl@{}} &&||\sum\limits_{j=1}^{m}(A_{j}\widehat{\mathbf{x}}_{j}[T]-\mathbf{b}_{j})||^{2}\\ & \leq & \frac{16\gamma}{T\delta}\sum\limits_{j=1}^{m}(G_{j}+ \gamma_{j}D)\left( \frac{8\eta}{1-\eta}\sum\limits_{j=1}^{m}||\theta_{j}[0]||_{1}+\frac{8qmB}{1-\eta}(1+\ln T)\right) + \frac{4q\gamma}{T}\sum\limits_{j=1}^{m}(G_{j}+ \gamma_{j}D)^{2}. \end{array} $$

Theorem 3 shows that the constraint violation, measured by \(||{\sum }_{i=1}^{m}(A_{i}\widehat {\mathbf {x}}_{i}[T]-\mathbf {b}_{i})||\), is of the order \(O(\sqrt {\ln T/T})\).

Remark 2

  (i) Under unbalanced networks, Theorems 2 and 3 give explicit convergence rates for the objective function value and the constraint violation. In contrast, the work [30] only obtains convergence of the dual and primal iterates under balanced networks, and provides no explicit convergence rate.

  (ii) The algorithms of [27, 30] require that a Slater point exists and is known to all agents. Our work removes these requirements and obtains the boundedness of the dual variables by introducing the regularized Lagrangian function, as presented in Theorem 1.

  (iii) When the objective functions of problem (1) are convex but not strongly convex, the convergence rate of Algorithm 1 reduces to \(O(\sqrt {\ln T}/\sqrt {T})\); see our previous work [33].

Proof of main results

Before the proof of main results, we need to establish some useful auxiliary lemmas. The following Lemma 1 exploits the structure of strongly concave functions with Lipschitz gradients, whose proof is motivated by Lemma 3 in [20]. We omit the proof here.

Lemma 1

Let \(h: \mathbb {R}^{p}\rightarrow \mathbb {R}\) be a \(\widetilde {\gamma }\)-strongly concave function with \(\widetilde {\gamma }>0\) and have Lipschitz continuous gradients with constant \(\widetilde {M}>0\). For any \(\mathbf {z}\in \mathbb {R}^{p}\), let \(\mathbf {y}\in \mathbb {R}^{p}\) be defined by

$$\mathbf{y}:=\mathbf{z} + \beta(\nabla h(\mathbf{z}) + \varphi(\mathbf{z})),$$

where \(\beta \in (0, \frac {\widetilde {\gamma }}{8\widetilde {M}^{2}}]\) and \(\varphi : \mathbb {R}^{p}\rightarrow \mathbb {R}^{p}\) is a mapping such that

$$||\varphi(\mathbf{z})||\leq c ,~~~ \forall~ \mathbf{z}\in \mathbb{R}^{p}.$$

Then, there is a compact set \(V \subset \mathbb {R}^{p}\) (which depends on c and the definition of the function h, but not on β) such that

$$ ||\mathbf{y}|| \leq \left\{ \begin{array}{lll} ||\mathbf{z}||,~~ \forall~ \mathbf{z} \notin V,\\ R ,~~ \forall~ \mathbf{z} \in V , \end{array} \right. $$

where \(R = \max _{\mathbf {v}\in V}\{||\mathbf {v}|| + \frac {\widetilde {\gamma }}{8\widetilde {M}^{2}}||\nabla h(\mathbf {v})||\} + \frac {\widetilde {\gamma }~ c }{8\widetilde {M}^{2}}\) and \(\nabla h(\mathbf{v})\) is the gradient of h at \(\mathbf{v}\).

Based on Lemma 1, we are ready to prove our Theorem 1.

Proof of Theorem 1

By step 5 of Algorithm 1, we have

$$\rho[t+1] = W[t]\rho[t],$$

where ρ[t] is the vector with entries ρi[t]. Further, the above relation can be recursively written as follows

$$ \begin{array}{@{}rcl@{}} \rho[t] = W[t-1]W[t-2]{\cdots} W[0]\mathbf{1},~~ \text{for}~~\text{all}~t \geq 1, \end{array} $$

where we use the fact that ρi[0] = 1, for all i = 1,…, m. Under Assumption 2, by Corollary 2(b) in [4], for all i, we have

$$ \begin{array}{@{}rcl@{}} \delta = \inf_{t = 0,1,\ldots} (\min_{1 \leq i \leq m}(W[t]W[t-1]{\ldots} W[0]\mathbf{1})_{i}) > 0. \end{array} $$

Therefore, we can obtain

$$ \begin{array}{@{}rcl@{}} \rho_{i}[t] \geq \delta, ~~\text{for}~~\text{all}~~i~~\text{and}~~t. \end{array} $$

Using steps 4 and 8 of Algorithm 1, we get

$$ \begin{array}{@{}rcl@{}} \mathbf{\theta}_{i}[t] &= & \mathbf{u}_{i}[t] + \beta[t](A_{i}\mathbf{x}_{i}[t] - \mathbf{b}_{i} - \gamma_{i}\mathbf{\lambda}_{i}[t])\\ &=& \rho_{i}[t] \left( \mathbf{\lambda}_{i}[t] + \frac{\beta[t]}{\rho_{i}[t]}(A_{i}\mathbf{x}_{i}[t] - \mathbf{b}_{i} - \gamma_{i}\mathbf{\lambda}_{i}[t]) \right), \end{array} $$

Furthermore, the above equality gives rise to

$$ \begin{array}{@{}rcl@{}} \frac{\mathbf{\theta}_{i}[t]}{\rho_{i}[t]} &=& \mathbf{\lambda}_{i}[t] + \frac{\beta[t]}{\rho_{i}[t]}(A_{i}\mathbf{x}_{i}[t] - \mathbf{b}_{i} - \gamma_{i}\mathbf{\lambda}_{i}[t])\\ &=& (1 - \frac{\gamma_{i}\beta[t]}{\rho_{i}[t]})\mathbf{\lambda}_{i}[t] + \frac{\beta[t]}{\rho_{i}[t]}(A_{i}\mathbf{x}_{i}[t] - \mathbf{b}_{i}). \end{array} $$

Since the transition matrix W[t]W[t − 1]⋯W[0] is column stochastic and ρ[0] = 1, we have that \({\sum }_{i=1}^{m}\rho _{i}[t] = m\) and

$$ \begin{array}{@{}rcl@{}} \delta \leq \rho_{i}[t] \leq m, ~i=1,\ldots,m. \end{array} $$

Together with (6) and β[t] → 0, it yields for all i

$$ \begin{array}{@{}rcl@{}} \lim\limits_{t \rightarrow \infty} \frac{\mathbf{\beta}[t]}{\rho_{i}[t]}=0. \end{array} $$

Thus, for each i, there exists a \(T_{i} > 1\) such that \(\frac {\beta [t]}{\rho _{i}[t]} \leq \frac { {\gamma _{i}{\tau _{i}^{2}}}}{8||A_{i}||^{2}}\), for all \(t \geq T_{i}\).

Since the function \(\phi_{i}(\lambda)\) defined in (4) is \(\gamma_{i}\)-strongly concave, and its gradient \(\nabla \phi_{i}(\lambda)\) is Lipschitz continuous with constant \(||A_{i}||/\tau_{i}\), by Lemma 1 there is a compact set \(V_{i}\) and a finite \(T_{i} > 1\) such that, for all \(t \geq T_{i}\),

$$ \parallel\frac{\mathbf{\theta}_{i}[t]}{\rho_{i}[t]}\parallel \leq\left\{ \begin{array}{lll} ||\mathbf{\lambda}_{i}[t]||,~~ \text{if}~~ \mathbf{\lambda}_{i}[t] \notin V_{i},\\ R_{i}(\gamma_{i}) ,~~ \text{if}~~ \mathbf{\lambda}_{i}[t] \in V_{i} , \end{array} \right. $$

where \(R_{i}(\gamma _{i})=\max _{\mathbf {v}\in V_{i}}\{||\mathbf {v}||+\frac {\gamma _{i}{\tau _{i}^{2}}}{8||A_{i}||^{2}}||\nabla \phi _{i}(\mathbf {v})||\}+\frac {c\gamma _{i}{\tau _{i}^{2}}}{8||A_{i}||^{2}}\). Let \(T_{0} = \max_{1\leq i\leq m} T_{i}\). Now, we split the time index into two ranges (\(t \geq T_{0}\) and \(1 \leq t < T_{0}\)) to prove the boundedness of \(||\lambda_{i}[t]||\).

(i) By mathematical induction, we first prove that, for all \(t \geq T_{0}\),

$$ \begin{array}{@{}rcl@{}} \max\limits_{1\leq i\leq m} ||\lambda_{i}[t]|| \leq \widetilde{R}, \end{array} $$

where \({ \widetilde {R}=\max \{\max _{j} R_{j}(\gamma _{j}), \max _{j}||\lambda _{j}[T_{0}]||\} }\). Clearly, if \(t = T_{0}\), the relation (10) is true. Suppose it is true at some time \(t \geq T_{0}\). Then, by (9), we have for all i

$$ \begin{array}{@{}rcl@{}} \parallel\frac{\mathbf{\theta}_{i}[t]}{\rho_{i}[t]}\parallel \leq \max\left\{ \max\limits_{j} R_{j}(\gamma_{j}), \max\limits_{j}||\mathbf{\lambda}_{j}[t]|| \right\} \leq \widetilde{R}, \end{array} $$

where the last inequality uses the relation (10).

Next, in Lemma 4 of [20], we let v = ρ[t], P = W[t], and u be taken as the vector of the s-th coordinates of the vectors \(\theta_{i}[t]\), i = 1,…, m, where the coordinate index s is arbitrary. By Lemma 4 of [20], each vector \(\lambda_{i}[t+1]\) is a convex combination of the vectors \(\frac {\mathbf {\theta }_{j}[t]}{\rho _{j}[t]}\), j = 1,…, m, i.e.,

$$ \begin{array}{@{}rcl@{}} \mathbf{\lambda}_{i}[t+1] = \sum\limits_{j=1}^{m}Q_{ij}[t]\frac{\mathbf{\theta}_{j}[t]}{\rho_{j}[t]}, ~~\text{for}~~\text{all}~~i~~\text{and} ~~t\geq0, \end{array} $$

where Q[t] is a row stochastic matrix with entries \(Q_{ij}[t] = \frac {W_{ij}[t]\rho _{j}[t]}{\rho _{i}[t+1]}\). Due to the convexity of Euclidean norm ||⋅||, we further obtain

$$ \begin{array}{@{}rcl@{}} ||\mathbf{\lambda}_{i}[t+1]|| \leq \sum\limits_{j=1}^{m}Q_{ij}[t]||\frac{\mathbf{\theta}_{j}[t]}{\rho_{j}[t]}|| \leq \max\limits_{1\leq j\leq m} ||\frac{\mathbf{\theta}_{j}[t]}{\rho_{j}[t]}||, ~~\text{for}~~\text{all}~~i~~\text{and} ~~t\geq0. \end{array} $$
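For completeness, the convex-combination identity and the row stochasticity of Q[t] used above can also be checked directly. The short derivation below assumes that steps 4 to 6 of Algorithm 1 read \(\mathbf{u}_{i}[t+1]=\sum_{j}W_{ij}[t]\theta_{j}[t]\), \(\rho_{i}[t+1]=\sum_{j}W_{ij}[t]\rho_{j}[t]\), and \(\lambda_{i}[t+1]=\mathbf{u}_{i}[t+1]/\rho_{i}[t+1]\), as the relations in this proof suggest:

$$ \mathbf{\lambda}_{i}[t+1] = \frac{\sum_{j=1}^{m}W_{ij}[t]\mathbf{\theta}_{j}[t]}{\rho_{i}[t+1]} = \sum\limits_{j=1}^{m}\frac{W_{ij}[t]\rho_{j}[t]}{\rho_{i}[t+1]}\cdot\frac{\mathbf{\theta}_{j}[t]}{\rho_{j}[t]}, \qquad \sum\limits_{j=1}^{m}Q_{ij}[t] = \frac{\sum_{j=1}^{m}W_{ij}[t]\rho_{j}[t]}{\rho_{i}[t+1]} = \frac{\rho_{i}[t+1]}{\rho_{i}[t+1]} = 1. $$

Since \(W_{ij}[t]\geq 0\) and \(\rho_{j}[t]>0\), the entries \(Q_{ij}[t]\) are nonnegative, so each row of Q[t] is a set of convex weights.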

By (11) and (13), we have that for all i = 1,…, m and the time t + 1

$$ \begin{array}{@{}rcl@{}} \max\limits_{1\leq i \leq m} ||\mathbf{\lambda}_{i}[t+1]|| \leq \widetilde{R}. \end{array} $$

Hence, the relation (10) indeed holds for all \(t \geq T_{0}\).

(ii) We then prove that \(||\lambda_{i}[t]||\) is bounded above for \(t = 1,\ldots, T_{0} - 1\). By (8), we have

$$1-\frac{\gamma_{j}\beta[t]}{\delta}\leq 1-\frac{\gamma_{j}\beta[t]}{\rho_{j}[t]}\leq 1-\frac{\gamma_{j}\beta[t]}{m}.$$

Note that β[t] in the above inequality is non-increasing and goes to 0 as \(t \rightarrow \infty\). Hence, there exists a constant S > 0 such that for all j = 1,…, m and \(t = 1,\ldots, T_{0} - 1\)

$$ \begin{array}{@{}rcl@{}} \left| 1-\frac{\gamma_{j}\beta[t]}{\rho_{j}[t]}\right|\leq S. \end{array} $$

Thus, together with (7) and (13), we can obtain that, for all t = 1,…, T0 − 1

$$ \begin{array}{@{}rcl@{}} \max\limits_{1\leq i\leq m} ||\mathbf{\lambda}_{i}[t+1]|| &\leq& \max\limits_{1\leq j\leq m} \left( \left|1 - \frac{\gamma_{j}\beta[t]}{\rho_{j}[t]}\right|\,||\mathbf{\lambda}_{j}[t]|| + \frac{\beta[t]}{\rho_{j}[t]}||A_{j}\mathbf{x}_{j}[t] - \mathbf{b}_{j}||\right)\\ &\leq & S \max\limits_{1\leq j\leq m} ||\mathbf{\lambda}_{j}[t]|| + \frac{\beta[1]}{\delta}\max\limits_{1\leq j\leq m}G_{j}, \end{array} $$

where the second inequality holds by (15) and the fact that β[1] ≥ β[t] for t ≥ 1. Thus, using the preceding relation recursively for \(t = 1,\ldots, T_{0} - 1\), and the fact that the initial points \(\theta_{i}[0]\) are deterministic, we can conclude that there exists a finite constant \(\widetilde {S}>0\) such that

$$ \begin{array}{@{}rcl@{}} \max\limits_{1\leq i\leq m} ||\mathbf{\lambda}_{i}[t+1]||\leq \widetilde{S}. \end{array} $$

Let \(D=\max \{\widetilde {R},\widetilde {S}\}\). By (10) and (17), we can obtain that, for all t ≥ 1 and i = 1,…, m

$$ \begin{array}{@{}rcl@{}} ||\mathbf{\lambda}_{i}[t]||\leq D. \end{array} $$

In order to prove Theorems 2 and 3, we need to use the following result, which is a generalization of Lemma 8 in [4].

Lemma 2

Under the conditions of Theorem 1, for any \(\mathbf {\lambda }\in \mathbb {R}^{p}\) and t > 0, we have

$$ \begin{array}{@{}rcl@{}} ||\overline{\mathbf{\theta}}[t + 1] - \mathbf{\lambda}||^{2} &\!\leq\! & ||\overline{\mathbf{\theta}}[t] - \mathbf{\lambda}||^{2} + \frac{4\beta[t+1]}{m}\sum\limits_{j=1}^{m}(G_{j} + \gamma_{j}D)||\mathbf{\lambda}_{j}[t+1] - \overline{\mathbf{\theta}}[t]||\\ && -\frac{\beta[t+1]}{m}\sum\limits_{j=1}^{m}\gamma_{j}||\mathbf{\lambda}_{j}[t+1] - \mathbf{\lambda}||^{2} + \frac{\beta^{2}[t+1]}{m}\sum\limits_{j=1}^{m}(G_{j} + \gamma_{j}D)^{2}\\ && -\frac{2\beta[t+1]}{m}(\mathcal{L}(\mathbf{x}[t+1],\mathbf{\lambda})-\mathcal{L}(\mathbf{x},\overline{\mathbf{\theta}}[t])). \end{array} $$


We first prove that \(\overline {\mathbf {\theta }}[t]\) is bounded for any t > 0. Since W[t] is a column stochastic matrix, we have \(\mathbf{1}^{\top}\mathbf{y} = \mathbf{1}^{\top}W[t]\mathbf{y}\) for any vector \(\mathbf {y}\in \mathbb {R}^{m}\). By step 4 of Algorithm 1, we further have

$$\sum\limits_{i=1}^{m}\mathbf{u}_{i}[t+1]=\sum\limits_{i=1}^{m}\sum\limits_{j=1}^{m}W[t]_{ij}\mathbf{\theta}_{j}[t]=\sum\limits_{j=1}^{m}\mathbf{\theta}_{j}[t] = m\overline{\theta}[t].$$

From the definition of λi[t + 1] in step 6 of Algorithm 1, it gives rise to

$$ \begin{array}{@{}rcl@{}} \overline{\theta}[t] = \frac{1}{m}\sum\limits_{i=1}^{m}\mathbf{u}_{i}[t+1] = \frac{1}{m}\sum\limits_{i=1}^{m}\rho_{i}[t+1]\mathbf{\lambda}_{i}[t+1]. \end{array} $$

Note that \({\sum }_{i=1}^{m}\rho _{i}[t]=m\), and ρi[t] > 0 for all t and i. Thus, by the result of Theorem 1, we have, for all i = 1,…, m and t ≥ 0,

$$ \begin{array}{@{}rcl@{}} ||\overline{\theta}[t]|| \leq \max\limits_{i}||\mathbf{\lambda}_{i}[t+1]|| \leq D. \end{array} $$

We are now ready to prove the result of Lemma 2. From step 8 of Algorithm 1, we have

$$ \begin{array}{@{}rcl@{}} \overline{\mathbf{\theta}}[t+1] = \overline{\mathbf{\theta}}[t] + \frac{\beta[t+1]}{m}\sum\limits_{j=1}^{m}(A_{j}\mathbf{x}_{j}[t+1] -\mathbf{b}_{j} - \gamma_{j}\mathbf{\lambda}_{j}[t+1]). \end{array} $$

For any \(\mathbf {\lambda }\in \mathbb {R}^{p}\), the relation (18) gives rise to

$$ \begin{array}{@{}rcl@{}} ||\overline{\mathbf{\theta}}[t + 1] - \mathbf{\lambda}||^{2}& = &||\overline{\mathbf{\theta}}[t]- \mathbf{\lambda} + \frac{\beta[t+1]}{m}\sum\limits_{j=1}^{m}(A_{j}\mathbf{x}_{j}[t+1] -\mathbf{b}_{j} - \gamma_{j}\mathbf{\lambda}_{j}[t+1])||^{2}\\ &\!\leq\! & ||\overline{\mathbf{\theta}}[t] - \mathbf{\lambda}||^{2} + \frac{\beta^{2}[t+1]}{m^{2}}||\sum\limits_{j=1}^{m}(A_{j}\mathbf{x}_{j}[t + 1] -\mathbf{b}_{j} - \gamma_{j}\mathbf{\lambda}_{j}[t+1])||^{2}\\ &&+ \frac{2\beta[t+1]}{m}\sum\limits_{j=1}^{m}(A_{j}\mathbf{x}_{j}[t+1] -\mathbf{b}_{j} - \gamma_{j}\mathbf{\lambda}_{j}[t+1])^{\top}(\overline{\mathbf{\theta}}[t]- \mathbf{\lambda}). \end{array} $$

By using the inequality \(({\sum }_{j=1}^{m}a_{j})^{2} \leq m{\sum }_{j=1}^{m}{a^{2}_{j}}\), we can obtain

$$ \begin{array}{@{}rcl@{}} ||\sum\limits_{j=1}^{m}(A_{j}\mathbf{x}_{j}[t + 1] - \mathbf{b}_{j} - \gamma_{j}\mathbf{\lambda}_{j}[t + 1])||^{2} &\!\leq\! & m\sum\limits_{j=1}^{m}|| A_{j}\mathbf{x}_{j}[t + 1] -\mathbf{b}_{j} - \gamma_{j}\mathbf{\lambda}_{j}[t+1]||^{2}\\ &\leq & m\sum\limits_{j=1}^{m}(G_{j}+\gamma_{j}D)^{2}. \end{array} $$
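For vectors, the scalar inequality is applied to the norms after the triangle inequality. A short worked version, writing \(\mathbf{v}_{j} = A_{j}\mathbf{x}_{j}[t+1]-\mathbf{b}_{j}-\gamma_{j}\mathbf{\lambda}_{j}[t+1]\) (shorthand used only here) and assuming, as the second line of the display indicates, that \(G_{j}\) bounds \(||A_{j}\mathbf{x}_{j}-\mathbf{b}_{j}||\) and that D bounds \(||\mathbf{\lambda}_{j}[t+1]||\) by Theorem 1:

$$ \begin{array}{@{}rcl@{}} ||\sum\limits_{j=1}^{m}\mathbf{v}_{j}||^{2} &\leq& \left(\sum\limits_{j=1}^{m}||\mathbf{v}_{j}||\right)^{2} \leq m\sum\limits_{j=1}^{m}||\mathbf{v}_{j}||^{2},\\ ||\mathbf{v}_{j}|| &\leq& ||A_{j}\mathbf{x}_{j}[t+1]-\mathbf{b}_{j}|| + \gamma_{j}||\mathbf{\lambda}_{j}[t+1]|| \leq G_{j}+\gamma_{j}D. \end{array} $$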

Thus, we have, for all t ≥ 0

$$ \begin{array}{@{}rcl@{}} &&||\overline{\mathbf{\theta}}[t+1] -\mathbf{\lambda}||^{2} \leq ||\overline{\mathbf{\theta}}[t]- \mathbf{\lambda}||^{2} + \frac{\beta^{2}[t+1]}{m}\sum\limits_{j=1}^{m}(G_{j}+\gamma_{j}D)^{2}\\ &&+ \frac{2\beta[t+1]}{m}\sum\limits_{j=1}^{m}(A_{j}\mathbf{x}_{j}[t+1] -\mathbf{b}_{j} - \gamma_{j}\mathbf{\lambda}_{j}[t+1])^{\top}(\overline{\mathbf{\theta}}[t]- \mathbf{\lambda}). \end{array} $$

We now consider the last term on the right-hand side of (19), which can be rewritten as

$$ \begin{array}{@{}rcl@{}} &&(A_{j}\mathbf{x}_{j}[t+1] -\mathbf{b}_{j} - \gamma_{j}\mathbf{\lambda}_{j}[t+1])^{\top}(\overline{\mathbf{\theta}}[t]- \mathbf{\lambda})\\ &=& (A_{j}\mathbf{x}_{j}[t+1] -\mathbf{b}_{j} - \gamma_{j}\mathbf{\lambda}_{j}[t+1])^{\top}(\overline{\mathbf{\theta}}[t]-\mathbf{\lambda}_{j}[t+1])\\ &&+ (A_{j}\mathbf{x}_{j}[t+1] -\mathbf{b}_{j} - \gamma_{j}\mathbf{\lambda}_{j}[t+1])^{\top}(\mathbf{\lambda}_{j}[t+1] - \mathbf{\lambda}). \end{array} $$

By the Cauchy-Schwarz inequality, we have

$$ \begin{array}{@{}rcl@{}} -(G_{j}+\gamma_{j}D)||\overline{\mathbf{\theta}}[t]-\mathbf{\lambda}_{j}[t+1]|| \leq (A_{j}\mathbf{x}_{j}[t+1] -\mathbf{b}_{j} - \gamma_{j}\mathbf{\lambda}_{j}[t+1])^{\top}(\overline{\mathbf{\theta}}[t]-\mathbf{\lambda}_{j}[t+1]). \end{array} $$

Since \(\mathcal {L}_{j}(\mathbf {x},\cdot )\) is γj-strongly concave, we have, for any \(\lambda \in \mathbb {R}^{p}\)

$$ \begin{array}{@{}rcl@{}} &&(A_{j}\mathbf{x}_{j}[t+1] -\mathbf{b}_{j} - \gamma_{j}\mathbf{\lambda}_{j}[t+1])^{\top}(\mathbf{\lambda}_{j}[t+1] - \mathbf{\lambda})\\ &\leq & \mathcal{L}_{j}(\mathbf{x}_{j}[t+1],\mathbf{\lambda}_{j}[t+1]) - \mathcal{L}_{j}(\mathbf{x}_{j}[t+1],\mathbf{\lambda}) - \frac{\gamma_{j}}{2}||\mathbf{\lambda}_{j}[t+1] - \mathbf{\lambda}||^{2}. \end{array} $$
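Since the regularized local Lagrangian is quadratic in λ, the strong-concavity relation above in fact holds with equality. The sketch below checks this numerically, assuming the form \(\mathcal{L}_{j}(\mathbf{x}_{j},\lambda) = f_{j}(\mathbf{x}_{j}) + \lambda^{\top}(A_{j}\mathbf{x}_{j}-\mathbf{b}_{j}) - \frac{\gamma_{j}}{2}\lambda^{\top}\lambda\) that appears in the next display. The helper name and the random data are hypothetical.

```python
import numpy as np

def regularized_lagrangian(f_val, residual, lam, gamma):
    """f_j(x_j) + lam^T (A_j x_j - b_j) - (gamma / 2) ||lam||^2, with residual = A_j x_j - b_j."""
    return f_val + lam @ residual - 0.5 * gamma * (lam @ lam)

rng = np.random.default_rng(1)
p, gamma = 4, 0.4
f_val = 1.7                       # placeholder for f_j(x_j[t+1]); it cancels in the difference
residual = rng.normal(size=p)     # stands in for A_j x_j[t+1] - b_j
lam_j = rng.normal(size=p)        # stands in for lambda_j[t+1]
lam = rng.normal(size=p)          # arbitrary comparison point

grad = residual - gamma * lam_j   # gradient of the Lagrangian in lambda at lam_j
lhs = grad @ (lam_j - lam)
rhs = (regularized_lagrangian(f_val, residual, lam_j, gamma)
       - regularized_lagrangian(f_val, residual, lam, gamma)
       - 0.5 * gamma * np.sum((lam_j - lam) ** 2))
print(lhs, rhs)
```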

By step 7 of Algorithm 1, for any \(\mathbf {x}_{j} \in \mathbb {R}^{n_{j}}\), we can get

$$ \begin{array}{@{}rcl@{}} &&f_{j}(\mathbf{x}_{j}[t+1]) + \mathbf{\lambda}_{j}[t+1]^{\top}(A_{j}\mathbf{x}_{j}[t+1] -\mathbf{b}_{j}) - \frac{\gamma_{j}}{2}\mathbf{\lambda}_{j}[t+1]^{\top}\mathbf{\lambda}_{j}[t+1]\\ &\leq & f_{j}(\mathbf{x}_{j}) + \mathbf{\lambda}_{j}[t+1]^{\top}(A_{j}\mathbf{x}_{j} -\mathbf{b}_{j}) - \frac{\gamma_{j}}{2}\mathbf{\lambda}_{j}[t+1]^{\top}\mathbf{\lambda}_{j}[t+1]. \end{array} $$

Subtracting \(\mathcal {L}_{j}(\mathbf {x}_{j}[t+1],\mathbf {\lambda })\) from both sides of the above relation, we obtain

$$ \begin{array}{@{}rcl@{}} &&\mathcal{L}_{j}(\mathbf{x}_{j}[t+1],\mathbf{\lambda}_{j}[t+1]) - \mathcal{L}_{j}(\mathbf{x}_{j}[t+1],\mathbf{\lambda})\\ &\leq& f_{j}(\mathbf{x}_{j}) + \mathbf{\lambda}_{j}[t+1]^{\top}(A_{j}\mathbf{x}_{j} -\mathbf{b}_{j}) - \frac{\gamma_{j}}{2}\mathbf{\lambda}_{j}[t+1]^{\top}\mathbf{\lambda}_{j}[t+1] - \mathcal{L}_{j}(\mathbf{x}_{j}[t+1],\mathbf{\lambda})\\ &\leq& (G_{j}+\gamma_{j}D)||\mathbf{\lambda}_{j}[t+1]-\overline{\mathbf{\theta}}[t]|| + \mathcal{L}_{j}(\mathbf{x}_{j},\overline{\mathbf{\theta}}[t]) - \mathcal{L}_{j}(\mathbf{x}_{j}[t+1],\mathbf{\lambda}). \end{array} $$

Together with (20), (21), (22), and (23) and the definition of \(\mathcal {L}(\mathbf {x},\lambda )\), we can obtain the desired result. □

Next, we prove Theorem 2.

Proof of Theorem 2

Letting \(\mathbf{x} = \mathbf{x}^{*}\) and \(\lambda = \mathbf{0}\) in Lemma 2, we have

$$ \begin{array}{@{}rcl@{}} &&||\overline{\mathbf{\theta}}[t+1]-\mathbf{0}||^{2} \\ &\!\leq\! & ||\overline{\mathbf{\theta}}[t]-\mathbf{0}||^{2}+ \frac{4\beta[t+1]}{m}\sum\limits_{j=1}^{m}(G_{j}+\gamma_{j}D)||\lambda_{j}[t+1] -\overline{\theta}[t]||\\ &&- \frac{2\beta[t+1]}{m}(\mathcal{L}(\mathbf{x}[t + 1],\mathbf{0}) - \mathcal{L}(\mathbf{x}^{*},\overline{\mathbf{\theta}}[t])) + \frac{\beta^{2}[t + 1]}{m}\sum\limits_{j=1}^{m}(G_{j} + \gamma_{j}D)^{2}. \end{array} $$

Using the definition of function \(\mathcal {L}(\mathbf {x},\lambda )\) and letting \( \gamma = {\sum }_{j=1}^{m}\gamma _{j}\), we can obtain

$$ \begin{array}{@{}rcl@{}} &&\mathcal{L}(\mathbf{x}[t+1],\mathbf{0}) - \mathcal{L}(\mathbf{x}^{*},\overline{\mathbf{\theta}}[t])\\ &= & F(\mathbf{x}[t+1])-F(\mathbf{x}^{*}) + \mathcal{L}(\mathbf{x}^{*},\mathbf{0})- \mathcal{L}(\mathbf{x}^{*},\overline{\mathbf{\theta}}[t])\\ &\geq & F(\mathbf{x}[t+1])-F(\mathbf{x}^{*}) + \frac{\gamma}{2}||\mathbf{\overline{\theta}}[t]-\mathbf{0}||^{2}, \end{array} $$

where the last inequality makes use of the strong concavity of \(\mathcal {L}({\mathbf {x}^{*}},\cdot )\). Thus, by (24) and (25), and then letting \(\beta [t]=\frac {q}{t}\), we have

$$ \begin{array}{@{}rcl@{}} &&||\overline{\mathbf{\theta}}[t+1]-\mathbf{0}||^{2} \\ &\leq & (1-\frac{q\gamma}{m(t+1)})||\overline{\mathbf{\theta}}[t]-\mathbf{0}||^{2}+ \frac{4q}{m(t+1)}\sum\limits_{j=1}^{m}(G_{j}+\gamma_{j}D)||\lambda_{j}[t+1] -\overline{\theta}[t]||\\ &&-\frac{2q}{m(t+1)}(F(\mathbf{x}[t+1])-F(\mathbf{x}^{*})) +\frac{q^{2}}{m(t+1)^{2}}\sum\limits_{j=1}^{m}(G_{j}+\gamma_{j}D)^{2}. \end{array} $$

Since \(4\leq \frac {q\gamma }{m}\), it follows that

$$ \begin{array}{@{}rcl@{}} &&||\overline{\mathbf{\theta}}[t+1]-\mathbf{0}||^{2} \\ &\leq & (1-\frac{2}{t+1})||\overline{\mathbf{\theta}}[t]-\mathbf{0}||^{2}+ \frac{4q}{m(t+1)}\sum\limits_{j=1}^{m}(G_{j}+\gamma_{j}D)||\lambda_{j}[t+1] -\overline{\theta}[t]|| \\ &&-\frac{2q}{m(t+1)}(F(\mathbf{x}[t+1])-F(\mathbf{x}^{*})) +\frac{q^{2}}{m(t+1)^{2}}\sum\limits_{j=1}^{m}(G_{j}+\gamma_{j}D)^{2}. \end{array} $$
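As a quick sanity check of this condition, take the parameter values reported in the numerical experiments, \(\gamma_{j} = 0.4\) for every agent and q = 10, and assume (as the NUM setup suggests) that each source acts as one agent:

$$ \begin{array}{@{}rcl@{}} \frac{q\gamma}{m} = \frac{q\sum\limits_{j=1}^{m}\gamma_{j}}{m} = \frac{10\cdot 0.4\,m}{m} = 4, \qquad 1-\frac{q\gamma}{m(t+1)} = 1-\frac{4}{t+1} \leq 1-\frac{2}{t+1}. \end{array} $$

Under that assumption the condition holds with equality for any number of agents m.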

Multiplying the preceding relation by t(t + 1), we can see that, for all t ≥ 1

$$ \begin{array}{@{}rcl@{}} &&(t+1)t||\overline{\mathbf{\theta}}[t+1]-\mathbf{0}||^{2} \\ &\leq & t(t-1)||\overline{\mathbf{\theta}}[t]-\mathbf{0}||^{2}+ \frac{4qt}{m}\sum\limits_{j=1}^{m}(G_{j}+\gamma_{j}D)||\lambda_{j}[t+1] -\overline{\theta}[t]|| \\ &&-\frac{2qt}{m}(F(\mathbf{x}[t+1])-F(\mathbf{x}^{*})) +\frac{q^{2}t}{m(t+1)}\sum\limits_{j=1}^{m}(G_{j}+\gamma_{j}D)^{2}. \end{array} $$

Summing the above inequality over t = 1, …, T − 1 for any T ≥ 2 and rearranging the terms leads to

$$ \begin{array}{@{}rcl@{}} &&\frac{2q}{m}\sum\limits_{t=1}^{T-1}t(F(\mathbf{x}[t+1])-F(\mathbf{x}^{*})) \\ &\leq & -T(T-1)||\overline{\mathbf{\theta}}[T]-\mathbf{0}||^{2} + \frac{q^{2}}{m}\sum\limits_{t=1}^{T-1}\frac{t}{t+1}\sum\limits_{j=1}^{m}(G_{j}+\gamma_{j}D)^{2}\\ && + \frac{4q}{m}\sum\limits_{t=1}^{T-1}t\sum\limits_{j=1}^{m}(G_{j}+\gamma_{j}D)||\lambda_{j}[t+1] -\overline{\theta}[t]||\\ &\!\leq\! & \frac{q^{2}(T-1)}{m}\sum\limits_{j=1}^{m}(G_{j} + \gamma_{j}D)^{2} + \frac{4q(T - 1)}{m}\sum\limits_{t=1}^{T-1}\sum\limits_{j=1}^{m}(G_{j} + \gamma_{j}D)||\lambda_{j}[t + 1] -\overline{\theta}[t]||. \end{array} $$
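The factor t(t + 1) is chosen so that the recursion telescopes. A short worked version of this step, using the shorthand \(a_{t} = t(t-1)||\overline{\mathbf{\theta}}[t]||^{2}\) and \(r_{t}\) for the remaining terms on the right-hand side (both introduced only for this illustration):

$$ \begin{array}{@{}rcl@{}} (t+1)t\left(1-\frac{2}{t+1}\right) &=& t(t-1), \quad\text{so}\quad a_{t+1} \leq a_{t} + r_{t},\\ \sum\limits_{t=1}^{T-1}(a_{t+1}-a_{t}) &=& a_{T}-a_{1} = T(T-1)||\overline{\mathbf{\theta}}[T]||^{2} \geq 0, \quad\text{since}\quad a_{1}=0. \end{array} $$

Hence \(a_{T} \leq {\sum }_{t=1}^{T-1}r_{t}\). Moving the objective-gap terms to the left, dropping the nonpositive term \(-a_{T}\), and using \(t \leq T-1\) and \(\frac{t}{t+1}\leq 1\) gives the final bound.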

Dividing both sides by \(\frac {qT(T-1)}{m}\) in the above relation, it yields

$$ \begin{array}{@{}rcl@{}} &&\frac{2}{T(T-1)}\sum\limits_{t=1}^{T-1}t(F(\mathbf{x}[t+1])-F(\mathbf{x}^{*})) \\ &\leq & \frac{4}{T}\sum\limits_{t=1}^{T-1}\sum\limits_{j=1}^{m}(G_{j}+\gamma_{j}D)||\lambda_{j}[t+1] -\overline{\theta}[t]||+\frac{q}{T}\sum\limits_{j=1}^{m}(G_{j}+\gamma_{j}D)^{2}. \end{array} $$

Note that, for all i and t, we get

$$||A_{i}\mathbf{x}_{i}[t+1]-\mathbf{b}_{i}-\gamma_{i}\mathbf{\lambda}_{i}[t+1]||_{1}\leq \sqrt{p}\,||A_{i}\mathbf{x}_{i}[t+1]-\mathbf{b}_{i}-\gamma_{i}\mathbf{\lambda}_{i}[t+1]||\leq \sqrt{p}(G_{i}+\gamma_{i}D).$$

Letting \(\mathbf{e}_{i}[t] = \beta[t](A_{i}\mathbf{x}_{i}[t+1]-\mathbf{b}_{i}-\gamma_{i}\mathbf{\lambda}_{i}[t+1])\) with \(\beta [t]=\frac {q}{t}\), we have \(||\mathbf {e}_{i}[t]||_{1}\leq \frac {q B}{t}\) for all i and t, where \(B=\max _{1\leq i\leq m}\sqrt {p}(G_{i}+\gamma _{i}D)\). By applying Corollary 2 in [20], we can estimate the term \(||\lambda _{j}[t+1] - \overline {\mathbf {\theta }}[t]||\) in (26) as follows

$$ \begin{array}{@{}rcl@{}} \sum\limits_{t=1}^{T-1}||\lambda_{j}[t+1] - \overline{\mathbf{\theta}}[t]||\leq\frac{8\eta}{\delta(1-\eta)}\sum\limits_{i=1}^{m}||\mathbf{\theta}_{i}[0]||_{1} + \frac{8 q m B}{\delta(1-\eta)}(1+\ln T). \end{array} $$
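The logarithmic factor is what one would expect from perturbations of size at most qB/t: summing 1/t over t = 1, …, T − 1 stays below 1 + ln T. The sketch below only checks this elementary bound. It is not a reproduction of Corollary 2 in [20], and the function name is hypothetical.

```python
import math

def harmonic_partial_sum(T):
    """Sum of 1/t for t = 1, ..., T - 1."""
    return sum(1.0 / t for t in range(1, T))

for T in (2, 10, 100, 10_000):
    # Compare the partial harmonic sum with the bound 1 + ln T.
    print(T, harmonic_partial_sum(T), 1.0 + math.log(T))
```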

Combining (27) with (26), we can get

$$ \begin{array}{@{}rcl@{}} &&\frac{{\sum}_{t=1}^{T-1}t \left( F(\mathbf{x}[t+1])-F(\mathbf{x}^{*})\right)}{\frac{T(T-1)}{2}}\\ &\leq & \frac{4}{T\delta}\sum\limits_{j=1}^{m}(G_{j}+\gamma_{j}D)\left( \frac{8\eta}{1-\eta}\sum\limits_{i=1}^{m}||\mathbf{\theta}_{i}[0]||_{1}+ \frac{8q m B}{1-\eta}(1+\ln T)\right) + \frac{q}{T}\sum\limits_{j=1}^{m}(G_{j}+\gamma_{j}D)^{2}. \quad\quad \end{array} $$

Using the convexity of F, the definition of \(\widehat {\mathbf {x}}[T],\) and (28), the desired result can be obtained. □
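The last step combines a weighted average of the iterates with Jensen's inequality. The sketch below assumes the running average has the form \(\widehat{\mathbf{x}}[T] = \frac{2}{T(T-1)}{\sum }_{t=1}^{T-1}t\,\mathbf{x}[t+1]\), which is the form implied by the left-hand side of the preceding display; the definition itself is given earlier in the paper. The function names, the test function F, and the random iterates are hypothetical.

```python
import numpy as np

def weighted_average(xs):
    """xs[t-1] plays the role of x[t+1], t = 1, ..., T-1; weights are proportional to t."""
    xs = np.asarray(xs, dtype=float)
    t = np.arange(1, len(xs) + 1)
    return (t[:, None] * xs).sum(axis=0) / t.sum()   # t.sum() equals T(T-1)/2

def running_weighted_average(xs):
    """Same quantity computed recursively, without storing a weighted sum."""
    xs = np.asarray(xs, dtype=float)
    x_hat = np.zeros(xs.shape[1])
    for t, x in enumerate(xs, start=1):
        x_hat += 2.0 / (t + 1) * (x - x_hat)          # new weight t / (t(t+1)/2) = 2/(t+1)
    return x_hat

def F(x):
    """A convex test function (squared norm), used only to illustrate Jensen's inequality."""
    return np.sum(np.asarray(x) ** 2, axis=-1)

rng = np.random.default_rng(2)
xs = rng.normal(size=(9, 3))                          # stands in for x[2], ..., x[10], i.e. T = 10
x_hat = weighted_average(xs)
t = np.arange(1, len(xs) + 1)
print(F(x_hat), (t * F(xs)).sum() / t.sum())
```

The recursive form is convenient in a distributed implementation because each agent only needs its previous average and its newest iterate.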

Proof of Theorem 3

Letting \(\mathbf{x} = \mathbf{x}^{*}\) in Lemma 2, where \(\overline{\mu}[t]\) denotes the average \(\overline{\mathbf{\theta}}[t]\), we get

$$ \begin{array}{@{}rcl@{}} &&\frac{2\beta[t+1]}{m}(\mathcal{L}(\mathbf{x}[t+1],\lambda) - \mathcal{L}(\mathbf{x}^{*},\overline{\mu}[t]))\\ &\leq & ||\overline{\mu}[t]-\lambda||^{2}-||\overline{\mu}[t+1]-\lambda||^{2} + \frac{4\beta[t+1]}{m}\\&&\times\sum\limits_{j=1}^{m}(G_{j} + \gamma_{j}D)||\lambda_{j}[t+1] - \overline{\mu}[t]||\\ &&+ \frac{\beta^{2}[t+1]}{m}\sum\limits_{j=1}^{m}(G_{j}+\gamma_{j}D)^{2}. \end{array} $$

Considering the terms on the left-hand side of (29), we have

$$ \begin{array}{@{}rcl@{}} &&2(\mathcal{L}(\mathbf{x}[t+1],\lambda)-\mathcal{L}(\mathbf{x}^{*},\overline{\mu}[t]))\\ &= & F(\mathbf{x}[t+1]) + \lambda^{\top}(\sum\limits_{j=1}^{m}A_{j}\mathbf{x}_{j}[t+1]-\mathbf{b}_{j})-F(\mathbf{x}^{*}) \\&&- \frac{\gamma}{2}\lambda^{\top}\lambda + \frac{\gamma}{2}\overline{\mu}^{\top}[t]\overline{\mu}[t]\\ &&+ \mathcal{L}(\mathbf{x}[t+1],\lambda)-\mathcal{L}(\mathbf{x}^{*},\lambda)+\mathcal{L}(\mathbf{x}^{*},\lambda) - \mathcal{L}(\mathbf{x}^{*},\overline{\mu}[t])\\ &\geq & 2(F(\mathbf{x}[t+1]) + \lambda^{\top}(\sum\limits_{j=1}^{m}A_{j}\mathbf{x}_{j}[t+1]-\mathbf{b}_{j})\\&&-F(\mathbf{x}^{*}))- \frac{\gamma}{2}\lambda^{\top}\lambda + \frac{\gamma}{2}\overline{\mu}^{\top}[t]\overline{\mu}[t]\\ && + \frac{\gamma}{2}||\overline{\mu}[t]-\lambda||^{2}+\gamma\lambda^{\top}(\overline{\mu}[t]-\lambda), \end{array} $$

where the last inequality is due to the strong concavity of \(\mathcal{L}(\mathbf{x}^{*},\cdot)\). Further, by (30), we can deduce

$$ \begin{array}{@{}rcl@{}} && 2(\mathcal{L}(\mathbf{x}[t+1],\lambda)-\mathcal{L}(\mathbf{x}^{*},\overline{\mu}[t]))\\ &\geq & 2\lambda^{\top}(\sum\limits_{j=1}^{m}A_{j}\mathbf{x}_{j}[t+1]-\mathbf{b}_{j}) + \frac{\gamma}{2}||\overline{\mu}[t]-\lambda||^{2} \\&&-\frac{\gamma}{2}(4\lambda^{\top}\lambda - ||\lambda+\overline{\mu}[t]||^{2}) \\ &\geq & 2\lambda^{\top}(\sum\limits_{j=1}^{m}A_{j}\mathbf{x}_{j}[t+1]-\mathbf{b}_{j}) + \frac{\gamma}{2}||\overline{\mu}[t]-\lambda||^{2}-2\gamma\lambda^{\top}\lambda. \end{array} $$

Combining (29) with (31), and then letting \(\beta [t+1]=\frac {q}{t+1}\), we can obtain

$$ \begin{array}{@{}rcl@{}} && \frac{2q}{m(t+1)}\left( \lambda^{\top}(\sum\limits_{j=1}^{m}A_{j}\mathbf{x}_{j}[t+1]-\mathbf{b}_{j})-\gamma\lambda^{\top}\lambda\right)\\ &\leq & (1-\frac{q\gamma}{2m(t+1)})||\overline{\mu}[t]-\lambda||^{2} - ||\overline{\mu}[t+1]-\lambda||^{2} + \frac{4q}{m(t+1)}\sum\limits_{j=1}^{m}(G_{j}+ \gamma_{j}D)||\lambda_{j}[t+1]-\overline{\mu}[t]||\\ && + \frac{q^{2}}{m(t+1)^{2}}\sum\limits_{j=1}^{m}(G_{j}+ \gamma_{j}D)^{2}. \end{array} $$

Due to the fact that \(4 \leq \frac {q\gamma }{m}\), we can see that \(1-\frac {q\gamma }{2m(t+1)} \leq 1-\frac {2}{t+1}\). Thus, by the preceding inequality, we can obtain

$$ \begin{array}{@{}rcl@{}} && \frac{2q}{m(t+1)}\left( \lambda^{\top}(\sum\limits_{j=1}^{m}A_{j}\mathbf{x}_{j}[t+1]-\mathbf{b}_{j})-\gamma\lambda^{\top}\lambda\right)\\ &\leq & (1-\frac{2}{t+1})||\overline{\mu}[t]-\lambda||^{2} - ||\overline{\mu}[t+1]-\lambda||^{2}\\ &&+ \frac{4q}{m(t+1)}\sum\limits_{j=1}^{m}(G_{j}+ \gamma_{j}D)||\lambda_{j}[t+1]-\overline{\mu}[t]||\\ && + \frac{q^{2}}{m(t+1)^{2}}\sum\limits_{j=1}^{m}(G_{j}+ \gamma_{j}D)^{2}. \end{array} $$

Multiplying the above inequality by t(t + 1) and then summing over t = 1, …, T − 1, we have, for all T ≥ 2

$$ \begin{array}{@{}rcl@{}} && \frac{2q}{m}\sum\limits_{t=1}^{T-1}t\left( \lambda^{\top}(\sum\limits_{j=1}^{m}A_{j}\mathbf{x}_{j}[t+1]-\mathbf{b}_{j})-\gamma\lambda^{\top}\lambda\right)\\ &\leq & \frac{4q}{m}\sum\limits_{t=1}^{T-1}t\sum\limits_{j=1}^{m}(G_{j}+ \gamma_{j}D)||\lambda_{j}[t+1]-\overline{\mu}[t]|| + \frac{q^{2}}{m}\sum\limits_{t=1}^{T-1}\frac{t}{t+1}\sum\limits_{j=1}^{m}(G_{j}+ \gamma_{j}D)^{2}\\ && -T(T-1)||\overline{\mu}[T]-\lambda||^{2}\\ &\leq & \frac{4q(T - 1)}{m}\sum\limits_{t=1}^{T-1}\sum\limits_{j=1}^{m}(G_{j} + \gamma_{j}D)||\lambda_{j}[t + 1] - \overline{\mu}[t]|| + \frac{q^{2}(T-1)}{m}\sum\limits_{j=1}^{m}(G_{j}+ \gamma_{j}D)^{2}. \end{array} $$

Dividing both sides of the inequality above by \(\frac {qT(T-1)}{m}\) gives

$$ \begin{array}{@{}rcl@{}} && \frac{{\sum}_{t=1}^{T-1}t(\lambda^{\top}({\sum}_{j=1}^{m}A_{j}\mathbf{x}_{j}[t+1]-\mathbf{b}_{j})-\gamma\lambda^{\top}\lambda)}{\frac{(T-1)T}{2}}\\ &\leq & \frac{q}{T}\sum\limits_{j=1}^{m}(G_{j}+ \gamma_{j}D)^{2} + \frac{4}{T}\sum\limits_{t=1}^{T-1}\sum\limits_{j=1}^{m}(G_{j}+ \gamma_{j}D)||\lambda_{j}[t+1]-\overline{\mu}[t]||. \end{array} $$

Note that \({\sum }_{j=1}^{m}A_{j}\mathbf {x}_{j}-\mathbf {b}_{j}\) is linear with respect to \(\mathbf{x}_{j}\); thus, we have

$$ \begin{array}{@{}rcl@{}} && \frac{{\sum}_{t=1}^{T-1}t(\lambda^{\top}({\sum}_{j=1}^{m}A_{j}\mathbf{x}_{j}[t + 1] - \mathbf{b}_{j}) - \gamma\lambda^{\top}\lambda)}{\frac{(T-1)T}{2}} \!\geq\! \lambda^{\top}(\sum\limits_{j=1}^{m}A_{j}\widehat{\mathbf{x}}_{j}[T] - \mathbf{b}_{j}) - \gamma\lambda^{\top}\lambda. \end{array} $$

By (33) and (34), we can obtain, for any \(\lambda \in \mathbb {R}^{p}\)

$$ \begin{array}{@{}rcl@{}} && \lambda^{\top}(\sum\limits_{j=1}^{m}A_{j}\widehat{\mathbf{x}}_{j}[T]-\mathbf{b}_{j})-\gamma\lambda^{\top}\lambda\\ &\leq & \frac{q}{T}\sum\limits_{j=1}^{m}(G_{j}+ \gamma_{j}D)^{2} + \frac{4}{T}\sum\limits_{t=1}^{T-1}\sum\limits_{j=1}^{m}(G_{j}+ \gamma_{j}D)||\lambda_{j}[t+1]-\overline{\mu}[t]||. \end{array} $$

Maximizing the left-hand side of (35) with respect to λ and using the estimate (27), we obtain the desired result. The proof is completed. □
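The maximization step can be made explicit. With the shorthand \(\mathbf{v} = {\sum }_{j=1}^{m}(A_{j}\widehat{\mathbf{x}}_{j}[T]-\mathbf{b}_{j})\) and \(R_{T}\) for the right-hand side of (35) (both introduced only for this sketch), the left-hand side is a concave quadratic in λ, so

$$ \begin{array}{@{}rcl@{}} \max\limits_{\lambda\in\mathbb{R}^{p}}\left(\lambda^{\top}\mathbf{v}-\gamma\lambda^{\top}\lambda\right) = \frac{||\mathbf{v}||^{2}}{4\gamma}\quad\text{at}\quad \lambda = \frac{\mathbf{v}}{2\gamma}, \qquad\text{hence}\qquad ||\mathbf{v}|| \leq 2\sqrt{\gamma R_{T}}. \end{array} $$

The explicit constants of Theorem 3 then come from inserting (27) into \(R_{T}\); they are not reproduced here.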

Numerical experiments

Distributed optimization problems with coupled equality constraints have an interesting application to the network utility maximization (NUM) problem investigated in [22, 31, 32]. More specifically, a network is modeled as a set of links \(L\) with finite capacities \(C = (C_{l}, l \in L)\). The links are shared by a set of sources \(S\) indexed by \(s\). Each source \(s\) uses a set of links \(L(s) \subset L\). Let \(S(l) = \{s \in S \mid l \in L(s)\}\) be the set of sources using link \(l\). The sets \(\{L(s)\}\) define an \(|L| \times |S|\) routing matrix \(A\) with entries \(A_{ls} = 1\) if \(l \in L(s)\) and \(A_{ls} = 0\) otherwise. Each source \(s\) is associated with a utility function \(U_{s}: \mathbb {R}^{+} \rightarrow \mathbb {R}\), i.e., source \(s\) gains a utility \(U_{s}(x_{s})\) when it sends data at rate \(x_{s}\) satisfying \(0 \leq m_{s} \leq x_{s} \leq M_{s}\). Let \(I_{s} = [m_{s}, M_{s}]\). Mathematically, the NUM problem is to determine the source rates that minimize the sum of disutilities subject to link capacity constraints [31]:

$$ \begin{array}{@{}rcl@{}} \text{(NUM)}~~~\min\limits_{x_{s}\in I_{s}} && g_{N}(x):=\sum\limits_{s\in S}-U_{s}(x_{s}) \\ \text{s.t.} && Ax= C. \end{array} $$
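The routing matrix in (NUM) can be assembled directly from the link sets L(s). The following is a minimal sketch; the function name and the small three-source, two-link topology are hypothetical and are not taken from Fig. 1.

```python
import numpy as np

def routing_matrix(links_used, num_links):
    """Build the |L| x |S| matrix A with A[l, s] = 1 if link l is in L(s), else 0.

    links_used[s] is the set L(s) of 0-based link indices used by source s.
    """
    A = np.zeros((num_links, len(links_used)))
    for s, links in enumerate(links_used):
        for l in links:
            A[l, s] = 1.0
    return A

# Hypothetical topology: source 0 uses both links, sources 1 and 2 use one link each.
links_used = [{0, 1}, {0}, {1}]
A = routing_matrix(links_used, num_links=2)

# S(l): the set of sources that use link l, read off from row l of A.
sources_on_link = [{int(s) for s in np.flatnonzero(A[l])} for l in range(A.shape[0])]

x = np.array([0.2, 0.8, 0.8])     # example rates; A @ x is the load on each link
print(A)
print(sources_on_link)
print(A @ x)
```

Row l of A @ x is the aggregate rate on link l, so the constraint Ax = C asks every link to be used exactly at capacity.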

Note that the utility function \(U_{s}\) and the constraint set \(I_{s}\) are local and private, known only to source \(s\). Solving the NUM problem directly requires coordination among possibly all sources and is impractical in real networks, so it is important to seek a distributed solution. In the following numerical experiments, we use our proposed distributed method to solve the NUM problem.

For numerical simulations, the utility function is taken as \(U_{s}(x_{s}) = 20w_{s}\log (x_{s} + 0.1)\) from [32]. We set \(C_{l} = 1\) for all \(l \in L\), and \(w_{s} = |L(s)|/|L|\), \(m_{s} = 0\), \(M_{s} = 1\) for all \(s \in S\). For the communication weight matrix W[t], we first generate a pool of 20 weight matrices corresponding to random graphs, each of which satisfies Assumption 2, and then choose a communication matrix from this pool at each time t. We take all the regularization parameters to be equal, \(\gamma_{s} = 0.4\), \(s \in S\), and the stepsize parameter as q = 10. We use the MATLAB convex programming toolbox CVX to compute the solution \(x^{*}\). Both our method and the compared algorithm are terminated when all of the following conditions are satisfied at an iteration t: (i) \(\max_{s\in S}|\lambda_{s}[t+1] - \lambda_{s}[t]| \leq \epsilon\), (ii) \(\max_{l\in L}||Ax[t+1] - C|| \leq \epsilon\), (iii) \(\max _{s\in S}|\frac {U_{s}(x[t+1])-U_{s}(x[t])}{U_{s}(x[t])}| \leq \epsilon \), where we set \(\epsilon = 0.01\) in the simulations.
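With this logarithmic utility, the local primal step has a closed form. The sketch below assumes that step 7 of Algorithm 1 minimizes \(-U_{s}(x_{s}) + \lambda_{s}^{\top}A_{s}x_{s}\) over \(I_{s}\), where \(A_{s}\) is the s-th column of the routing matrix; this form is inferred from the proof of Lemma 2 and is not a statement of the authors' implementation. The function name and the example prices are hypothetical.

```python
import numpy as np

def local_rate_update(lam, a_col, w, lo=0.0, hi=1.0):
    """Minimize -20*w*log(x + 0.1) + (lam^T a_col) * x over x in [lo, hi].

    lam   : local dual estimate (one entry per link)
    a_col : column of the routing matrix for this source
    w     : weight w_s = |L(s)| / |L|
    """
    price = float(lam @ a_col)            # total price along the links the source uses
    if price <= 0.0:
        return hi                         # objective is decreasing in x, so go to the upper bound
    # Stationary point of the convex objective: 20*w / (x + 0.1) = price, then project.
    return float(np.clip(20.0 * w / price - 0.1, lo, hi))

lam = np.array([30.0, 20.0])              # hypothetical link prices
a_col = np.array([1.0, 1.0])              # this source uses both links, so w = 2/2 = 1
x_new = local_rate_update(lam, a_col, w=1.0)

# Brute-force check on a fine grid over [0, 1].
grid = np.linspace(0.0, 1.0, 100001)
objective = -20.0 * np.log(grid + 0.1) + float(lam @ a_col) * grid
x_grid = grid[np.argmin(objective)]
print(x_new, x_grid)
```

The clipping step is the projection onto \(I_{s} = [m_{s}, M_{s}]\); for a one-dimensional convex objective, projecting the unconstrained stationary point gives the constrained minimizer.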

We first consider a simple logical topology with |S| = 3 sources and |L| = 2 links [31], displayed in Fig. 1. It follows from Fig. 1 that \(w_{1} = 1\), \(w_{2} = 1\), \(w_{3} = 1/2\). Figure 2 shows the evolution of the dual variables over the first 70 iterations. Clearly, all local dual variables \(\lambda_{s}\), s = 1,2,3, agree on a common value within about 70 iterations. Figure 3 illustrates the evolution of each source rate \(x_{s}\), s = 1,2,3. The source rates \(x_{1}\) and \(x_{2}\) reach the same value because the weight coefficients satisfy \(w_{1} = w_{2}\). After 70 iterations, every source rate \(x_{s}\) is approximately at the optimal solution. Figure 4 shows the aggregated rate of the sources that use Link 2 versus the capacity limit of Link 2. It can be observed from Fig. 4 that the aggregated source rates satisfy the capacity constraint of Link 2 appropriately. As shown in Fig. 5, the values of the disutility objective function \(g_{N}(x[t])\) rapidly converge to the optimal value \(g_{N}(x^{*})\).

Fig. 1 Logical topology. Source \(S_{i}\), i = 1,2,3, transmits to destination \(D_{i}\)

Fig. 2 Iterative value of dual variable λ

Fig. 3 Iterative value of source rate x

Fig. 4 Aggregated source rates using Link 2 vs. capacity of Link 2

Fig. 5 Iterative value of total disutility function vs. optimal value

To compare the performance of our proposed Alg. DRDGA with the existing dual decomposition distributed algorithm (Alg. CDDA) in [30], we next test a randomly generated NUM problem with |S| = 20 and |L| = 19 and report comparisons of the constraint violations and objective function values. Figure 6 displays the evolution of the constraint violation ||Ax[t] − C||. Both algorithms gradually satisfy the linear equality constraints, but Alg. DRDGA converges faster than Alg. CDDA. Figure 7 illustrates that both algorithms also converge to the optimal value; again, Alg. DRDGA reaches the optimal value faster than Alg. CDDA.

Fig. 6 Evolution of constraint violation ||Ax[t] − C||

Fig. 7 Iterative value of total disutility function using Algs. DRDGA and CDDA vs. optimal value


This paper proposed a method for solving distributed convex problems with coupling equality constraints. The proposed algorithm is implemented over time-varying directed networks. By regularizing the Lagrangian function, the norm of the dual variables can be bounded. The proposed method attains a convergence rate of order \(\mathcal{O}(\ln T/T)\) for strongly convex objective functions. A numerical example on network utility maximization demonstrates the effectiveness of the proposed algorithm. As future research, it would be interesting to analyze the effect of communication delays on the proposed distributed method.


1. Nedić, A., Ozdaglar, A.: Distributed subgradient methods for multi-agent optimization. IEEE Trans. Autom. Control 54(1), 48–61 (2009)
2. Nedić, A., Ozdaglar, A., Parrilo, P.: Constrained consensus and optimization in multi-agent networks. IEEE Trans. Autom. Control 55(4), 922–938 (2010)
3. Jakovetic, D., Xavier, J., Moura, J.M.: Fast distributed gradient methods. IEEE Trans. Autom. Control 59(5), 1131–1146 (2014)
4. Nedić, A., Olshevsky, A.: Distributed optimization over time-varying directed graphs. IEEE Trans. Autom. Control 60(3), 601–615 (2015)
5. Johansson, B., Keviczky, T., Johansson, M., Johansson, K.H.: Subgradient methods and consensus algorithms for solving convex optimization problems. In: Proc. IEEE CDC, pp. 4185–4190. Cancun (2008)
6. Baingana, B., Mateos, G., Giannakis, G.: Proximal-gradient algorithms for tracking cascades over social networks. IEEE J. Selected Topics Signal Process. 8(4), 563–575 (2014)
7. Mateos, G., Giannakis, G.: Distributed recursive least-squares: Stability and performance analysis. IEEE Trans. Signal Process. 60(7), 3740–3754 (2012)
8. Bolognani, S., Carli, R., Cavraro, G., Zampieri, S.: Distributed reactive power feedback control for voltage regulation and loss minimization. IEEE Trans. Autom. Control 60(4), 966–981 (2015)
9. Zhang, Y., Giannakis, G.: Distributed stochastic market clearing with high-penetration wind power and large-scale demand response. IEEE Trans. Power Syst. 31(2), 895–906 (2016)
10. Martinez, S., Bullo, F., Cortes, J., Frazzoli, E.: On synchronous robotic networks-Part I: Models, tasks, and complexity. IEEE Trans. Autom. Control 52(12), 2199–2213 (2007)
11. Tsitsiklis, J.N., Bertsekas, D.P., Athans, M.: Distributed asynchronous deterministic and stochastic gradient optimization algorithms. IEEE Trans. Autom. Control 31(9), 803–812 (1986)
12. Ram, S.S., Nedić, A., Veeravalli, V.V.: Distributed stochastic subgradient projection algorithms for convex optimization. J. Optim. Theory Appl. 147(3), 516–545 (2010)
13. Duchi, J.C., Agarwal, A., Wainwright, M.J.: Dual averaging for distributed optimization: Convergence analysis and network scaling. IEEE Trans. Autom. Control 57(3), 592–606 (2012)
14. Zhu, M., Martinez, S.: On distributed convex optimization under inequality and equality constraints. IEEE Trans. Autom. Control 57(1), 151–163 (2012)
15. Li, J., Wu, C., Wu, Z., Long, Q.: Gradient-free method for nonsmooth distributed optimization. J. Glob. Optim. 61(2), 325–340 (2015)
16. Di Lorenzo, P., Scutari, G.: NEXT: In-network nonconvex optimization. IEEE Trans. Signal Inf. Process. Netw. 2(2), 120–136 (2016)
17. Li, J., Chen, G., Dong, Z., Wu, Z.: Distributed mirror descent method for multi-agent optimization with delay. Neurocomputing 177, 643–650 (2016)
18. Gharesifard, B., Cortes, J.: Distributed continuous-time convex optimization on weight-balanced digraphs. IEEE Trans. Autom. Control 59(3), 781–786 (2014)
19. Tsianos, K.I., Lawlor, S., Rabbat, M.G.: Consensus-based distributed optimization: Practical issues and applications in large-scale machine learning. In: 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1543–1550. IEEE (2012)
20. Nedić, A., Olshevsky, A.: Stochastic gradient-push for strongly convex functions on time-varying directed graphs. IEEE Trans. Autom. Control 61(12), 3936–3947 (2016)
21. Bertsekas, D.P., Nedić, A., Ozdaglar, A.E.: Convex Analysis and Optimization. Athena Scientific, Belmont (2003)
22. Necoara, I., Suykens, J.A.: Application of smoothing technique to decomposition in convex optimization. IEEE Trans. Autom. Control 53(11), 2674–2679 (2008)
23. Li, J., Chen, G., Dong, Z., Wu, Z.: A fast dual proximal-gradient method for separable convex optimization with linear coupled constraints. Comput. Optim. Appl. 64(3), 671–697 (2016)
24. Yuan, D., Xu, S., Zhao, H.: Distributed primal-dual subgradient method for multiagent optimization via consensus algorithms. IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 41(6), 1715–1724 (2011)
25. Aybat, N.S., Hamedani, E.Y.: Distributed primal-dual method for multi-agent sharing problem with conic constraints. In: 2016 50th Asilomar Conference on Signals, Systems and Computers, pp. 777–782. IEEE (2016)
26. Aybat, N.S., Hamedani, E.Y.: A distributed ADMM-like method for resource sharing under conic constraints over time-varying networks. arXiv:1611.07393 (2016)
27. Chang, T.H., Nedić, A., Scaglione, A.: Distributed constrained optimization by consensus-based primal-dual perturbation method. IEEE Trans. Autom. Control 59(6), 1524–1538 (2014)
28. Yuan, D., Ho, D.W.C., Xu, S.: Regularized primal-dual subgradient method for distributed constrained optimization. IEEE Trans. Cybern. 46(9), 2109–2118 (2016)
29. Khuzani, M.B., Li, N.: Distributed regularized primal-dual method: Convergence analysis and trade-offs. arXiv:1609.08262v3 (2017)
30. Falsone, A., Margellos, K., Garatti, S., Prandini, M.: Dual decomposition and proximal minimization for multi-agent distributed optimization with coupling constraints. Automatica 84, 149–158 (2017)
31. Low, S.H., Lapsley, D.E.: Optimization flow control. I. Basic algorithm and convergence. IEEE/ACM Trans. Network. 7, 861–874 (1999)
32. Beck, A., Nedić, A., Ozdaglar, A., Teboulle, M.: An O(1/k) gradient method for network resource allocation problems. IEEE Trans. Control Netw. Syst. 1(1), 64–73 (2014)
33. Gu, C., Wu, Z., Li, J., Guo, Y.: Distributed convex optimization with coupling constraints over time-varying directed graphs. arXiv:1805.07916 (2018)



This research was partially supported by the NSFC 11501070, 11671362 and 11871128, by the Natural Science Foundation Projection of Chongqing cstc2017jcyjA0788 and cstc2018jcyjAX0172, and the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJQN201800520).


Correspondence to Jueyou Li.


Gu, C., Wu, Z. & Li, J. Regularized dual gradient distributed method for constrained convex optimization over unbalanced directed graphs. Numer Algor 84, 91–115 (2020).



  • Convex optimization
  • Distributed algorithm
  • Dual decomposition
  • Regularization
  • Multi-agent network