Abstract
This paper investigates a distributed optimization problem over a cooperative multi-agent time-varying network, where each agent has its own decision variables that should be set so as to minimize its individual objective subject to globally coupled constraints. Based on the push-sum protocol and dual decomposition, we design a regularized dual gradient distributed algorithm to solve this problem; the algorithm can be implemented over unbalanced time-varying directed graphs, requiring only column stochasticity of the communication matrices. By augmenting the corresponding Lagrangian function with a quadratic regularization term, we first obtain a bound on the Lagrange multipliers that does not require constructing a compact set containing the dual optimal set, in contrast to most primal-dual based methods. We then show that the proposed method achieves a convergence rate of order \(\mathcal {O}(\ln T/T)\) for strongly convex objective functions, where T is the number of iterations. Moreover, an explicit bound on the constraint violations is also given. Finally, numerical results on the network utility maximization problem demonstrate the efficiency of the proposed algorithm.
Introduction
In recent years, we have witnessed unprecedented growth in research on solving optimization problems over multi-agent networks [1,2,3,4]. Various multi-agent optimization problems have been investigated by many researchers and arise in a wide range of application domains, such as the distributed finite-time optimal rendezvous problem [5], wireless and social networks [6, 7], power systems [8, 9], and robotics [10]. This class of problems has a long history in the optimization community; see [11].
Based on consensus schemes, the distributed optimization algorithms in the literature fall mainly into three categories: primal consensus distributed algorithms, dual consensus distributed algorithms, and primal-dual consensus distributed algorithms; see [1, 12,13,14,15,16,17]. In most previous works, the communication graphs are required to be balanced, i.e., the communication weight matrices are doubly stochastic. The paper [18] considered a fixed, directed graph with the requirement that it be balanced. The work in [19] proposed distributed subgradient-based algorithms for directed, fixed topologies, in which messages among agents are propagated by the "push-sum" protocol; however, that communication protocol requires knowledge of the number of agents or of the graph. In general, the push-sum protocol is attractive for implementations since it can easily operate over directed communication topologies, and thus avoids the deadlocks that may occur in practice when using undirected communication topologies [4]. Nedić et al. [4] designed a subgradient-push distributed method for a class of unconstrained optimization problems, where the requirement of a balanced graph was removed. Their method has a relatively slow convergence rate, of order \(\mathcal {O}(\ln T/\sqrt {T})\). Later, Nedić et al. [20] improved the convergence rate from \(\mathcal {O}(\ln T/\sqrt {T})\) to \(\mathcal {O}(\ln T/T)\) under a strong convexity condition. However, they only considered unconstrained optimization problems.
Methods for solving distributed optimization problems subject to equality or (and) inequality constraints have received considerable attention [21,22,23]. The authors in [14] first proposed a distributed Lagrangian primal-dual subgradient method by characterizing the primal-dual optimal solutions as the saddle points of the Lagrangian function of the problem under consideration. The work [24] developed a variant of the distributed primal-dual subgradient method by introducing a multi-step consensus mechanism. Aybat et al. [25, 26] investigated consensus optimization with agent-specific private conic constraints over time-varying networks and proposed a novel ADMM-based distributed method. For the more general distributed optimization problem with inequality constraints that couple all the agents' decision variables, Chang et al. [27] designed a novel distributed primal-dual perturbed subgradient method and analyzed its convergence. The implementation of the aforementioned algorithms usually involves projections onto primal and dual constraint sets, respectively. In particular, they require constructing a compact set that contains the dual optimal set, and projecting the dual variable onto this set to guarantee the boundedness of the dual iterates, which is important in establishing the convergence of the algorithms. However, the construction of this compact set is impractical, since it requires each agent to solve a general constrained convex problem [28, 29]. To ensure the boundedness of the norm of the dual variables, Yuan et al. [28] proposed a regularized primal-dual distributed algorithm; however, the optimization problem there includes only one constraint. Later, Khuzani et al. [29] investigated distributed optimization with several inequality constraints, and established the convergence of their proposed distributed deterministic and stochastic primal-dual algorithms, respectively. Very recently, Falsone et al. [30] designed a dual decomposition-based distributed method for solving a separable convex optimization problem with coupled inequality constraints and provided a convergence analysis, but no explicit convergence rate of their algorithm was given. Most of the aforementioned works operate over undirected networks, where the use of doubly stochastic matrices is possible. However, over directed graphs, relying on doubly stochastic matrices may be undesirable for a variety of reasons; see [4, 20].
In this paper, we propose a distributed regularized dual gradient method for solving a convex optimization problem subject to local and coupling constraints over time-varying directed networks. The proposed method is based on the push-sum protocol and dual decomposition. Each agent is only required to know its out-degree at each time, without requiring knowledge of either the number of agents or the graph sequence. By augmenting the corresponding Lagrangian function with a quadratic regularization term, the norm of the multipliers is bounded without constructing a compact set containing the dual optimal set, in contrast with most existing primal-dual methods. A convergence rate of order \(\mathcal {O}(\ln T/T)\) is obtained for strongly convex objective functions. Moreover, an explicit bound on the constraint violations is also provided.
The main contributions of this paper can be summarized as follows:
 (i)
For the proposed algorithm, we obtain the explicit convergence rate of the objective values. Moreover, we give an explicit bound on the constraint violation. By comparison, the work in [30] only establishes the convergence of its approach, with no explicit convergence rate characterized.
 (ii)
Our proposed algorithm removes the requirement of balanced graphs. Balanced communication graphs were used in [28,29,30]. In contrast, more general interaction graphs, i.e., unbalanced graphs, are considered in this paper by exploiting the push-sum protocol [4].
 (iii)
We provide an upper bound on the norm of the dual variables by resorting to the regularized Lagrangian function, without the requirement of constructing a compact set containing the dual optimal set, in contrast with the works [14, 27].
The remainder of this paper is organized as follows. In Section 2, we state the problem, useful assumptions, and preparatory work. In Section 3, we propose the distributed regularized dual gradient algorithm and state the main results. In Section 4, we give some lemmas and the proofs of the main results. Numerical simulations are given in Section 5. Finally, Section 6 draws some conclusions.
Notation: We use boldface to distinguish vectors in \(\mathbb {R}^{n}\) from scalars. For example, v_{i}[t] is a scalar and u_{i}[t] is a vector. For a matrix W, we use (W)_{ij} to denote its (i, j)-th entry. We use ∥x∥ to denote the Euclidean norm of a vector x, and 1 for the vector of all ones. A convex function \(f: \mathbb {R}^{n}\rightarrow \mathbb {R}\) is \(\widetilde {\gamma }\)-strongly convex with \(\widetilde {\gamma }>0\) if the following relation holds for all \(\mathbf {x}, \mathbf {y}\in \mathbb {R}^{n}\):
\(f(\mathbf {x})\geq f(\mathbf {y})+\nabla f(\mathbf {y})^{\top }(\mathbf {x}-\mathbf {y})+\frac {\widetilde {\gamma }}{2}\|\mathbf {x}-\mathbf {y}\|^{2},\)
where ∇f(y) is a subgradient of f at y. The function f(x) is \(\widetilde {\gamma }\)-strongly concave if − f(x) is \(\widetilde {\gamma }\)-strongly convex.
Distributed optimization problem with equality constraints
Constrained multiagent optimization
Consider the following constrained optimization problem:
\(\min _{\mathbf {x}\in X} F(\mathbf {x}):={\sum }_{i=1}^{m} f_{i}(\mathbf {x}_{i}) \quad \text {s.t.} \quad {\sum }_{i=1}^{m}(A_{i}\mathbf {x}_{i}-\mathbf {b}_{i})=\mathbf {0}, \qquad (1)\)
where there are m agents associated with a time-varying network. Each agent i only knows its own objective function f_{i}(x_{i}): \(\mathbb {R}^{n_{i}}\rightarrow \mathbb {R}\) and its own constraint set \(\mathbf {X}_{i}\subseteq \mathbb {R}^{n_{i}}\), and all agents are subject to the coupling equality constraints \({\sum }_{i=1}^{m}(A_{i}\mathbf {x}_{i}-\mathbf {b}_{i})=\mathbf {0}\), with \(A_{i}\in \mathbb {R}^{p\times n_{i}}\) and \(\mathbf {b}_{i}\in \mathbb {R}^{p}\). Here \(\mathbf {x}:=(\mathbf {x}_{1}^{\top },\cdots , \mathbf {x}_{m}^{\top })^{\top }\), with \(n={\sum }_{i=1}^{m} n_{i}\), belongs to X := X_{1} ×⋯ ×X_{m}.
Problem (1) is quite general and arises in diverse applications, for example, distributed model predictive control [22], network utility maximization [31, 32], and economic dispatch problems for smart grids [8, 9].
To decouple the coupling equality constraints, we introduce a regularized Lagrangian function \(\mathcal {L}(\mathbf {x},\lambda )\) of problem (1), given by
\(\mathcal {L}(\mathbf {x},\lambda )={\sum }_{i=1}^{m}\mathcal {L}_{i}(\mathbf {x}_{i},\lambda ), \qquad (2)\)
where \(\mathcal {L}_{i}(\mathbf {x}_{i},\lambda )=f_{i}(\mathbf {x}_{i}) + \lambda ^{\top }(A_{i}\mathbf {x}_{i}-\mathbf {b}_{i}) - \frac {\gamma _{i}}{2}\lambda ^{\top }\lambda \) is the regularized Lagrangian function associated with the i-th agent, and γ_{i} > 0 is a regularization parameter, for i = 1,…, m.
Define the regularized dual function of problem (1) as
\(\phi (\lambda ):=\min _{\mathbf {x}\in X}\mathcal {L}(\mathbf {x},\lambda ). \qquad (3)\)
Note that the regularized Lagrangian function \(\mathcal {L}(\mathbf {x},\lambda )\) defined in (2) is separable with respect to x_{i}, i = 1,…, m. Thus, the regularized dual function ϕ(λ) can be rewritten as
\(\phi (\lambda )={\sum }_{i=1}^{m}\phi _{i}(\lambda ), \qquad (4)\)
where \(\phi _{i}(\lambda ):=\min _{\mathbf {x}_{i}\in \mathbf {X}_{i}}\mathcal {L}_{i}(\mathbf {x}_{i},\lambda )\) can be regarded as the regularized dual function of agent i, i = 1,…, m. Then, the regularized dual problem of problem (1) can be written as \(\max _{\lambda }\min _{\mathbf {x}\in X} \mathcal {L}(\mathbf {x},\lambda )\), or, equivalently,
\(\max _{\lambda \in \mathbb {R}^{p}} {\sum }_{i=1}^{m}\phi _{i}(\lambda ).\)
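To make the decomposition concrete, the following sketch (all problem data hypothetical) evaluates ϕ(λ) as the sum of the per-agent dual functions ϕ_{i}(λ), using quadratic objectives \(f_{i}(\mathbf {x}_{i})=\frac {\tau _{i}}{2}\|\mathbf {x}_{i}-\mathbf {c}_{i}\|^{2}\) over box sets X_{i} = [−1, 1]^{n_{i}} so that each inner minimization has a closed form:

```python
import numpy as np

# Toy instance (all data hypothetical): m agents, each with a strongly convex
# quadratic objective f_i(x) = (tau_i/2) * ||x - c_i||^2 over a box X_i.
rng = np.random.default_rng(0)
m, n_i, p = 3, 2, 2
tau = [1.0, 2.0, 1.5]                         # strong-convexity moduli tau_i
c = [rng.standard_normal(n_i) for _ in range(m)]
A = [rng.standard_normal((p, n_i)) for _ in range(m)]
b = [0.1 * rng.standard_normal(p) for _ in range(m)]
gamma = [0.1] * m                             # regularization parameters gamma_i

def x_of_lambda(i, lam):
    """Minimizer of L_i(x_i, lam) over X_i = [-1, 1]^n: for this quadratic f_i
    the unconstrained minimizer is c_i - A_i^T lam / tau_i, and the box
    constraint is handled coordinate-wise by clipping."""
    return np.clip(c[i] - A[i].T @ lam / tau[i], -1.0, 1.0)

def phi_i(i, lam):
    """Regularized dual function phi_i(lam) = min_{x in X_i} L_i(x, lam)."""
    x = x_of_lambda(i, lam)
    return (0.5 * tau[i] * np.sum((x - c[i]) ** 2)
            + lam @ (A[i] @ x - b[i]) - 0.5 * gamma[i] * (lam @ lam))

def phi(lam):
    # Separability: phi(lam) = sum_i phi_i(lam), each term computable locally.
    return sum(phi_i(i, lam) for i in range(m))

print(phi(np.zeros(p)))
```

Each ϕ_{i} is computable from agent i's private data alone, which is what makes the dual problem amenable to distributed solution.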
The coupling between agents is reflected in the fact that λ is a common decision vector on whose value all the agents should agree.
Related assumptions
The following assumptions on problem (1) and on the time-varying communication network are needed to establish the convergence properties of the proposed method.
Assumption 1
For each i = 1,…, m, the function f_{i}(⋅): \(\mathbb {R}^{n_{i}}\rightarrow \mathbb {R}\) is τ_{i}-strongly convex, and the set \(\mathbf {X}_{i}\subseteq \mathbb {R}^{n_{i}}\) is nonempty, convex, and compact.
Note that, under Assumption 1, we have:
 (i)
the function ϕ_{i}(λ) defined in (4) is γ_{i}-strongly concave and differentiable, and its gradient ∇ϕ_{i}(λ) = A_{i}x_{i}(λ) −b_{i} − γ_{i}λ is Lipschitz continuous with constant ∥A_{i}∥/τ_{i}, where \(\mathbf {x}_{i}(\lambda ):=\arg \min _{\mathbf {x}_{i}\in \mathbf {X}_{i}} \mathcal {L}_{i}(\mathbf {x}_{i},\lambda )\) (see [23, 32] for more details);
 (ii)
for any x_{i} ∈X_{i}, there is a constant G_{i} > 0 such that ∥A_{i}x_{i} −b_{i}∥≤ G_{i}, due to the compactness of X_{i}, i = 1,…, m.
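The closed-form dual gradient in item (i) can be checked numerically. The sketch below (hypothetical data; f_{i} is a simple quadratic so that x_{i}(λ) has a closed form over a box X_{i}) compares the formula A_{i}x_{i}(λ) − b_{i} − γ_{i}λ with a central finite difference of ϕ_{i}:

```python
import numpy as np

# Finite-difference check (hypothetical data) of the dual-gradient formula
# grad phi_i(lam) = A_i x_i(lam) - b_i - gamma_i * lam, with
# f_i(x) = (tau/2) * ||x - c||^2 and X_i = [-1, 1]^n.
rng = np.random.default_rng(1)
p, n = 3, 4
tau, gamma_i = 2.0, 0.5
c = rng.standard_normal(n)
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)

def x_of(lam):
    # Closed-form minimizer over the box: clip the unconstrained solution.
    return np.clip(c - A.T @ lam / tau, -1.0, 1.0)

def phi(lam):
    x = x_of(lam)
    return (0.5 * tau * np.sum((x - c) ** 2)
            + lam @ (A @ x - b) - 0.5 * gamma_i * (lam @ lam))

lam0 = 0.1 * rng.standard_normal(p)
g_formula = A @ x_of(lam0) - b - gamma_i * lam0
g_fd = np.array([(phi(lam0 + 1e-6 * e) - phi(lam0 - 1e-6 * e)) / 2e-6
                 for e in np.eye(p)])
print(np.max(np.abs(g_formula - g_fd)))    # agreement up to finite-difference error
```

The agreement reflects the fact that, by the uniqueness of the inner minimizer under strong convexity, ϕ_{i} is differentiable with the stated gradient.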
We assume that each agent can communicate with other agents over a time-varying network. The communication topology is modeled by a directed graph \(\mathcal {G}[t]=(\mathcal {V}, \mathcal {E}[t])\) over the vertex set \(\mathcal {V}=\{1,\ldots ,m\}\) with edge set \(\mathcal {E}[t]\subseteq \mathcal {V}\times \mathcal {V}\). Let \(\mathcal {N}_{i}^{in}[t]\) and \(\mathcal {N}_{i}^{out}[t]\) denote the collections of in-neighbors and out-neighbors of agent i at time t, respectively. That is,
\(\mathcal {N}_{i}^{in}[t]=\{j\mid (j,i)\in \mathcal {E}[t]\}\cup \{i\}, \quad \mathcal {N}_{i}^{out}[t]=\{j\mid (i,j)\in \mathcal {E}[t]\}\cup \{i\},\)
where (j, i) means that agent j may send its information to agent i, and every agent is regarded as its own in- and out-neighbor. Let d_{i}[t] be the out-degree of agent i, i.e., \(d_{i}[t]=|\mathcal {N}_{i}^{out}[t]|\). We introduce a time-varying communication weight matrix W[t] with elements
\((W[t])_{ij}=\begin {cases}1/d_{j}[t], & j\in \mathcal {N}_{i}^{in}[t],\\ 0, & \text {otherwise.}\end {cases}\)
We need the following assumptions on the weight matrix W[t], which can be found in [2, 4].
Assumption 2
(i) Every agent i only knows its out-degree d_{i}[t] at every time t; (ii) The graph sequence \(\mathcal {G}[t]\) is B-strongly connected, namely, there exists an integer B > 0 such that the graph with vertex set \(\mathcal {V}\) and edge set \(\mathcal {E}_{k}=\cup _{l=kB}^{(k+1)B-1}\mathcal {E}[l]\) is strongly connected, for every k ≥ 0.
Note that the communication weight matrix W[t] is column stochastic. In this paper, we do not require W[t] to be doubly stochastic.
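As an illustration, such a column-stochastic matrix can be assembled from out-degree information alone. The following sketch assumes the self-loop convention (each agent also "sends" to itself) and even splitting over out-neighbors; these are conventions of the sketch, consistent with common push-sum constructions:

```python
import numpy as np

# Sketch: build a push-sum weight matrix W[t] from a directed edge list,
# assuming each agent has a self-loop and splits its message evenly over its
# out-neighbors, using only its OWN out-degree d_j[t]; no global knowledge.
def push_sum_weights(edges, m):
    """edges: directed pairs (j, i) meaning agent j sends to agent i."""
    out_deg = np.ones(m)                   # the self-loop counts toward d_j
    for (j, _) in edges:
        out_deg[j] += 1
    W = np.zeros((m, m))
    for j in range(m):
        W[j, j] = 1.0 / out_deg[j]         # each agent keeps a share for itself
    for (j, i) in edges:
        W[i, j] = 1.0 / out_deg[j]         # receiver i gets a 1/d_j share from j
    return W

W = push_sum_weights({(0, 1), (1, 2), (2, 0), (2, 1)}, m=3)
print(W.sum(axis=0))   # [1. 1. 1.]: column stochastic by construction
```

Column stochasticity holds by construction (each column j distributes exactly 1/d_{j} to each of the d_{j} recipients), while the row sums generally differ from 1, i.e., W is not doubly stochastic.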
Algorithm and main results
Distributed regularized dual gradient algorithm
In general, problem (1) could be solved in a centralized manner. However, if the number m of agents is large, this may turn out to be computationally challenging. Additionally, each agent would be required to share its own information, such as the objective f_{i} and the constraints X_{i} and (A_{i}, b_{i}), either with the other agents or with a central coordinator collecting all the information, which may be undesirable in many cases due to privacy concerns.
To overcome both the computational and the privacy issues stated above, we propose a Distributed Regularized Dual Gradient Algorithm (DRDGA, for short) that solves the regularized dual problem (4). The proposed algorithm is motivated by the gradient push-sum method [4] and dual decomposition [23, 30], and is described in Algorithm 1.
In Algorithm 1, each agent i broadcasts (or pushes) the quantities 𝜃_{i}[t]/d_{i}[t] and ρ_{i}[t]/d_{i}[t] to all of the agents in its out-neighborhood \(\mathcal {N}_{i}^{out}[t]\). Then, each agent simply sums all the received messages to obtain u_{i}[t + 1] in step 4 and ρ_{i}[t + 1] in step 5, respectively. The update rules in steps 6–8 can be implemented locally. In particular, the update of the local primal vector x_{i}[t + 1] in step 7 is performed by minimizing \(\mathcal {L}_{i}\) with respect to x_{i} evaluated at λ = λ_{i}[t + 1], while the update in step 8 is a gradient ascent step for the maximization of \(\mathcal {L}_{i}\) with respect to λ evaluated at x_{i} = x_{i}[t + 1]. Note that the term A_{i}x_{i}[t + 1] −b_{i} − γ_{i}λ_{i}[t + 1] in step 8 is the gradient of the dual function ϕ_{i}(λ) at λ = λ_{i}[t + 1].
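Consistent with the step numbering just described, a single DRDGA iteration can be sketched as follows (the problem data, the quadratic form of f_{i}, and the box sets X_{i} are hypothetical choices that give step 7 a closed form):

```python
import numpy as np

# One-iteration sketch of the push-sum dual updates described above, on a
# hypothetical problem with f_i(x) = (tau_i/2) * ||x - c_i||^2 and
# X_i = [-1, 1]^n, so that step 7 has a closed form. W_t must be column
# stochastic (each agent only divides by its own out-degree).
rng = np.random.default_rng(2)
m, n, p = 3, 2, 2
tau = np.array([1.0, 2.0, 1.5])
gamma = np.full(m, 0.2)
c = rng.standard_normal((m, n))
A = rng.standard_normal((m, p, n))
b = 0.1 * rng.standard_normal((m, p))

def drdga_step(theta, rho, W_t, beta):
    u = W_t @ theta                                  # step 4: sum the pushed theta_j / d_j
    rho_new = W_t @ rho                              # step 5: sum the pushed rho_j / d_j
    lam = u / rho_new[:, None]                       # step 6: de-biased dual estimate
    x = np.clip(c - np.einsum('ipn,ip->in', A, lam) / tau[:, None],
                -1.0, 1.0)                           # step 7: x_i = argmin L_i(., lam_i)
    grad = np.einsum('ipn,in->ip', A, x) - b - gamma[:, None] * lam
    theta_new = u + beta * grad                      # step 8: dual gradient ascent step
    return theta_new, rho_new, lam, x

theta = np.zeros((m, p))                             # theta_i[0]
rho = np.ones(m)                                     # rho_i[0] = 1
W_t = np.full((m, m), 1.0 / m)                       # a simple column-stochastic choice
theta, rho, lam, x = drdga_step(theta, rho, W_t, beta=1.0)
```

In a genuinely distributed run, each row of the matrix-vector products in steps 4 and 5 would be formed by agent i from its received messages only.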
Remark 1

(i)
The algorithm proposed in [30] requires communication weight matrices that are doubly stochastic, with balanced graphs, while our Algorithm 1 removes this requirement and only needs column stochasticity of the communication matrices over unbalanced graphs, which is more general and practical.

(ii)
In primal-dual methods [28, 29], global knowledge of the coupled constraints in the primal problem is required of all agents, and information related to the primal problem is exchanged among agents. This may raise privacy issues and often results in unnecessary computational and communication effort [30]. By comparison, our Algorithm 1 is based on dual methods: only local estimates of the dual variables are exchanged. This affords greater privacy among agents, and the required local computational and communication effort is much smaller.
Statement of main results
In this section, we present the main convergence results for the proposed Algorithm 1.
It is shown in [2] that the local primal vector x_{i}[t] does not, in general, converge to the optimal solution \(\mathbf {x}_{i}^{*}\) of problem (1). Compared to x_{i}[t], however, the following recursive auxiliary primal iterates
can exhibit better convergence properties, with the initialization \(\widehat {\mathbf {x}}_{i}[1]=\mathbf {x}_{i}[0]\); see [14, 27, 32]. Define the averaging iterates as \(\overline {\theta }[t]=\frac {{\sum }_{i=1}^{m}\mathbf {\theta }_{i}[t]}{m}\).
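As a purely illustrative sketch of such an averaging recursion, the uniform running average is one common choice in ergodic analyses (an assumption here, not necessarily the exact recursion used in this paper); it can be maintained without storing the iterate history:

```python
# Purely illustrative sketch: the uniform running average (an assumption here,
# not necessarily the exact recursion used in the paper) maintains
#   xhat[t] = (x[1] + ... + x[t]) / t
# via xhat[t] = xhat[t-1] + (x[t] - xhat[t-1]) / t, without storing the history.
def running_average(x_stream):
    xhat = None
    for t, x in enumerate(x_stream, start=1):
        xhat = x if xhat is None else xhat + (x - xhat) / t
        yield xhat

avgs = list(running_average([1.0, 2.0, 3.0, 4.0]))
print(avgs[-1])   # 2.5, the mean of the stream
```

The constant memory footprint is what makes such auxiliary iterates practical for each agent to maintain alongside x_{i}[t].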
Theorem 1 below first gives an upper bound on the norm of the dual variables. By controlling the norm of the dual variables, we in turn control the norm of the subgradients of the regularized Lagrangian function, which is instrumental in proving Theorems 2 and 3 below.
Theorem 1
Suppose that Assumptions 1 and 2 hold and that the nonincreasing stepsize sequence {β[t]}_{t>0} satisfies \(\lim _{t\rightarrow \infty } \beta [t] = 0\). Then, there is a finite constant D > 0 such that, for all i = 1,…, m,
\(\|\lambda _{i}[t]\|\leq D, \quad \text {for all } t\geq 1,\)
where D depends on the parameters γ_{i}, τ_{i}, ∥A_{i}∥, G_{i}, and δ.
In what follows, Theorem 2 gives the convergence rate of the primal objective value under Assumptions 1 and 2.
Theorem 2 (Convergence rate)
Suppose Assumptions 1 and 2 hold. Let the stepsize be taken as \(\beta [t]=\frac {q}{t}, t=1,2,\ldots \), where the constant q satisfies \(\frac {q\gamma }{m}\geq 4\) with \(\gamma ={\sum }_{j=1}^{m}\gamma _{j}\). Then, for all T ≥ 1 and i = 1,…, m, we have
where the constant \(B=\max _{1\leq i\leq m} \sqrt {p}(G_{i} + \gamma _{i}D)\), and the scalars η ∈ (0,1) and δ > 0 satisfy \(\delta \geq \frac {1}{m^{mB}},~~ \eta \leq \left (1-\frac {1}{m^{mB}}\right )^{\frac {1}{mB}}\).
Theorem 2 shows that the sequence of primal objective values \(\{F(\widehat {\mathbf {x}}[T])\}\) converges to the optimal value F(x^{∗}) at a rate of O(ln T/T), i.e.,
with the constant depending on the regularization parameters γ_{i}, i = 1,…, m, the bound D on the dual variables, the bounds G_{i}, i = 1,…, m, on the coupling constraints, the initial values \(\overline {\mu }[0]\) at the agents, and both the speed η of information diffusion over the network and the imbalance δ of influences among the agents.
In the next theorem, we show the upper bound on the constraint violation.
Theorem 3 (Constraint violation bound)
Under the conditions of Theorem 1, we have, for all T ≥ 1,
Theorem 3 shows that the constraint violation, measured by \(\|{\sum }_{i=1}^{m}(A_{i}\widehat {\mathbf {x}}_{i}[T]-\mathbf {b}_{i})\|\), is of order \(O(\sqrt {\ln T/T})\).
Remark 2

(i)
Under unbalanced networks, Theorems 2 and 3 give explicit convergence rates for the objective value and the constraint violation. In contrast, the work [30] only obtains convergence of the dual and primal iterates under a balanced network, and no explicit convergence rate is provided.

(ii)
The algorithms of [27, 30] require that a Slater point exists and is known to all agents. Our work removes these requirements and obtains the boundedness of the dual variables by introducing the regularized Lagrangian function, as presented in Theorem 1.

(iii)
When the objective functions of problem (1) are merely convex, without the strong convexity assumption, the convergence rate of Algorithm 1 reduces to \(O(\sqrt {\ln T}/\sqrt {T})\); please refer to our previous work [33].
Proof of main results
Before proving the main results, we establish some useful auxiliary lemmas. The following Lemma 1 exploits the structure of strongly concave functions with Lipschitz gradients; its proof is motivated by Lemma 3 in [20] and is omitted here.
Lemma 1
Let \(h: \mathbb {R}^{p}\rightarrow \mathbb {R}\) be a \(\widetilde {\gamma }\)-strongly concave function with \(\widetilde {\gamma }>0\) that has Lipschitz continuous gradients with constant \(\widetilde {M}>0\). For any \(\mathbf {z}\in \mathbb {R}^{p}\), let \(\mathbf {y}\in \mathbb {R}^{p}\) be defined by
where\(\beta \in (0, \frac {\widetilde {\gamma }}{8\widetilde {M}^{2}}]\)and\(\varphi : \mathbb {R}^{p}\rightarrow \mathbb {R}^{p}\)is a mapping such that
Then, there is a compact set \(V \subset \mathbb {R}^{p}\) (which depends on c and the definition of the function h, but not on β) such that
where \(R = \max _{\mathbf {v}\in V}\left \{\left \|\mathbf {v} + \frac {\widetilde {\gamma }}{8\widetilde {M}^{2}}\nabla h(\mathbf {v})\right \|\right \} + \frac {\widetilde {\gamma }\, c }{8\widetilde {M}^{2}}\) and ∇h(v) is the gradient of h at v.
Based on Lemma 1, we are ready to prove our Theorem 1.
Proof of Theorem 1
By step 5 of Algorithm 1, we have
where ρ[t] is the vector with entries ρ_{i}[t]. Further, the above relation can be recursively written as follows
where we use the fact that ρ_{i}[0] = 1, for all i = 1,…, m. Under Assumption 2, by Corollary 2(b) in [4], for all i, we have
Therefore, we can obtain
Using steps 4 and 8 of Algorithm 1, we get
Furthermore, the above equality gives rise to
Since the transition matrix W[t]W[t − 1]⋯W[0] is column stochastic and ρ[0] = 1, we have that \({\sum }_{i=1}^{m}\rho _{i}[t] = m\) and
Together with (6) and the fact that β[t] → 0, this yields, for all i,
Thus, for each i, there exists a T_{i} > 1 such that \(\frac {\beta [t]}{\rho _{i}[t]} \leq \frac { \gamma _{i}{\tau _{i}^{2}}}{8\|A_{i}\|^{2}}\), for all t ≥ T_{i}.
Since the function ϕ_{i}(λ) defined in (4) is γ_{i}-strongly concave and its gradient ∇ϕ_{i}(λ) is Lipschitz continuous with constant ∥A_{i}∥/τ_{i}, by Lemma 1 there is a compact set V_{i} and a finite T_{i} > 1 such that, for all t ≥ T_{i},
where \(R_{i}(\gamma _{i})=\max _{\mathbf {v}\in V_{i}}\left \{\left \|\mathbf {v}+\frac {\gamma _{i}{\tau _{i}^{2}}}{8\|A_{i}\|^{2}}\nabla \phi _{i}(\mathbf {v})\right \|\right \}+\frac {c\gamma _{i}{\tau _{i}^{2}}}{8\|A_{i}\|^{2}}\). Let \(T_{0} = \max _{1\leq i\leq m} T_{i}\). We now split the analysis into two cases (t ≥ T_{0} and 1 ≤ t < T_{0}) to prove the boundedness of λ_{i}[t].
(i) By mathematical induction, we first prove that, for all t ≥ T_{0},
where \(\widetilde {R}=\max \{\max _{j} R_{j}(\gamma _{j}), \max _{j}\|\lambda _{j}[T_{0}]\|\}\). Clearly, if t = T_{0}, relation (10) is true. Suppose it is true at some time t ≥ T_{0}. Then, by (9), we have, for all i,
where the last inequality uses the relation (10).
Next, in Lemma 4 of [20], we let v = ρ[t], P = W[t], and take u to be the vector of the s-th coordinates of the vectors 𝜃_{i}[t], i = 1,…, m, where the coordinate index s is arbitrary. By Lemma 4 of [20], each vector λ_{i}[t + 1] is a convex combination of the vectors \(\frac {\mathbf {\theta }_{j}[t]}{\rho _{j}[t]}\), i.e.,
where Q[t] is a row stochastic matrix with entries \(Q_{ij}[t] = \frac {W_{ij}[t]\rho _{j}[t]}{\rho _{i}[t+1]}\). By the convexity of the Euclidean norm ∥⋅∥, we further obtain
By (11) and (13), we have that, for all i = 1,…, m, at time t + 1,
Hence, relation (10) indeed holds for all t ≥ T_{0}.
(ii) We then prove that ∥λ_{i}[t]∥ is bounded from above for t = 1,…, T_{0} − 1. By (8), we have
Since β[t] is nonincreasing and tends to 0 as t →∞, there exists a constant S > 0 such that, for all i = 1,…, m and t = 1,…, T_{0} − 1,
Thus, together with (7) and (13), we can obtain that, for all t = 1,…, T_{0} − 1
where the second inequality holds by (15) and the fact that β[1] ≥ β[t] for t ≥ 1. Thus, using the preceding relation recursively for t = 1,…, T_{0} − 1, together with the fact that the initial points 𝜃_{i}[0] are deterministic, we conclude that there exists a finite constant \(\widetilde {S}>0\) such that
Let \(D=\max \{\widetilde {R},\widetilde {S}\}\). By (10) and (17), we obtain that, for all t ≥ 1 and i = 1,…, m,
□
In order to prove Theorems 2 and 3, we need to use the following result, which is a generalization of Lemma 8 in [4].
Lemma 2
Under the conditions of Theorem 1, forany\(\mathbf {\lambda }\in \mathbb {R}^{p}\)andt > 0,we have
Proof
We first prove that \(\overline {\mathbf {\theta }}[t]\) is bounded for any t > 0. Since W[t] is a column stochastic matrix, we have 1^{⊤}y = 1^{⊤}W[t]y for any vector \(\mathbf {y}\in \mathbb {R}^{m}\). By step 4 of Algorithm 1, we further have
From the definition of λ_{i}[t + 1] in step 6 of Algorithm 1, it follows that
Note that \({\sum }_{i=1}^{m}\rho _{i}[t]=m\), and ρ_{i}[t] > 0 for all t and i. Thus, by the result of Theorem 1, we have, for all i = 1,…, m and t ≥ 0,
We now prove the result of Lemma 2. From step 8 of Algorithm 1, we have
For any \(\mathbf {\lambda }\in \mathbb {R}^{p}\), the relation (18) gives rise to
By using the inequality \(({\sum }_{j=1}^{m}a_{j})^{2} \leq m{\sum }_{j=1}^{m}{a^{2}_{j}}\), we can obtain
Thus, we have, for all t ≥ 0
We now consider the last term on the right-hand side of (19); it can be rewritten as
By the Cauchy-Schwarz inequality, we have
Since \(\mathcal {L}_{j}(\mathbf {x},\cdot )\) is γ_{j}strongly concave, we have, for any \(\lambda \in \mathbb {R}^{p}\)
By step 7 of Algorithm 1, for any \(\mathbf {x}_{j} \in \mathbb {R}^{n_{j}}\), we can get
Subtracting \(\mathcal {L}_{j}(\mathbf {x}_{j}[t+1],\mathbf {\lambda })\) from both sides of the above relation, we obtain
Together with (20), (21), (22), and (23) and the definition of \(\mathcal {L}(\mathbf {x},\lambda )\), we can obtain the desired result. □
Next, we prove Theorem 2.
Proof of Theorem 2
Letting x = x^{∗} and λ = 0 in Lemma 2, we have
Using the definition of function \(\mathcal {L}(\mathbf {x},\lambda )\) and letting \( \gamma = {\sum }_{j=1}^{m}\gamma _{j}\), we can obtain
where the last inequality uses the strong concavity of \(\mathcal {L}({\mathbf {x}^{*}},\cdot )\). Thus, by (24) and (25), and letting \(\beta [t]=\frac {q}{t}\), we have
Since \(4\leq \frac {q\gamma }{m}\), it follows that
Multiplying the preceding relation by t(t + 1), we can see that, for all t ≥ 1
Summing the above inequality from 1 to (T − 1) for all T ≥ 2 and rearranging terms leads to
Dividing both sides of the above relation by \(\frac {qT(T-1)}{m}\) yields
Note that, for all i and t, we get
Letting e_{i}[t] = β[t](A_{i}x_{i}[t + 1] −b_{i} − γ_{i}λ_{i}[t + 1]) with \(\beta [t]=\frac {q}{t}\), we have \(\|\mathbf {e}_{i}[t]\|_{1}\leq \frac {q B}{t}\) for all i and t, where \(B=\max _{1\leq i\leq m}\sqrt {p}(G_{i}+\gamma _{i}D)\). By applying Corollary 2 in [20], we can estimate the term \(\|\lambda _{j}[t+1] - \overline {\mathbf {\theta }}[t]\|\) in (26) as follows
Combining (27) with (26), we can get
Using the convexity of F, the definition of \(\widehat {\mathbf {x}}[T],\) and (28), the desired result can be obtained. □
Proof of Theorem 3
Letting x = x^{∗} in Lemma 2, we get
Considering the terms on the left-hand side of (29), we have
where the last inequality is due to the strong concavity of \(\mathcal {L}(\mathbf {x}^{*},\cdot )\). Further, by (30), we can deduce
Combining (29) with (31), and then letting \(\beta [t+1]=\frac {q}{t+1}\), we can obtain
Since \(4 \leq \frac {q\gamma }{m}\), we can see that \(1-\frac {q\gamma }{2m(t+1)} \leq 1-\frac {2}{t+1}\). Thus, by the preceding inequality, we obtain
Multiplying the above inequality by t(t + 1), and then summing up from 1 to T − 1, we have, for all t ≥ 1 and T ≥ 2
Dividing both sides of the above inequality by \(\frac {qT(T-1)}{m}\) gives
Note that \({\sum }_{j=1}^{m}(A_{j}\mathbf {x}_{j}-\mathbf {b}_{j})\) is linear with respect to x_{j}; thus, we have
By (33) and (34), we can obtain, for any \(\lambda \in \mathbb {R}^{p}\)
Maximizing the left-hand side of (35) with respect to λ and using the estimate (27), we obtain the desired result. The proof is complete. □
Numerical experiments
Distributed optimization problems with coupled equality constraints have an interesting application in the network utility maximization (NUM) problem investigated in [22, 31, 32]. More specifically, a network is modeled as a set of links L with finite capacities C = (C_{l}, l ∈ L), shared by a set of sources S indexed by s. Each source s uses a subset L(s) ⊂ L of the links. Let S(l) = {s ∈ S | l ∈ L(s)} be the set of sources using link l. The sets {L(s)} define an |L|×|S| routing matrix A with entries A_{ls} = 1 if l ∈ L(s) and A_{ls} = 0 otherwise. Each source s is associated with a utility function \(U_{s}: \mathbb {R}^{+} \rightarrow \mathbb {R}\); i.e., source s gains a utility U_{s}(x_{s}) when it sends data at rate x_{s} satisfying 0 ≤ m_{s} ≤ x_{s} ≤ M_{s}. Let I_{s} = [m_{s}, M_{s}]. Mathematically, the NUM problem is to determine the source rates that minimize the sum of disutilities under the link capacity constraints [31]:
Note that the utility function U_{s} and constraint I_{s} are local and private, only known by the source s. Solving the NUM problem directly requires coordination among possibly all sources and is impractical in real networks. It is important to seek a distributed solution. In the following numerical experiments, we will utilize our proposed distributed method to solve the NUM problem.
For the numerical simulations, the utility function is taken as U_{s}(x_{s}) = 20w_{s} log(x_{s} + 0.1), following [32]. We set C_{l} = 1 for all l ∈ L, and w_{s} = |L(s)|/|L|, m_{s} = 0, M_{s} = 1 for all s ∈ S. For the communication weight matrix W[t], we first generate a pool of 20 weight matrices corresponding to random graphs, each satisfying Assumption 2, and then choose one communication matrix from the pool at each time t. We take all the regularization parameters to be the same, γ_{s} = 0.4, s ∈ S, and set the stepsize parameter to q = 10. We use the MATLAB convex programming toolbox CVX to compute the solution x^{∗}. Our method and the compared algorithm are terminated when all of the following conditions are satisfied at an iteration t: (i) \(\max _{s\in S}\|\lambda _{s}[t + 1] - \lambda _{s}[t]\|\leq \epsilon \), (ii) \(\max _{l\in L}|(A\mathbf {x}[t + 1] - C)_{l}|\leq \epsilon \), (iii) \(\max _{s\in S}\frac {|U_{s}(x_{s}[t+1])-U_{s}(x_{s}[t])|}{|U_{s}(x_{s}[t])|} \leq \epsilon \), where we set 𝜖 = 0.01 in the simulations.
We first consider a simple logical topology with |S| = 3 and |L| = 2 [31], displayed in Fig. 1. It follows from Fig. 1 that w_{1} = 1, w_{2} = 1, w_{3} = 1/2. Figure 2 shows the evolution of the dual variables over the first 70 iterations. Clearly, all local dual variables λ_{s}, s = 1,2,3, agree on a common value within about 70 iterations. Figure 3 illustrates the evolution of each source rate x_{s}, s = 1,2,3. Source rates x_{1} and x_{2} reach the same value because the weight coefficients w_{1} = w_{2}. After 70 iterations, every source rate x_{s} approximately reaches the optimal solution. Figure 4 shows the aggregated rate of the sources using Link 2 versus the capacity limit of Link 2. It can be observed from Fig. 4 that the aggregated source rates satisfy the capacity constraint of Link 2. As shown in Fig. 5, the iterates of the disutility objective function g_{N}(x[t]) rapidly converge to the optimal value g_{N}(x^{∗}).
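For reference, this small instance can be set up as in the following sketch (Python rather than MATLAB). The exact routing comes from Fig. 1, so the routing chosen here is an assumption, made only to be consistent with the weights w_{1} = w_{2} = 1, w_{3} = 1/2 reported above:

```python
import numpy as np

# Hypothetical reconstruction of the |S| = 3, |L| = 2 instance: the reported
# weights w_1 = w_2 = 1 and w_3 = 1/2 imply that sources 1 and 2 use both
# links while source 3 uses a single link; WHICH link source 3 uses is an
# assumption of this sketch (sources and links are 0-indexed here).
L_routes = {0: [0, 1], 1: [0, 1], 2: [1]}    # L(s): links used by source s
num_links, num_src = 2, 3
A = np.zeros((num_links, num_src))
for s, links in L_routes.items():
    for l in links:
        A[l, s] = 1.0                         # routing matrix: A_ls = 1 iff l in L(s)

w = np.array([len(L_routes[s]) / num_links for s in range(num_src)])  # w_s = |L(s)|/|L|
C = np.ones(num_links)                        # capacities C_l = 1

def disutility(x):
    """g_N(x) = -sum_s U_s(x_s), with U_s(x) = 20 * w_s * log(x + 0.1)."""
    return -np.sum(20.0 * w * np.log(x + 0.1))

print(w)                                      # [1.  1.  0.5]
```

With this data in place, each source s would run the DRDGA updates using only its own row-slice of A and its private utility U_{s}.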
To compare the performance of our proposed Alg. DRDGA with the existing dual decomposition distributed algorithm (Alg. CDDA) of [30], we next test a randomly generated NUM problem with |S| = 20 and |L| = 19, and report comparisons of the constraint violations and objective function values. Figure 6 displays the evolution of the constraint violation ∥Ax[t] − C∥. We find that both algorithms satisfy the linear equality constraints asymptotically, but our Alg. DRDGA converges faster than Alg. CDDA. Figure 7 illustrates that both algorithms also converge to the optimal value; again, our Alg. DRDGA reaches the optimal value faster than Alg. CDDA.
Conclusion
This paper proposed a solution method for distributed convex problems with coupling equality constraints. The proposed algorithm can be implemented over time-varying directed networks. By regularizing the Lagrangian function, the norm of the dual variables can be bounded. The proposed method achieves a convergence rate of order O(ln T/T) under suitable conditions. A numerical example on network utility maximization demonstrates the effectiveness of the proposed algorithm. As future research, it would be interesting to analyze the effect of communication delays on the proposed distributed method.
References
 1.
Nedić, A., Ozdaglar, A.: Distributed subgradient methods for multi-agent optimization. IEEE Trans. Autom. Control 54(1), 48–61 (2009)
 2.
Nedić, A., Ozdaglar, A., Parrilo, P.: Constrained consensus and optimization in multi-agent networks. IEEE Trans. Autom. Control 55(4), 922–938 (2010)
 3.
Jakovetic, D., Xavier, J., Moura, J.M.: Fast distributed gradient methods. IEEE Trans. Autom. Control 59(5), 1131–1146 (2014)
 4.
Nedić, A., Olshevsky, A.: Distributed optimization over time-varying directed graphs. IEEE Trans. Autom. Control 60(3), 601–615 (2015)
 5.
Johansson, B., Keviczky, T., Johansson, M., Johansson, K.H.: Subgradient methods and consensus algorithms for solving convex optimization problems. In: Proc. IEEE CDC, pp. 4185–4190, Cancun (2008)
 6.
Baingana, B., Mateos, G., Giannakis, G.: Proximal-gradient algorithms for tracking cascades over social networks. IEEE J. Sel. Topics Signal Process. 8(4), 563–575 (2014)
 7.
Mateos, G., Giannakis, G.: Distributed recursive least-squares: Stability and performance analysis. IEEE Trans. Signal Process. 60(7), 3740–3754 (2012)
 8.
Bolognani, S., Carli, R., Cavraro, G., Zampieri, S.: Distributed reactive power feedback control for voltage regulation and loss minimization. IEEE Trans. Autom. Control 60(4), 966–981 (2015)
 9.
Zhang, Y., Giannakis, G.: Distributed stochastic market clearing with highpenetration wind power and largescale demand response. IEEE Trans. Power Syst. 31(2), 895–906 (2016)
 10.
Martinez, S., Bullo, F., Cortez, J., Frazzoli, E.: On synchronous robotic networks, Part I: Models, tasks, and complexity. IEEE Trans. Autom. Control 52(12), 2199–2213 (2007)
 11.
Tsitsiklis, J.N., Bertsekas, D.P., Athans, M.: Distributed asynchronous deterministic and stochastic gradient optimization algorithms. IEEE Trans. Autom. Control 31(9), 803–812 (1986)
 12.
Ram, S.S., Nedić, A., Veeravalli, V.V.: Distributed stochastic subgradient projection algorithms for convex optimization. J. Optim. Theory Appl. 147(3), 516–545 (2010)
 13.
Duchi, J.C., Agarwal, A., Wainwright, M.J.: Dual averaging for distributed optimization: Convergence analysis and network scaling. IEEE Trans. Autom. Control 57(3), 592–606 (2012)
 14.
Zhu, M., Martinez, S.: On distributed convex optimization under inequality and equality constraints. IEEE Trans. Autom. Control 57(1), 151–163 (2012)
 15.
Li, J., Wu, C., Wu, Z., Long, Q.: Gradient-free method for nonsmooth distributed optimization. J. Glob. Optim. 61(2), 325–340 (2015)
 16.
Lorenzo, P., Scutari, G.: NEXT: In-network nonconvex optimization. IEEE Trans. Signal Inf. Process. Netw. 2(2), 120–136 (2016)
 17.
Li, J., Chen, G., Dong, Z., Wu, Z.: Distributed mirror descent method for multi-agent optimization with delay. Neurocomputing 177, 643–650 (2016)
 18.
Gharesifard, B., Cortes, J.: Distributed continuous-time convex optimization on weight-balanced digraphs. IEEE Trans. Autom. Control 59(3), 781–786 (2014)
 19.
Tsianos, K.I., Lawlor, S., Rabbat, M.G.: Consensus-based distributed optimization: Practical issues and applications in large-scale machine learning. In: 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1543–1550. IEEE (2012)
 20.
Nedić, A., Olshevsky, A.: Stochastic gradient-push for strongly convex functions on time-varying directed graphs. IEEE Trans. Autom. Control 61(12), 3936–3947 (2016)
 21.
Bertsekas, D.P., Nedić, A., Ozdaglar, A.E.: Convex Analysis and Optimization. Athena Scientific, Belmont (2003)
 22.
Necoara, I., Suykens, J.A.: Application of smoothing technique to decomposition in convex optimization. IEEE Trans. Autom. Control 53(11), 2674–2679 (2008)
 23.
Li, J., Chen, G., Dong, Z., Wu, Z.: A fast dual proximal-gradient method for separable convex optimization with linear coupled constraints. Comput. Optim. Appl. 64(3), 671–697 (2016)
 24.
Yuan, D., Xu, S., Zhao, H.: Distributed primal-dual subgradient method for multi-agent optimization via consensus algorithms. IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 41(6), 1715–1724 (2011)
 25.
Aybat, N.S., Hamedani, E.Y.: Distributed primal-dual method for multi-agent sharing problem with conic constraints. In: 2016 50th Asilomar Conference on Signals, Systems and Computers, pp. 777–782. IEEE (2016)
 26.
Aybat, N.S., Hamedani, E.Y.: A distributed ADMM-like method for resource sharing under conic constraints over time-varying networks. arXiv:1611.07393 (2016)
 27.
Chang, T.H., Nedić, A., Scaglione, A.: Distributed constrained optimization by consensus-based primal-dual perturbation method. IEEE Trans. Autom. Control 59(6), 1524–1538 (2014)
 28.
Yuan, D., Ho, D.W.C., Xu, S.: Regularized primal-dual subgradient method for distributed constrained optimization. IEEE Trans. Cybern. 46(9), 2109–2118 (2016)
 29.
Khuzani, M.B., Li, N.: Distributed regularized primal-dual method: Convergence analysis and trade-offs. arXiv:1609.08262v3 (2017)
 30.
Falsone, A., Margellos, K., Garetti, S., Prandini, M.: Dual decomposition and proximal minimization for multi-agent distributed optimization with coupling constraints. Automatica 84, 149–158 (2017)
 31.
Low, S.H., Lapsley, D.E.: Optimization flow control, I: Basic algorithm and convergence. IEEE/ACM Trans. Netw. 7(6), 861–874 (1999)
 32.
Beck, A., Nedić, A., Ozdaglar, A., Teboulle, M.: An O(1/k) gradient method for network resource allocation problems. IEEE Trans. Control Netw. Syst. 1(1), 64–73 (2014)
 33.
Gu, C., Wu, Z., Li, J., Guo, Y.: Distributed convex optimization with coupling constraints over time-varying directed graphs. arXiv:1805.07916 (2018)
Funding
This research was partially supported by the NSFC 11501070, 11671362 and 11871128, by the Natural Science Foundation Project of Chongqing cstc2017jcyjA0788 and cstc2018jcyjAX0172, and the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJQN201800520).
Cite this article
Gu, C., Wu, Z. & Li, J. Regularized dual gradient distributed method for constrained convex optimization over unbalanced directed graphs. Numer. Algor. 84, 91–115 (2020). https://doi.org/10.1007/s11075-019-00746-2
Keywords
 Convex optimization
 Distributed algorithm
 Dual decomposition
 Regularization
 Multi-agent network