Verified Learning Without Regret
Abstract
Multiplicative Weights (MW) is a simple yet powerful algorithm for learning linear classifiers, for ensemble learning à la boosting, for approximately solving linear and semidefinite systems, for computing approximate solutions to multicommodity flow problems, and for online convex optimization, among other applications. Recent work in algorithmic game theory, which applies a computational perspective to the design and analysis of systems with mutually competitive actors, has shown that noregret algorithms like MW naturally drive games toward approximate Coarse Correlated Equilibria (CCEs), and that for certain games, approximate CCEs have bounded cost with respect to the optimal states of such systems.
In this paper, we put such results to practice by building distributed systems such as routers and load balancers with performance and convergence guarantees mechanically verified in Coq. The main contributions on which our results rest are (1) the first mechanically verified implementation of Multiplicative Weights (specifically, we show that our MW is no regret) and (2) a languagebased formulation, in the form of a DSL, of the class of games satisfying Roughgarden smoothness, a broad characterization of those games whose approximate CCEs have cost bounded with respect to optimal. Composing (1) with (2) within Coq yields a new strategy for building distributed systems with mechanically verified complexity guarantees on the time to convergence to nearoptimal system configurations.
Keywords
Multiplicative weights Algorithmic game theory Smooth games Interactive theorem proving Coq1 Introduction
The Multiplicative Weights algorithm (MW, [1, 25]) solves the general problem of “combining expert advice”, in which an agent repeatedly chooses which action, or “expert”, to play against an adaptive environment. The agent, after playing an action, learns from the environment both the cost of that action and of other actions it could have played in that round. The environment, in turn, may adapt in order to minimize environment costs. MW works by maintaining a weighted distribution over the action space, in which each action initially has equal weight, and by updating weights with a linear or exponential loss function to penalize poorly performing actions.
MW is a noregret algorithm: its expected cost approaches that of the best fixed action the agent could have chosen in hindsight (i.e., external regret tends to zero) as time \(t \rightarrow \infty \). Moreover, this simple algorithm performs remarkably well: in number of rounds logarithmic in the size of the action space, MW’s expected regret can be bounded by a small constant \(\epsilon \) (MW has bounded external regret). In [1], Arora, Hazan, and Kale showed that MW has wideranging connections to numerous problems in computer science, including optimization, linear and semidefinite programming, and machine learning (cf. boosting [14]).
Our work targets another important application of MW: the approximate solution of multiagent games, especially as such games relate to the construction of distributed systems. It is well known (cf. [30, Chapter 4]) that noregret algorithms such as MW converge, when played by multiple independent agents, to a large equilibrium class known as Coarse Correlated Equilibria (CCEs). CCEs may not be socially optimal, but for some games, such as Roughgarden’s smooth games [35], the social cost of such equilibrium states can be bounded by a constant factor of the optimal cost of the game (the game has bounded Price of Anarchy, or POA). Therefore, to drive the social cost of a smooth game to near optimal, it suffices simply to let each agent play a noregret algorithm such as MW.
Moreover, a number of distributed systems can be encoded as games, especially when the task being distributed is viewed as an optimization problem. Consider, for example, distributed balancing of network flows over a set of web servers, an application we return to in Sect. 3. Assuming the set of flows is fixed, and that the cost of (or latency incurred by) assigning a flow to a particular web server increases as a function of the number of flows already assigned to that server (the traffic), then the load balancing application is encodable as a game in which each flow is a “player” attempting to optimize its cost (latency). An optimal solution of this game minimizes the total latency across all flows. Since the game is Roughgarden smooth (assuming affine cost functions), the social cost of its CCEs as induced by letting each player independently run MW is bounded with respect to that of an optimal solution.
1.1 Contributions
In this paper, we put such results to work by building the first verified implementation of the MW algorithm – which we use to drive all games to approximate CCEs – and by defining a languagebased characterization of a subclass of games called Roughgarden smooth games that have robust Price of Anarchy guarantees extending even to approximate CCEs. Combining our verified MW with smooth games, we construct distributed systems for applications such as routing and load balancing that have verified convergence and correctness guarantees.

a new architecture, as embodied in the Cage system (https://github.com/gstew5/cage), for the construction of distributed systems with verified complexity guarantees, by composition of verified Multiplicative Weights (MW) with robust Price of Anarchy bounds via Roughgarden smoothness;

the first formally verified implementation of the MW algorithm;

a languagebased characterization of Roughgarden smooth games, in the form of a mechanized DSL for the construction of such games together with smoothness preservation theorems showing that each combinator in the language preserves smoothness;

the application of the resulting system to distributed routing and load balancing.
By verified, we mean our MW implementation has mechanically checked convergence bounds and proof of correctness within an interactive theorem prover (specifically, Ssreflect [16], an extension of the Coq [5] system). By convergence and correctness, we mean that we prove both that MW produces the right answer (functional correctness with respect to a highlevel functional specification), but also that it does so with external regret^{1} bounded by a function of the number of iterations of the protocol (convergence). Convergence of MW in turn implies convergence to an approximate CCE. By composing this second convergence property with Roughgarden smoothness, we bound the social, or total, cost of the resulting system state with respect to the optimal.
As we’ve mentioned, MW has broad application across a number of subdisciplines of computer science, including linear programming, optimization, and machine learning. Although our focus in this paper is the use of MW to implement noregret dynamics, a general strategy for computing the CCEs of multiagent games, our implementation of MW (Sect. 5.3) could be used to build, e.g., a verified LP solver or verified implementation of boosting as well.
Limitations. The approach we outline above does not apply to all distributed systems, nor even to all distributed systems encodable as games. In particular, in order to prove POA guarantees in our approach, the game encoding a particular distributed system must first be shown Roughgarden smooth, a condition which does not always apply (e.g., to network formation games [35, Section 2]). More positively, the Smooth Games DSL we present in Sects. 3 and 4 provides one method by which to explore the combinatorial nature of Roughgarden smoothness, as we demonstrate with some examples in Sect. 3.
Relationship to Prior Work. Some of the ideas we present in this paper previously appeared in summary form in a 3page brief announcement at PODC 2017 [4]. The current paper significantly expands on the architecture of the Cage system, our verified implementation of Multiplicative Weights, the definition of the Smooth Games DSL, and the composition theorems of Sect. 6 proving that the pieces fit together to imply systemwide convergence and quality bounds.
1.2 Organization
The following section provides background on games, algorithmic game theory, and smoothness. Section 3 presents an overview of the main components of the Cage approach, via application to examples. Section 4 provides more detail on the combinators of our Smooth Games DSL. Section 5 presents our verified implementation of MW. Section 6 describes the composition theorems proving that multiagent MW converges to nearoptimal \(\epsilon \)CCEs. Sections 7 and 8 present related work and conclude.
2 Background
2.1 Games
Von Neumann, Morgenstern, and Nash [28, 29] (in the US) and Bachelier, Borel, and Zermelo [3, 8, 43] (in Europe) were the first to study the mathematical theory of strategic interaction, modern game theory. Nash’s famous result [27] showed that in all finite games, mixedstrategy equilibria (those in which players are allowed to randomize) always exist. Since the 1950s, game theory has had huge influence in numerous fields, especially economics.
A state \(s : A_1\times A_2\times \ldots \times A_N\) is a Pure Nash Equilibrium (PNE) when no player \(i\in [1,N]\) has incentive to change its strategy: \(\forall s'_i.\ C_i(s)\le C_i(s'_i, s_{i})\). Here \(s'_i\) is an arbitrary strategy. Strategy \(s_i\) is player i’s move in state s. By \(s'_i, s_{i}\) we denote the state in which player i’s strategy is \(s'_i\) and all other players play s. In other words, no player can decrease its cost by unilateral deviation.
Purestrategy Nash equilibria do not always exist. Mixed Nash Equilibria (MNE), which do exist in all finite games, permit players to randomize over the strategy space, by playing a distribution \(\sigma _i\) over A. The overall state is the product distribution over the player distributions. Every PNE is trivially an MNE, by letting players choose deterministic distributions \(\sigma _i\).
Correlated Equilibria (CEs) generalize MNEs to situations in which players coordinate via a trusted third party. In what follows, we’ll mostly be interested in a generalization of CEs, called Coarse Correlated Equilibria (CCEs), and their approximate relaxations. Specifically, a distribution \(\sigma \) over \(A^N\) (Listing 2) is a CCE when \(\forall i\forall s'_i.\ \mathbb {E}_{s\sim \sigma }[C_i(s)] \le \mathbb {E}_{s\sim \sigma }[C_i(s'_i, s_{i})] \). \( \mathbb {E}_{s\sim \sigma }[C_i(s)] \) is the expected cost to player i in distribution \(\sigma \). The CCE condition states that there is no \(s'_i\) that could decrease player i’s expected cost. CCEs are essentially a relaxation of MNEs which do not require \(\sigma \) to be a product distribution (i.e., the players’ strategies may be correlated). CEs are a subclass of CCEs in which \( \mathbb {E}_{s\sim \sigma }[C_i(s'_i, s_{i})] \) may be conditioned on \(s_i\).
2.2 Algorithmic Game Theory
 1.
How good equilibrium states are with respect to the optimal configurations of a game. By optimal, we usually mean states \(s^*\) that optimize the social cost: \(\forall s.\ \sum _i C_i(s^*)\le \sum _i C_i(s)\).
 2.
How “easy” (read computationally tractable) it is to drive competing players of the game toward an equilibrium state.
Algorithmic game theory and the related fields of mechanism design and distributed optimization provide excellent tools here.
Good Equilibria. The Price of Anarchy, or POA, of game (A, C) quantifies the cost of equilibrium states of (A, C) with respect to optimal configurations. Precisely, define POA as the ratio of the social cost of the worst equilibrium s to the social cost of an optimal state \(s^*\). POA near 1 indicates highquality equilibria: finding an equilibrium in such a game leads to overall social cost close to optimal. Prior work in algorithmic game theory has established nontrivial POA bounds for a number of game classes: on various classes of congestion and routing games [2, 6, 10], on facility location games [40], and others [11, 32].
In the system of Sect. 3, we use the related concept of Roughgarden smooth games [35], or simply smooth games, which define a subclass of games with canonical POA proofs. To each smooth game are associated two constants, \(\lambda \) and \(\mu \). The precise definition of the smoothness condition is less relevant here than its consequences: if a costminimization game is \((\lambda ,\mu )\)smooth, then it has POA \(\lambda / (1  \mu )\). Not all games are smooth, but for those that are, the POA bound above extends even to CCEs and their approximations, a particularly large (and therefore tractable) class of equilibria [35, Sects. 3 and 4].
Tractable Dynamics. Good equilibrium bounds are most useful when we know how quickly a particular game converges to equilibrium [7, 9, 12, 13, 17]. Certain classes of games, e.g. potential games [26], reach equilibria under a simple model of dynamics called best response. As we’ve mentioned, we use a different distributed learning algorithm in this work, variously called Multiplicative Weights (MW) [1] or sometimes Randomized Weighted Majority [25], which drives all games to CCEs, a larger class of equilibrium states than those achieved by potential games under best response.
3 Cage by Example
3.1 Overview
At the top, we have a domainspecific language in Coq (DSL, box 1) that generates games with automatically verified POA bounds. To execute such games, we have verified (also in Coq) an implementation of the Multiplicative Weights algorithm (MW, 2). Correctness of MW implies convergence bounds on the games it executes: \(O((ln\ A)/\epsilon ^2)\) iterations suffice to drive the game to an \(\epsilon \)CCE (here, A is the size of the action space, or game type, A).
We compose N instances of multiplicative weights (4), one per agent, with a server (3) that facilitates communication, implemented in OCaml and modeled by an operational semantics in Coq. To actually execute games, we use Coq’s code extraction mechanism to generate OCaml code that runs clients against the server, using an unverified OCaml shim to send and receive messages. We prove performance guarantees in Coq from POA bounds on the game and from the regret bound on MW.
3.2 Smooth Games DSL

the bias combinator \(\mathsf {Bias}(A,b) \), which adds the fixed value b to the cost function associated with game A;

the scalar combinator \(\mathsf {Scalar}(A,m) \), which multiplies the output of the cost function C associated with game A by a fixed value m;

the product combinator \(A\times B \), corresponding to the parallel composition of two games A and B with cost equal to the sum of the costs in the two games;

the subtype game \(\{x : A,\;P(x)\} \), which constructs a new game over the dependent sum type \(\varSigma x : A. P(x)\) (values x satisfying the predicate P);

the singleton game \(\mathsf {Singleton}(A)\), which has cost 1 if if player i “uses” the underlying resource (\(\mathbb {B}_{\mathsf {Resource}}(f\;i) = \mathsf {true}\)), and 0 otherwise. The function \(\mathbb {B}_{}()\) generalizes the notion of resource usage beyond the primitive \(\mathsf {Resource}\) game. For example, \(\mathbb {B}_{\mathsf {Scalar}(A,m)}(x) = \mathbb {B}_{A}(x)\): usage in a game built from the scalar combinator reduces to usage in the underlying game.
3.3 Example: Distributed Routing
We illustrate the Smooth Games DSL with an example: distributed routing over networks with affine latency functions (Fig. 2). This game is known to have POA 5/2 [35].
3.4 Example: Load Balancing
As a second example, consider the load balancing game depicted in Fig. 3, in which a number of network flows are distributed over several servers with affine cost functions. In general, N load balancing agents are responsible for distributing M flows over K servers. The cost of allocating a flow to a server is modeled by an affine cost function which scales in the total load (number of flows) on that server. Like routing, the load balancing game has POA 5/2. This is no coincidence; both are special cases of “finite congestion games”, a class of games which have POA 5/2 when costs are linear [10]. The connection between them can be seen more concretely by observing that they are built up from the same primitive \(\mathsf {Resource}\) game.
We model the system as an NMplayer Kresource game in which each player corresponds to a single network flow. Each load balancing agent poses as multiple players (MW instances) in the game, one per flow, and composes the actions chosen by these players to form its overall strategy. The result of running the game is an approximate CCE with respect to the distribution of flows over servers.
4 Smooth Games
Roughgarden smoothness [35] characterizes a subclass of games with canonical Price of Anarchy (POA) proofs. In [35], Roughgarden showed that smooth games have canonical POA bounds not only with respect to pure Nash equilibria but also with respect to mixed Nash equilibria, correlated equilibra, CCEs, and their approximate relaxations. In the context of Cage, we use smoothness to bound the social cost of games executed by multiple clients each running MW. We show how the technical pieces fit together, in the form of bounds on an operational semantics of the entire Cage system, in Sect. 6. This section introduces the technical definition of smoothness and the language of combinators, or Smooth Games DSL of Sect. 3, that we use to build games that are smooth by construction.
Definition 1
Here, \( C_i(s^*_i, s_{i})\) denotes the individual cost to player i in the mixed state where all other players follow their strategies from s, while player i follows the corresponding strategy from \(s^*\). Smooth games bound the individual cost of players’ unilateral deviations from state s to \(s^*\) by the weighted social costs of s and \(s^*\). In essence, when \(\lambda \) and \(\mu \) are small, the effect of any single player’s deviation from a given state has minimal effect.
4.1 Combinators
Figure 4 lists the syntax and combinators of the Smooth Games DSL we used in Sect. 3 to build smooth routing and load balancing games.
The proofs of smoothness for other combinators are straightforward. For example, since \(\mathsf {Unit}\) games always have cost 0, all values of \(\lambda \) and \(\mu \) satisfy the smoothness inequality: \(0 \le \lambda \cdot 0 + \mu \cdot 0\). We restrict the range of the cost function in \(\mathsf {SingletonSmooth}\) games to \(\{0, 1\}\) by applying the function \(\mathbb {B}_{A}(\cdot )\), which generalizes the notion of “using a resource” to all the game types of Fig. 4. Smoothness of the \(\mathsf {Singleton}\) game follows by case analysis on the results of \(\mathbb {B}_{A}(\cdot )\) in the states s and \(s^*\) of the smoothness inequality. The games produced by the \(\mathsf {SigmaSmooth}\) combinator have costs equal to those of the underlying games but restrict the domain to those states satisfying a predicate P. Since smoothness of the underlying bound holds for all states in A, the same bound holds of the restricted domain of states \(a \in A\) drawn from P. Smoothness of product games relies on the fact that smoothness still holds if \(\lambda \) and \(\mu \) are replaced with larger values. Thus, each of the argument games to \(\mathsf {ProductSmooth}\) is \(({\mathsf {max}(\lambda _A,\lambda _B)}, {\mathsf {max}(\mu _A,\mu _B)})\)smooth. The overall product game, which sums the costs of its argument games, is \(({\mathsf {max}(\lambda _A,\lambda _B)}, {\mathsf {max}(\mu _A,\mu _B)})\)smooth as well.
It’s possible to derive combinators from those defined in Fig. 4. For example, define as \(\mathsf {Affine} (m,b,A)\) the game with cost function \(mx + b\). We implement this game as \(\{p : \mathsf {Scalar}(m,A) \times \mathsf {Scalar}(b,\mathsf {Singleton}(A)),\;p.1=p.2\} \), or the subset of product games over the scalar game \(\mathsf {Scalar}(m,A) \) and the \(\{0, 1\}\) scalar game over b such that the first and second projections of each strategy p are equal.
5 Multiplicative Weights (MW)
5.1 The Algorithm
The MW algorithm (Fig. 5) pits a client, or agent, against an adaptive environment. The agent maintains a weight distribution w over the action space, initialized to give each action equal weight. At each time step \(t \in [1\ldots T]\), the agent commits to the distribution \(w_t / \sum _{a\in A} w_t(a)\), communicating this mixed strategy to the environment. After receiving a cost vector \(c_t\) from the environment, the agent updates its weights \(w_{t+1}\) to penalize highcost actions, at a rate determined by a learning constant \(\eta \in (0,1/2]\). High \(\eta \) close to 1/2 leads to higher penalties, and thus relatively less exploration of the action space.
The environment is typically adaptive, and may be implemented by a number of other agents also running instances of MW. The algorithm proceeds for a fixed number of epochs T, or until some bound on expected external regret (expected cost minus the cost of the best fixed action) is achieved. In what follows, we always assume that costs lie in the range \([1, 1]\). Costs in an arbitrary but bounded range are also possible (with a concomitant relaxation of the algorithm’s regret bounds), as are variations of MW to solve payoff maximization instead of cost minimization.
5.2 MW Is No Regret
The MW algorithm converges reasonably quickly: To achieve expected regret at most \(\epsilon \), it’s sufficient to run the algorithm \(O((ln\ A)/\epsilon ^2)\) iterations, where A is the size of the action space [36, Chapter 17]. Regret can be driven arbitrarily small as the number of iterations approaches infinity. Bounded regret suffices to prove convergence to an approximate CCE, as [36] also shows.
In this section, we present a highlevel sketch of the proof that MW is no regret. We follow [36, Chapter 17], which has additional details. At the level of the mathematics, our formal proof makes no significant departures from Roughgarden.
Definition 2
Theorem 1
(MW Has Bounded Regret). The algorithm of Fig. 5 has expected perstep external regret at most \(\eta + ln\;A\ /\ \eta T\).
Proof Sketch
The proof of Theorem 1 uses a potentialfunction argument, with potential \(\varPhi _t\) equal the sum of the weights \(\varGamma _t = \sum _{a \in A} w_t(a)\) at time t. It proceeds by relating the cumulative expected cost \(\sum _{t} \zeta _t\) of the algorithm to \( OPT \), the cost of the best fixed action, through the intermediate quantity \(\varGamma _{T+1}\).
By letting \(\eta = \sqrt{ln\;A\; /\; T}\) (cf. [36, Chapter 17]), it’s possible to restate the regret bound of Theorem 1 to the following arguably nicer bound:
Corollary 1
5.3 MW Architecture
Our implementation and proof of MW (Fig. 6) were designed to be extensible. At a high level, the proof structure follows the program refinement methodology, in which a highlevel mathematical but inefficient specification of MW (HighLevel Functional Specification) is gradually made more efficient by a series of refinements to various features of the program (for example, by replacing an inefficient implementation of a keyvalue map with a more efficient balanced binary tree).
For each such refinement, we prove that every behavior of the lowerlevel program is a possible behavior of the higherlevel program it refines. Thus specifications proved for all behaviors of the highlevel program also apply to each behavior at the low level. By behavior here, we mean the trace of action distributions output by MW as it interacts with, and receives cost vectors from, the environment.
We factor the lower implementation layers (Medium and Low) into an interpreter and operational semantics over a domainspecific language specialized to MWstyle algorithms (MW DSL). The DSL defines commands for maintaining and updating weights tables as well as commands for interacting with the environment. We prove, for any DSL program c, that the interpretation of that program refines its behavior with respect to the smallstep operational semantics (Medium). Our overall proof specializes this general refinement to an implementation of MW as a command in the DSL, in order to relate that command’s interpreted behavior to the highlevel functional specification.
5.4 MW DSL
The syntax and semantics of the MW DSL are given in Fig. 7. The smallstep operational semantics (\(\vdash c, \sigma \Rightarrow c', \sigma '\)) is parameterized by an environment oracle that defines functions for sending action distributions to the environment (\(\mathsf {oracle}\_\mathsf {send}\)) and for receiving the resulting cost vectors (\(\mathsf {oracle}\_\mathsf {recv}\)). The oracle will in general be implemented by other clients also running MW (Sect. 6) but is left abstract here to facilitate abstraction and reuse. The oracle is stateful (the type T, of oracle states, may be updated both by \(\mathsf {oracle}\_\mathsf {send}\) and \(\mathsf {oracle}\_\mathsf {recv}\)).
Most of the operational semantics rules are straightforward. In the MWStepWeights rule for updating the state’s weights table, we make use of an auxiliary expression evaluation function \(E_{}[]\) (standard and therefore not shown in Fig. 7). The only other interesting rules are those for \(\mathsf {send}\) and \(\mathsf {recv}\), which call \(\mathsf {oracle\_send}\) and \(\mathsf {oracle\_recv}\) respectively. In the relation \(\mathsf {oracle\_recv}\), the first two arguments are treated as inputs (the input oracle state of type T and the channel) while the second two are treated as outputs (the cost vector of type \(A\rightarrow \mathbb {Q}\) and the output oracle state). In the relation \(\mathsf {oracle\_send}\), the first three arguments are inputs while only the last (the output oracle state) is an output.
After \(\mathsf {recv}\), \(\mathsf {mult\_weights\_body}\) implements an update to its weights table as defined by the command: \(\mathsf {update} \,\,\textsf {(}\lambda a \,\,\textsf {:} \,\,A \Rightarrow \mathsf {weight}\,\, a \,*\, \textsf {(1}  \eta \,\,\mathsf {*} \,\,\mathsf {cost} \,\, a\textsf {)}\textsf {)}\). As an argument to the \(\mathsf {update}\), we embed a function from actions \(a \in A\) to expressions that defines how the weight of each action a should change at this step (time \(t+1\)). The expressions \(\mathsf {weight} \,\,a\) and \(\mathsf {cost}\,\, a\) refer to the weight and cost, respectively, of action a at time t. The anonymous function term is defined in SsreflectCoq, the metalanguage in which the MW DSL is defined.
5.5 Interpreter
At the level of \(\mathsf {cstate}\)s, we use efficient purely functional data structures such as AVL trees. For example, the type \(\mathsf {M.t}\,\, \mathbb {Q}\) denotes an AVLtree map from actions A to rational numbers \(\mathbb {Q}\). In the smallstep semantics state, by contrast, we model the weights table not as a balanced binary tree but as a SsreflectCoq finite function, of type \(\mathsf {\{ffun}\,\,A \rightarrow \mathbb {Q}\mathsf {\}}\), which directly maps actions of type A to values of type \(\mathbb {Q}\).
To speed up computation on rationals, we use a dyadic representation \(q = \frac{n}{2^d}\), which facilitates fast multiplication. We do exact arithmetic on dyadic \(\mathbb {Q}\) instead of floating point arithmetic to avoid floatingpoint precision error. Verification of floatingpoint error bounds is an interesting but orthogonal problem (cf. [31, 34]).
5.6 Proof
Adaptive Vs. Oblivious Adversaries. In our highlevel specification of MW, we parameterize functions like \(\mathsf {weights\_of}\) by a fixed sequence of cost vectors cs rather than model interaction with the environment, as is done in Fig. 5. An execution of our lowlevel interpreted MW, even against an adaptive adversary, is always simulatable by the highlevel functional specification by recording in the lowlevel execution the cost vectors produced by the adversary, as is done by the \(\mathsf {SPrevCosts}\) field (Sect. 5.5), and then passing this sequence to \(\mathsf {weights\_of}\). This strategy is quite similar to using backward induction to solve the MW game for an oblivious adversary.
To prove that the operational semantics correctly refines our highlevel functional specification of MW (and therefore satisfies the regret bounds given at the start of Sect. 5.6), we prove a similar series of refinements. Since backward simulations compose transitively, we prove regret bounds on our interpreted MW just by composing the refinements in series. The bounds we prove in this way are parametric in the environment oracle with which MW is instantiated. When the oracle state types differ from source to target in a particular simulation, as is the case in our proof that the MW DSL interpreter refines the operational semantics, we require that the oracles simulate as well.
6 Coordinated MW
A system of multiple agents each running MW yields an \(\epsilon \)CCE of the underlying game. If the game being played is smooth – for example, it was built using the combinators of the Smooth Games DSL of Sect. 4 – then the resulting \(\epsilon \)CCE has bounded social cost with respect to a globally optimal strategy. In this section, we put these results together by (1) defining an operational semantics of distributed interaction among multiple clients each running MW, and (2) proving that distributed executions of this semantics yield nearoptimal solutions, as long as the underlying game being played is smooth.
6.1 Machine Semantics
States of the coordinated machine (type \(\mathsf {machine\_state} \,\,N\,\, A\)) map client indices in range \([0..N1]\) to client states (type \(\mathsf {client\_state} \,\, A\)). Machine states also record, at each iteration of the distributed MW protocol, the history of distributions received from the clients in that round (type \(\mathsf {seq}\,\, \textsf {(}[0..N1] \rightarrow \mathsf {dist}\,\, A\textsf {)}\)), which will be used to prove Price of Anarchy bounds in the next section (Sect. 6.2). We say that \(\mathsf {all\_clients\_have\_sent}\) in a particular machine state m, committing to the set of distributions f, if each client’s \(\mathsf {received}\) buffer is empty and its \(\mathsf {sent}\) buffer contains the distribution \(f_i\), of type \(\mathsf {dist}\,\, A\).
The machine step relation models a server–client protocol, distinguishing server steps (\(\mathsf {ServerStep}\)) from client steps (\(\mathsf {ClientStep}\)). Client steps, which run commands in the language of Fig. 7, may interleave arbitrarily. Server steps are synchronized by the \(\mathsf {all\_clients\_have\_sent}\) relation to run only after all clients have completed the current round. The work done by the server is modeled by the auxiliary relation \(\mathsf {server\_sent\_cost\_vector}\ i\ f\ m\ m'\), which constructs and sends to client i the cost vector derived from the set of client distributions f. The relation \(\sigma \ \sim _{\mathcal {O}}\ \sigma '\) states that \(\sigma \) and \(\sigma '\) are equal up to their \(\mathsf {SOracleSt}\) components.
In the distributed MW setting, the cost to player i of a particular action a : A is defined as the expected value, over all Nplayer strategy vectors p in which player i chose action a (\(p_i=a\)), of the cost to player i of p, with the expectation over the \((N1)\)size product distribution induced by the players \(j\ne i\).
6.2 Convergence and Optimality
 1.
Show that noregret clients implementing MW are still no regret when interleaved in the distributed semantics of Fig. 8.
 2.
Prove that perclient regret bounds – one for each client running MW – imply systemwide convergence to an \(\epsilon \)CCE.
 3.
Use POA results for smooth games from Sect. 4 to bound the cost, with respect to that of an optimal state, of all such \(\epsilon \)CCEs.
Composing 1, 2, and 3 proves that the distributed machine of Fig. 8 – when instantiated to clients running MW – converges to nearoptimal solutions to smooth games. We briefly describe each part in turn.
Part 1: Noregret clients are still no regret when interleaved. That MW noregret bounds lift to an MW client running in the context of the distributed operational semantics of Fig. 8 follows from the oracular structure of our implementation of MW (Sect. 5) – clients interact with other clients and with the server only through the oracle.
In particular, for any execution \(\vdash m \Longrightarrow ^+ m'\) of the machine of Fig. 8, and for any client i, there is a corresponding execution of client i with respect to a small nondeterministic oracle that simply “guesses” which cost vector to supply every time the MW client executes a \(\mathsf {recv}\) operation. Because MW is no regret for all possible sequences of cost vectors, proving a refinement against the nondeterministic oracle implies a regret bound on client i’s execution from state \(m_i\) to state \(m'_i\).
We assume that the history is empty in the initial state (\(\mathsf {hist}\,\, m = \mathsf {nil}\)), and that at least one round was completed (\(\mathsf {0} < \mathsf {size}\,\, \textsf {(}\mathsf {hist}\,\, m'\))). By \(\mathsf {final\_state}\,\, m'\), we mean that all clients have synchronized with the server (by receiving a cost vector and sending a distribution) and then have terminated in \(\mathsf {CSkip}\). All clients in state m are initialized to execute \(\mathsf {T}\) steps of MW over game A (\(\mathsf {mult\_weights}\,\, A\,\, \mathsf {T}\)), from an initial state and initial \(\mathsf {ClientPkg}\).
Part 2: Systemwide convergence to an \(\epsilon \)CCE. The machine semantics of Fig. 8 converges to an approximate Coarse Correlated Equilibrium (\(\epsilon \)CCE).
Part 3 Systemwide regret bounds. The machine semantics of Fig. 8 converge to a state with expected cost bounded with respect to the optimal cost.
7 Related Work
Reinforcement Learning, Bandits. There is extensive work on reinforcement learning [39], multiagent reinforcement learning (MARL [19]), and multiarmed bandits (MAB, [15]), more than can be cited here. We note, however, that Qlearning [41], while similar in spirit to MW, addresses the more general scenario in which an agent’s action space is modeled by an arbitrary Markov Decision Process (in MW, the action space is a single set A). Our verified MW implementation is most suitable, therefore, for use in the fullinformation analog of MAB problems, in which actions are associated with “arms” and each agent learns the cost of all arms – not just the one it pulled – at each time step. In this domain, MW has good convergence bounds, as we prove formally of our implementation in this paper. Relaxing our verified MW and formal proofs to the partial information Bandit setting is interesting future work.
Verified Distributed Systems. EventML [33] is a domainspecific language for specifying distributed algorithms in the Logic of Events, which can be mechanically verified within the Nuprl proof assistant. Work has been done to develop methods for formally verifying distributed systems in Isabelle [20]. Model checking has been used extensively (e.g., [21, 24]) to test distributed systems for bugs.
Verdi [42] is a Coq framework for implementing verified distributed systems. A Verdi system is implemented as a collection of handler functions which exchange messages through the network or communicate with the “outside world” via input and output. Applicationlevel safety properties of the system can be proved with respect to a simple, idealized network semantics. A verified system transformer (VST) can then be used to transform the executable system into one which is robust to network faults such as reordering, duplication, and dropping of packets. The safety properties of the system proved under the original network semantics are preserved under the new faulty semantics, with minimal additional proof effort required of the programmer.
The goals of Verdi are complementary to our own. We implement a verified noregret MW algorithm, together with a language of Roughgarden smooth games, for constructing distributed systems with verified convergence and correctness guarantees. Verdi allows safety properties of a distributed system to be lifted to analogous systems which tolerate various network faults, and provides a robust runtime system for execution in a practical setting. It stands to reason, then, that Verdi (as well as followon related work such as [37]) may provide a natural avenue for building robust executable versions of our distributed applications. We leave this for future work.
Chapar [23] is a Coq framework for verifying causal consistency of distributed keyvalue stores as well as correctness of client programs with respect to causally consistent keyvalue stores. The implementation of a keyvalue store is proved correct with respect to a highlevel specification using a program refinement method similar to ours. Although Chapar’s goal isn’t to verify robustness to network faults, node crashes and message losses are modeled by its abstract operational semantics.
IronFleet [18] is a framework and methodology for building verified distributed systems using a mix of TLAstyle state machine refinement, Hoare logic, and automated theorem proving. An IronFleet system is comprised of three layers: a highlevel state machine specification of the overall system, a more detailed distributed protocol layer which describes the behavior of each agent in the system as a state machine, and the implementation layer in which each agent is programmed using a variant of the Dafny [22] language extended with a trusted set of UDP networking operations. Correctness properties are proved with respect to the highlevel specifications, and a series of refinements is used to prove that every behavior in the implementation layer is a refinement of some behavior in the highlevel specification. IronFleet has been used to prove safety and liveness properties of IronRSL, a Paxosbased replicated state machine, as well as IronKV, a shared keyvalue store.
Alternative Proofs. Variant proofs of Theorem 1, such as the one via KLdivergence (cf. [1, Section 2.2]), could be formalized in our framework without modifying most parts of the MW implementation. In particular, because we have proved once and for all that our interpreted MW refines a highlevel specification of MW, it would be sufficient to formalize the new proof just with respect to the highlevel program of Sect. 5.6.
8 Conclusion
This paper reports on the first formally verified implementation of Multiplicative Weights (MW), a simple yet powerful algorithm for approximately solving Coarse Correlated Equilibria, among many other applications. We prove our MW implementation correct via a series of program refinements with respect to a highlevel implementation of the algorithm. We present a DSL for building smooth games and show how to compose MW with smoothness to build distributed systems with verified Price of Anarchy bounds. Our implementation and proof are open source and available online.
Footnotes
 1.
The expected (perstep) cost of the algorithm minus that of the best fixed action.
Notes
Acknowledgments
This material is based on work supported by the National Science Foundation under Grant No. CCF1657358. We thank the ESOP anonymous referees for their comments on an earlier version of this paper.
References
 1.Arora, S., Hazan, E., Kale, S.: The multiplicative weights update method: a metaalgorithm and applications. Theor. Comput. 8(1), 121–164 (2012)MathSciNetCrossRefGoogle Scholar
 2.Awerbuch, B., Azar, Y., Epstein, A.: The price of routing unsplittable flow. In: Proceedings of the thirtyseventh annual ACM Symposium on Theory of Computing, pp. 57–66. ACM (2005)Google Scholar
 3.Bachelier, L.: Théorie mathématique du jeu. Annales Scientifiques de l’Ecole Normale Supérieure 18, 143–209 (1901)MathSciNetCrossRefGoogle Scholar
 4.Bagnall, A., Merten, S., Stewart, G.: Brief announcement: certified multiplicative weights update: verified learning without regret. In: Proceedings of the ACM Symposium on Principles of Distributed Computing, PODC 2017. ACM (2017)Google Scholar
 5.Bertot, Y., Castéran, P.: Interactive Theorem Proving and Program Development: Coq’Art: The Calculus of Inductive Constructions. Springer, Heidelberg (2013)MATHGoogle Scholar
 6.Bhawalkar, K., Gairing, M., Roughgarden, T.: Weighted congestion games: price of anarchy, universal worstcase examples, and tightness. In: de Berg, M., Meyer, U. (eds.) ESA 2010, Part II. LNCS, vol. 6347, pp. 17–28. Springer, Heidelberg (2010). https://doi.org/10.1007/9783642157813_2CrossRefGoogle Scholar
 7.Blum, A., Monsour, Y.: Learning, regret minimization, and equilibria. In: Nisan, N., Roughgarden, T., Tardos, E., Vazirani, V.V. (eds.) Algorithmic Game Theory. Cambridge University Press, Cambridge (2007). Chapter 4Google Scholar
 8.Borel, E.: La théorie du jeu et les équations intégrales à noyau symétrique. Comptes rendus de l’Académie des Sci. 173(1304–1308), 58 (1921)MATHGoogle Scholar
 9.Borowski, H., Marden, J.R., Shamma, J.: Learning efficient correlated equilibrium. Submitted for journal publication (2015)Google Scholar
 10.Christodoulou, G., Koutsoupias, E.: The price of anarchy of finite congestion games. In: Proceedings of the 37th Annual ACM Symposium on Theory of Computing, pp. 67–73. ACM (2005)Google Scholar
 11.Demaine, E.D., Hajiaghayi, M., Mahini, H., Zadimoghaddam, M.: The price of anarchy in network creation games. In: Proceedings of the Twentysixth Annual ACM Symposium on Principles of Distributed Computing, pp. 292–298. ACM (2007)Google Scholar
 12.Fanelli, A., Moscardelli, L., Skopalik, A.: On the impact of fair best response dynamics. In: Rovan, B., Sassone, V., Widmayer, P. (eds.) MFCS 2012. LNCS, vol. 7464, pp. 360–371. Springer, Heidelberg (2012). https://doi.org/10.1007/9783642325892_33CrossRefMATHGoogle Scholar
 13.Foster, D.P., Vohra, R.V.: Calibrated learning and correlated equilibrium. Games Econ. Behav. 21(1), 40–55 (1997)MathSciNetCrossRefGoogle Scholar
 14.Freund, Y., Schapire, R.E.: A desiciontheoretic generalization of online learning and an application to boosting. In: Vitányi, P. (ed.) EuroCOLT 1995. LNCS, vol. 904, pp. 23–37. Springer, Heidelberg (1995). https://doi.org/10.1007/3540591192_166CrossRefGoogle Scholar
 15.Gittins, J.C.: Bandit processes and dynamic allocation indices. J. R. Stat. Soc. Ser. B (Methodol.) 41(2), 148–177 (1979)MathSciNetMATHGoogle Scholar
 16.Gonthier, G., Mahboubi, A., Tassi, E.: A small scale reflection extension for the Coq system. Technical report, INRIA (2015)Google Scholar
 17.Hart, S., MasColell, A.: A simple adaptive procedure leading to correlated equilibrium. Econometrica 68(5), 1127–1150 (2000)MathSciNetCrossRefGoogle Scholar
 18.Hawblitzel, C., Howell, J., Kapritsos, M., Lorch, J.R., Parno, B., Roberts, M.L., Setty, S., Zill, B.: IronFleet: proving practical distributed systems correct. In: Proceedings of the 25th Symposium on Operating Systems Principles, pp. 1–17. ACM (2015)Google Scholar
 19.Hu, J., Wellman, M.P., et al.: Multiagent reinforcement learning: theoretical framework and an algorithm. In: ICML, vol. 98, pp. 242–250. Citeseer (1998)Google Scholar
 20.Küfner, P., Nestmann, U., Rickmann, C.: Formal verification of distributed algorithms. In: Baeten, J.C.M., Ball, T., de Boer, F.S. (eds.) TCS 2012. LNCS, vol. 7604, pp. 209–224. Springer, Heidelberg (2012). https://doi.org/10.1007/9783642334757_15CrossRefGoogle Scholar
 21.Lamport, L.: Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers. AddisonWesley Longman Publishing Co., Inc., Boston (2002)Google Scholar
 22.Leino, K.R.M.: Dafny: an automatic program verifier for functional correctness. In: Clarke, E.M., Voronkov, A. (eds.) LPAR 2010. LNCS (LNAI), vol. 6355, pp. 348–370. Springer, Heidelberg (2010). https://doi.org/10.1007/9783642175114_20CrossRefMATHGoogle Scholar
 23.Lesani, M., Bell, C.J., Chlipala, A.: Chapar: certified causally consistent distributed keyvalue stores. In: ACM SIGPLAN Notices, vol. 51. ACM (2016)CrossRefGoogle Scholar
 24.Lin, H., Yang, M., Long, F., Zhang, L., Zhou, L.: MODIST: transparent model checking of unmodified distributed systems. In: Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation (2009)Google Scholar
 25.Littlestone, N., Warmuth, M.K.: The weighted majority algorithm. In: Proceedings of the 30th Annual Symposium on Foundations of Computer Science. IEEE (1989)Google Scholar
 26.Monderer, D., Shapley, L.S.: Potential games. Games Econ. Behav. 14(1), 124–143 (1996)MathSciNetCrossRefGoogle Scholar
 27.Nash, J.: Noncooperative games. Ann. Math. 54(2), 286–295 (1951)MathSciNetCrossRefGoogle Scholar
 28.Nash, J.F.: Equilibrium in nplayer games. Proc. Natl. Acad. Sci. (PNAS) 36(1), 48–49 (1950)CrossRefGoogle Scholar
 29.von Neumann, J., Morgenstern, O.: Theory of Games and Economic Behavior, vol. 60. Princeton University Press, New Jersey (1944)MATHGoogle Scholar
 30.Nisan, N., Roughgarden, T., Tardos, E., Vazirani, V.V.: Algorithmic Game Theory, vol. 1. Cambridge University Press, New York (2007)CrossRefGoogle Scholar
 31.Panchekha, P., et al.: Automatically improving accuracy for floating point expressions. ACM SIGPLAN Not. 50(6), 1–11 (2015)CrossRefGoogle Scholar
 32.Perakis, G., Roels, G.: The price of anarchy in supply chains: quantifying the efficiency of priceonly contracts. Manag. Sci. 53(8), 1249–1268 (2007)CrossRefGoogle Scholar
 33.Rahli, V.: Interfacing with proof assistants for domain specific programming using EventML. In: Proceedings of the 10th International Workshop on User Interfaces for Theorem Provers, Bremen, Germany (2012)Google Scholar
 34.Ramananandro, T., et al.: A unified Coq framework for verifying C programs with floatingpoint computations. In: Proceedings of the 5th ACM SIGPLAN Conference on Certified Programs and Proofs. ACM (2016)Google Scholar
 35.Roughgarden, T.: Intrinsic robustness of the price of anarchy. In: Proceedings of the 41st Annual ACM Symposium on Theory of Computing, pp. 513–522. ACM (2009)Google Scholar
 36.Roughgarden, T.: Twenty Lectures on Algorithmic Game Theory. Cambridge University Press, New York (2016)CrossRefGoogle Scholar
 37.Sergey, I., Wilcox, J.R., Tatlock, Z.: Programming and proving with distributed protocols. In: Proceedings of the ACM on Programming Languages, vol. 2 (POPL), Article 28 (2018)Google Scholar
 38.Spitters, B., Van der Weegen, E.: Type classes for mathematics in type theory. Math. Struct. Comput. Sci. 21(04), 795–825 (2011)MathSciNetCrossRefGoogle Scholar
 39.Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, vol. 1. MIT press, Cambridge (1998)Google Scholar
 40.Vetta, A.: Nash equilibria in competitive societies, with applications to facility location, traffic routing and auctions. In: Proceedings of the 43rd Annual IEEE Symposium on Foundations of Computer Science, pp. 416–425. IEEE (2002)Google Scholar
 41.Watkins, C.J., Dayan, P.: Qlearning. Mach. Learn. 8(3–4), 279–292 (1992)MATHGoogle Scholar
 42.Wilcox, J.R., Woos, D., Panchekha, P., Tatlock, Z., Wang, X., Ernst, M.D., Anderson, T.: Verdi: a framework for implementing and formally verifying distributed systems. In: ACM SIGPLAN Notices, vol. 50, pp. 357–368. ACM (2015)CrossRefGoogle Scholar
 43.Zermelo, E.: Über eine anwendung der mengenlehre auf die theorie des schachspiels. In: Proceedings of the Fifth International Congress of Mathematicians, vol. 2, pp. 501–504. II, Cambridge University Press, Cambridge (1913)Google Scholar
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.