Offline synthesis of evolutionarily stable normative systems
Abstract
Within the area of multiagent systems, normative systems are a widely used framework for the coordination of interdependent activities. A crucial problem associated with normative systems is that of synthesising norms that will effectively accomplish a coordination task and that the agents will comply with. Many works in the literature focus on the online synthesis of a single, evolutionarily stable norm (convention) whose compliance forms a rational choice for the agents and that effectively coordinates them in one particular coordination situation that needs to be identified and modelled as a game in advance. In this work, we introduce a framework for the automatic offline synthesis of evolutionarily stable normative systems that coordinate the agents in multiple interdependent coordination situations that cannot be easily identified in advance nor resolved separately. Our framework is rooted in evolutionary game theory. It considers multiagent systems in which the potential conflict situations can be automatically enumerated by employing MAS simulations along with basic domain information. Our framework simulates an evolutionary process whereby successful norms prosper and spread within the agent population, while unsuccessful norms are discarded. The outputs of such a natural selection process are sets of codependent norms that, together, effectively coordinate the agents in multiple interdependent situations and are evolutionarily stable. We show the effectiveness of our approach through an empirical evaluation in a simulated traffic domain.
Keywords
Norms · Normative systems · Norm synthesis · Evolutionary algorithm

1 Introduction
Within human societies and multiagent systems (MAS), normative systems (norms) have been widely studied as a mechanism for coordinating the interplay between autonomous agents [7, 28]. Norms can resolve coordination problems in MAS by guiding the decision-making of the agents, restricting their behaviours or encouraging desirable courses of action once some preconditions are fulfilled. Coordination in this sense is usually understood as ensuring that the agents can successfully interact by avoiding undesirable outcomes.
When designing norms for MAS (e.g., an autonomous cars scenario), a system designer will potentially need to address two crucial problems. First, identifying all the conflict situations (or even the MAS states) in which the agents may require coordination. If performed manually, this task might be time-consuming and error-prone, especially if we consider that conflict situations might be interdependent – that is, the agents’ decisions in one situation might affect the outcomes of other situations. As an example, two cars arriving at a junction will need to decide which one yields in order to avoid collisions. However, the coordination success of these cars might depend on the decisions of other cars on the road, such as other cars queueing behind them (and so on and so forth). These interdependencies might lead to numerous situations that cannot be easily identified in advance nor resolved separately. Secondly, a system designer will need to synthesise an effective normative system (a set of norms) that the agents will comply with and that will successfully coordinate the agents in all the identified conflict situations. Such a norm synthesis problem is known to be challenging (NP-complete [30]) and remains open.
A popular (online) norm synthesis approach in the literature is that of norm emergence (or convention emergence) [3, 5, 26, 27, 31, 33, 36], in which the norms of a MAS emerge from within the agent society at runtime. For the purposes of this paper, what is interesting about norm emergence is that it has been successfully used to synthesise stable norms. Most works on norm emergence build on principles in line with those from the framework of evolutionary game theory (EGT) [32]. They consider a MAS in which the agents repeatedly engage in a conflict situation modelled as a game. Strategies that are seen to be successful prosper and spread within the agent society through an evolutionary process whereby the agents tend to adopt successful strategies with higher probabilities than unsuccessful ones. A conventional norm (a convention) [36] emerges once a significant proportion of agents adopt a strategy that everyone prefers to conform to, on the assumption that everyone else does. Such a strategy is said to be evolutionarily stable (an ESS), for no agent could benefit from deviating. Hence, complying with an ESS forms a rational choice for the agents.
From the point of view of a system designer, EGT and norm emergence can be seen as powerful tools to synthesise effective and stable norms. By simulating the evolution of strategies in a MAS, one can anticipate the norms that the agents will abide by at runtime because they lead to successful coordination [3, 31, 34]. However, norm emergence considers a single (typically two-player) game that is known beforehand and of which the agents have complete information (e.g., the number of players and the payoffs). Then, one norm is synthesised to coordinate the agents in the game. Therefore, norm emergence is inappropriate for synthesising norms for MAS with numerous interdependent conflict situations, such as the traffic scenario. One possible solution might be considering a big game involving all the agents from interdependent conflict situations (e.g., a game with two cars in a junction and other cars queueing behind them). Nevertheless, agents may have a limited perception of the environment, and thus be unable to properly detect the big games that they are engaged in in order to coordinate with all their players. For instance, cars in a junction may not be able to perceive other cars behind them, thus being unaware of their need for coordination.
An alternative online norm synthesis approach that may be more appropriate for the scenarios pictured so far is the one proposed in [17, 18], in which the potential conflict situations of a MAS need not be known in advance. Complex coordination situations are detected at runtime during MAS executions by considering basic domain information, and then modelled as sets of smaller, interdependent games that the agents can fully recognise and coordinate in. Norms are automatically created to resolve each game, and individually evaluated in terms of their joint performance once the agents simultaneously play interdependent games. The outputs of this process are sets of norms whose coordination utilities depend on each other and that, as a whole, effectively coordinate the agents in each possible game. However, unlike most state-of-the-art approaches, this work does not consider evolutionary stability as a synthesis criterion, and hence the normative systems it synthesises cannot be guaranteed to be evolutionarily stable.
Against this background, this paper builds on techniques from the work in [17, 18] and the framework of EGT [32], and contributes to the state of the art by introducing a novel framework for the offline synthesis of evolutionarily stable normative systems (ESNS) for MAS. Our framework considers a MAS with multiple, interdependent conflict situations that are unknown beforehand. It employs techniques described in [17, 18] to automatically discover these situations during MAS simulations, modelling them as interdependent games. Norms are automatically created in order to coordinate the agents in these games, and submitted to an evolutionary process inspired by EGT. Norms that prove useful to coordinate the agents in each game prosper and spread, ultimately converging to sets of codependent conventional norms that, together, are effective for coordination and evolutionarily stable. We provide an empirical evaluation of our framework in a simulated traffic domain. We show that it can synthesise ESNSs that successfully avoid conflicts in numerous (up to 89) interdependent traffic situations that can be neither easily identified in advance nor resolved separately, since resolving them separately would yield normative systems that are not evolutionarily stable.
Broadly speaking, our framework provides a valuable tool for offline norm design. Given a MAS, it opens the possibility of synthesising evolutionarily stable normative systems without requiring full knowledge of its possible conflict situations. By simulating basic agents’ interactions and providing basic domain information (such as identifying when cars collide), our framework can provide system designers with the necessary norms to achieve effective and stable agent coordination in a MAS.
The remainder of the paper is organised as follows. Section 2 provides the necessary background to understand our work. Section 3 describes our framework, whereas Sect. 4 illustrates its empirical evaluation. Section 5 reviews the state of the art in norm synthesis, and Sect. 6 provides some concluding remarks and outlines possible future research. Finally, Sect. 7 discusses the main limiting assumptions of this work and how these might be lifted in order to be applicable to a wider extent of problems.
2 Background
In this section we provide the necessary background to understand our work. We start by surveying the automatic norm synthesis approach described in [17, 18], named iron (Intelligent Robust Online Norm Synthesis), as our work employs it to automatically detect coordination situations and generate norms. Then, we introduce the framework of evolutionary game theory (EGT) [32] and the key concept of evolutionarily stable strategy (ESS).
2.1 Automatic norm synthesis
iron is an iterative approach that monitors the evolution of a MAS at regular time intervals, searching for conflict situations (e.g., collisions in a traffic scenario). Whenever it detects a conflict at time t, iron triggers a norm generation process that results in the creation of a norm aimed to avoid the detected conflict in the future (from time \(t+1\) onwards). The norm is then communicated to the agents, and thereafter empirically evaluated in terms of its utility to coordinate the agents once they comply with it.^{1} If the norm avoids conflicts once the agents abide by its strictures, then it is regarded as useful and sustained over time. Otherwise, the norm is ultimately discarded and alternative norms are created in order to avoid the conflict. Over time, iron may detect multiple, interdependent conflicts. As a result, it will create multiple, codependent norms whose utility might depend on each other.
iron considers a MAS in which the agents have a limited perception of the environment. Thus, a car perceives the road by means of its local context, i.e., its internal representation of the area of the road that is immediately next to and in front of it at a given time. For instance, cars 1 and 2 perceive each other at time t (Fig. 1a), but neither of them can perceive car 3. Car 3 perceives car 2, but it cannot perceive car 1.
iron detects a conflict at a given time by perceiving the state of the monitored MAS, and employing a domaindependent conflict function^{3} that allows it to detect groups of agents whose interaction has led to an undesirable outcome in the state. For instance, at time \(t+1\) (Fig. 1b), iron will detect a collision between cars 1 and 2 after both have performed action “go”.
After detecting a collision at time \(t+1\), iron creates a norm aimed at avoiding future collisions. Specifically, it creates a norm that prohibits performing one of the actions performed by the collided cars at time \(t+1\) whenever they engage in the situation they faced before colliding, i.e., at time t. In this way, the conflict might be avoided in the future. iron creates a norm as follows. First, it randomly chooses one of the cars involved in the collision at time \(t+1\), e.g., car 2. Then, it employs a domain-dependent context function^{3} to retrieve the local context of this car in the situation previous to the conflict. The context of car 2 at time t can be informally described as “there is a car coming from my left”.^{4} Next, it employs a domain-dependent action function^{3} to retrieve the action that this car performed during the transition from time t to time \(t+1\) (i.e., action “go”). The resulting norm, \(n_a\), can be described as:

\(n_a\): if there is a car coming from my left then “I am prohibited to go”.

The new norm is then communicated to all the agents in the system, who will have this norm applicable once they face the situation described by the norm. Then, iron keeps monitoring the MAS. At time \(t+2\), cars 5 and 6 engage in the same situation faced by cars 1 and 2 at time t. This time, car 6 has norm \(n_a\) applicable and is prohibited to go forward. Consequently, car 6 stops at time \(t+3\), giving way to car 5 and being hit by car 7 from behind. iron will detect this new conflict, which in fact is caused by the decision of car 6. Then, it will generate a new norm that prohibits going to any agent encountering the context of car 7 at time \(t+2\). The resulting norm, \(n_b\), can be described as:

\(n_b\): if there is a car in front of me going forward then “I am prohibited to go”.

which can be seen as a “security distance” norm.
Notice that the two norms synthesised so far (\(n_a\) and \(n_b\)) are codependent, i.e., they can only avoid collisions if the agents have both of them. iron has no means to explicitly detect such a codependency relationship. Instead, it implicitly captures the effects of norms’ codependencies by empirically computing their individual utilities in a series of concurrent game plays in which norms can affect each other’s performances. For instance, say that a car applies norm \(n_a\) and the car behind it applies norm \(n_b\), hence avoiding collisions. Then, both norms will be individually evaluated as useful. Conversely, should cars not have and apply \(n_b\), then norm \(n_a\) would be individually evaluated as useless. Eventually, iron will sustain these norms if they individually prove to be useful enough during a sufficient amount of time. Otherwise, it will discard them and synthesise new ones, and so on until it finds a set of norms that successfully avoids conflicts in all possible situations.
At this point it is worth noticing that the normative systems synthesised by iron cannot be guaranteed to be evolutionarily stable. iron does not seek to synthesise optimal normative systems, but normative systems that are good enough for a given MAS. Thus, once iron finds a good enough normative system, it does not explore “better” (more useful) normative systems that the agents might eventually be tempted to switch to. In this work we address this problem, building on iron and incorporating ideas from EGT in order to synthesise evolutionarily stable normative systems.
2.2 Evolutionary game theory
EGT combines population ecology with classical game theory. It considers a population of agents that repeatedly engage in strategic pairwise interactions by adopting different (pure) strategies. An ESS is a strategy such that, if adopted by a majority of agents, no agent can benefit from using any alternative strategy – namely, the fitness (i.e., the average payoff) of an agent using that strategy is higher than the fitness of any agent using an alternative strategy.
Figure 2 graphically illustrates the EGT model. It considers an initial population of agents \(P_t\) that adopt different strategies to play a game (Fig. 2(1)). Each strategy has a certain fitness that quantifies the average payoff to an agent that adopts the strategy to play against other strategists. Strategies are then replicated (Fig. 2(2)), growing in frequency proportionally to their relative fitness with respect to the average fitness of the population. If the process has not converged yet, then a new population \(P_{t+1}\) is generated that reflects the changes in strategy frequencies (Fig. 2(3)). Such a population is then employed to repeat the replication process, and so on and so forth until the process converges. The replication process is considered to have converged once the population remains stable between generations (that is, the frequency of each strategy remains unchanged). Then, if a majority of agents have adopted the same strategy, this strategy is considered to be an ESS.
Payoff matrix of the Prisoner’s Dilemma

        C       D
C     (3,3)   (0,5)
D     (5,0)   (1,1)
In this manner, the fitness of a cooperator is computed as the summation of its initial fitness, the probability of encountering a cooperator times the payoff to the cooperator when that happens, and the probability of encountering a defector times the payoff to the cooperator when that happens. The fitness of a defector is computed analogously.
As previously mentioned, the replication process can eventually lead the population to a point of equilibrium in which the frequencies of each strategy do not change over time because their fitnesses are equal. When this happens, the population can be either monomorphic (a majority of agents adopt the same strategy) or polymorphic (the agents adopt a variety of strategies). If the population composition can be restored after a disturbance,^{5} then it is said that the population is in an evolutionarily stable state. If such population is monomorphic, then the strategy adopted by the agents is said to be an ESS.
In the Prisoner’s Dilemma, cooperation is not an ESS because cooperators will always perform more poorly (will have a lower fitness) than defectors when playing against them. Hence, defection is the only evolutionarily stable strategy: a population of cooperators can be invaded by a single defector.
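The dynamics just described can be reproduced numerically. Below is a minimal sketch of discrete replicator dynamics for the Prisoner’s Dilemma (payoffs from the table above; for simplicity, the initial fitness is taken to be zero), showing that even a small share of defectors takes over a population of cooperators:

```python
# Discrete replicator dynamics for the Prisoner's Dilemma.
# Strategies: 0 = cooperate (C), 1 = defect (D).
PAYOFF = [[3, 0],   # payoffs to C against (C, D)
          [5, 1]]   # payoffs to D against (C, D)

def step(freq):
    """One replication round: each strategy's frequency grows in
    proportion to its fitness relative to the population average."""
    fitness = [sum(PAYOFF[s][o] * freq[o] for o in (0, 1)) for s in (0, 1)]
    avg = sum(freq[s] * fitness[s] for s in (0, 1))
    return [freq[s] * fitness[s] / avg for s in (0, 1)]

freq = [0.99, 0.01]        # 99% cooperators, 1% defectors
for _ in range(200):
    freq = step(freq)

print(round(freq[1], 3))   # 1.0: defection has taken over
```

Note that the same dynamics started from a population of pure defectors would not change at all, which is exactly the stability property that makes defection an ESS here.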
3 Evolutionary norm synthesis
In this section we introduce our framework for the synthesis of evolutionarily stable normative systems (ESNS) – hereafter, referred to as our “System for Evolutionary Norm SynthEsis”, or sense for short. In Sect. 3.1, we provide a general overview of the sense operation. Then, we provide some basic definitions and formally define our problem in Sect. 3.2. We detail how sense performs evolutionary norm synthesis in Sect. 3.3.
3.1 Evolutionary norm synthesis system: an outline
Our framework implements an evolutionary process similar to the one in EGT (Sect. 2.2), but instead of strategies, it replicates norms. It considers a MAS in which agents have a limited perception of the environment. Thus, complex coordination situations of a MAS, in which agents may not be able to entirely perceive each other in order to detect their need for coordination, are resolved by creating sets of simpler, interdependent games in which agents can fully perceive each other and coordinate. At a given time, the coordination success of the agents playing a game will depend on their actions in the game, as well as the actions of other agents simultaneously playing interdependent games.
MAS simulation: sense runs a simulation of the MAS for a certain amount of time,^{6} repeatedly carrying out two simultaneous subtasks:

Norm generation, which consists in monitoring agents’ activities, detecting new games and keeping track of them in a Games Base. For each new tracked game, sense creates alternative norms to coordinate the agents and sends each norm to different agents. For instance, if a game has four different norms, 25% of agents will be provided with the first norm, and the same applies to the remaining norms. The agents will incorporate their assigned norms to their normative systems, and thereafter, every time they play the game, their respective norms will guide their courses of action for the sake of coordination. Over time, sense will create a heterogeneous population whose agents have different normative systems, and thus will play each game by using different, competing norms. In order to detect conflicts and create norms, sense employs iron’s techniques described in Sect. 2.1.

Utility learning, which consists in accumulating evidence about the utilities of the norms of each game once the agents play the game over time. A norm’s utility is empirically computed in terms of the frequency with which it effectively coordinates the agents in a game once they comply with the norm within a sequence of game plays. Such a utility might depend on the (codependent) norms adopted by the agents in simultaneously played interdependent games. Computing norms’ utilities empirically allows sense to capture in a compact manner the runtime effects of the codependencies between norms, enabling it to evaluate and coevolve norms together.


Norm replication: After simulation, sense has available a collection of interdependent games along with their norms’ utilities. With all this at hand, sense replicates norms just as strategies are replicated in EGT. For each game, it computes the fitness of each competing norm as its average utility in the game. Then, it replicates each norm in numbers proportional to its fitness: the frequency of norms fitter than average will increase, while that of norms less fit than average will decrease. Like utilities, the fitnesses and frequencies of codependent norms will coevolve over time. The replication process is considered to have converged once either of the following conditions holds:
1. There is one norm in each game that has eliminated its competitors, i.e., the norm has achieved 100% frequency.
2. The frequencies of each norm of each game remain stable (unchanged) during a sufficient number of generations, denoted as \({\mathcal {I}}\).

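The replication step and the second convergence criterion can be sketched as follows (the norm names and utility values are illustrative, not taken from the evaluation):

```python
def replicate(freq, fitness):
    """Fitness-proportional replication of a game's competing norms:
    norms fitter than the population average grow in frequency,
    less fit norms shrink (cf. the EGT replication of strategies)."""
    avg = sum(freq[n] * fitness[n] for n in freq)
    return {n: freq[n] * fitness[n] / avg for n in freq}

def converged(history, window):
    """Second convergence criterion: every norm's frequency has
    remained (numerically) unchanged for `window` generations."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return all(abs(gen[n] - recent[0][n]) < 1e-9
               for gen in recent for n in gen)

# Illustrative run: norm3 is fitter than its competitors in this game.
freq = {"norm1": 0.25, "norm2": 0.25, "norm3": 0.25, "norm4": 0.25}
fitness = {"norm1": 0.2, "norm2": 0.5, "norm3": 0.9, "norm4": 0.4}
history = [freq]
while not converged(history, window=10):
    freq = replicate(freq, fitness)
    history.append(freq)

print(max(freq, key=freq.get))  # norm3 eliminates its competitors
```

In sense, unlike in this sketch, the fitnesses themselves are re-estimated between generations, since the utility of a norm depends on the codependent norms adopted in interdependent games.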
norm1 establishes no prohibitions. Thus, a car is free to go forward when coming from either the left or the right (no matter the role it plays).

norm2 says that a car is prohibited to go forward when coming from the left (when playing role 1). In practice, norm2 stands for a “give way to the right” norm. Analogously, norm3 stands for a “give way to the left” norm.

norm4 stands for a “give way always” norm, i.e., it prohibits a car to go forward once it is playing either role 1 or role 2.
Each car will incorporate two norms to its normative system: one norm to coordinate in game A, and another norm for game B. In particular, 25% of the cars will have norm1 in their normative systems, and the same applies to norm2, norm3 and norm4. Similarly, 50% of the cars will have norm5, and the remaining 50% will have norm6. Thereafter, sense will accumulate evidence about the coordination utility of each norm once the agents use it to coordinate in a game. For instance, suppose that at a given time t sense perceives the situation illustrated in Fig. 6a. Cars 1–2 play game A described above, and car 3 plays game B. Cars 1–2 have norm3 as their norm to coordinate in game A, and car 3 has norm6 for game B. Thus, at time \(t+1\) (Fig. 6b) cars 2 and 3 stop, giving way to car 1 and avoiding a collision. sense will monitor this positive outcome and will evaluate norm3 and norm6 as useful. At a later time \(t'\), sense perceives a similar situation in which car 5 has norm5 to coordinate in game B (Fig. 7a). Thus, at time \(t'+1\) (Fig. 7b) car 4 stops and car 5 goes forward, hitting it from behind. Consequently, sense will evaluate norm3 and norm5 as useless.
3.2 Basic definitions and problem statement
We consider a MAS with a set of agents Ag and a finite set of actions Ac available to these agents. Let S be the set of all the states of the MAS. We adopt a synchronous model in which the agents interact in some system state, perform a collection of actions, and lead the system from its previous state to a new one. We assume that each agent has a limited perception of the state of the MAS it is part of at a given time. Thus, an agent’s context stands for its internal representation of a MAS state (i.e., its beliefs). Agents express their contexts in terms of an agent language \({\mathcal {L}}\) composed of predicates and terms.
In each state of the MAS, agents may engage in local strategic interactions in which they need coordination in order to avoid conflicts. We will refer to such interactions as one-shot games. Formally, a game is composed of a set of roles that define the actions available to each agent involved in the game. A game has a description that describes the initial situation of the game from the point of view of its players. Such a description is composed of the local contexts of each player. We assume that the payoffs to the players of a game cannot be assessed beforehand, as these might depend on the actions of the players of interdependent games simultaneously played at runtime. Thus, it is important to notice that a game in our model does not contain a predefined payoff matrix. Instead, we define it as follows:
Definition 1
(Game) A game is a tuple \(G = \langle R, A, \varphi \rangle \), where:

\(R=\{1, \ldots , m\}\) is a set of m agent roles, one for each agent involved in the game.

\(A=\langle A_1, \ldots , A_m \rangle \) is an m-tuple of action sets available to each role, where \(A_i \subseteq Ac\) is the set of actions available to the agent taking on role i.

\(\varphi \) is an expression of \({\mathcal {L}}\) that describes the initial situation of the game from the point of view of its players (i.e., \(\varphi \) is the conjunction of their local contexts, sorted by role).
For simplicity, henceforth we shall refer to an m-role game as a game, and to an agent playing role i in a game as player i.
As an example, we can formally describe game A in Fig. 4a as a tuple \(G_A=\langle R, A, \varphi \rangle \), where \(R=\{1,2\}\) is the set of roles, \(A=\langle \{go,stop\},\{go,stop\}\rangle \) is the tuple of action sets available to each role, and \(\varphi \) is its description, which can be informally interpreted as “player 1 perceives a car coming from its right and player 2 perceives a car coming from its left”.
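A game as in Definition 1 can be rendered as a simple data structure. The sketch below encodes game \(G_A\); the description \(\varphi \) is kept as an opaque string rather than a full expression of \({\mathcal {L}}\):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Game:
    """An m-role game <R, A, phi> (Definition 1)."""
    roles: tuple        # R = (1, ..., m)
    actions: tuple      # A = (A_1, ..., A_m): one action set per role
    description: str    # phi: the game situation from the players' view

# Game G_A: two roles, each with actions {go, stop}.
G_A = Game(
    roles=(1, 2),
    actions=(frozenset({"go", "stop"}), frozenset({"go", "stop"})),
    description=("player 1 perceives a car coming from its right and "
                 "player 2 perceives a car coming from its left"),
)

print(len(G_A.roles), sorted(G_A.actions[0]))  # 2 ['go', 'stop']
```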
In general, in each MAS state the same game can be simultaneously played by different groups of agents, and each agent can engage in the same game in different MAS states. At a given time, an agent identifies whether it is engaged in a game in the current state of the MAS (and the role that it plays in it) by means of its context. For instance, if the context of a car at a given time is “there is a car coming from my right”, then this car will know that it is enacting role 1 in game \(G_A\).
If G is a game, a norm stands for a coordination strategy that specifies what an agent is prohibited to do when playing each possible role of G. Norms are expressed in terms of the agent language \({\mathcal {L}}\) so that agents can interpret and comply with them. Formally, a norm is a (possibly empty) set of constraints that restricts the action space of the agents involved in a game by prohibiting certain actions.
Definition 2
(Norm) A norm is a pair \(n = \langle \psi , prh \rangle \), where:

\(\psi \in {\mathcal {L}}\) is the precondition of the norm.

\(prh: R \rightarrow 2^{Ac}\) is a function that returns the set of actions that an agent is prohibited to perform when taking on role i, where \(prh(i) \in 2^{A_i}\) for all \(i \in R\).
As an example, norm3 introduced above to coordinate cars in \(G_A\) can be formally defined as a pair \(norm3 =\langle \psi , prh \rangle \), where \(\psi =\)“player 1 perceives a car coming from its right and player 2 perceives a car coming from its left”, and function prh returns an empty set for role 1, and action “go” for role 2. Formally, \(prh(1)=\emptyset \) and \(prh(2)=\{go\}\).
Let \(G=\langle R, A, \varphi \rangle \) be an m-role game, and \(n = \langle \psi , prh \rangle \) a norm. We say that n applies in G if the description of G satisfies the precondition of n, namely if \(\varphi \models \psi \). Hereafter, we will refer to the set of norms that apply in a game G as the norm space of game G, denoted by \(N_{G}\). For instance, the norm space of game \(G_A\) can be denoted by \(N_{G_A}=\{norm1, norm2, norm3, norm4\}\) (the norms in the Table in Fig. 4).
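Definition 2 and the applicability test \(\varphi \models \psi \) can be sketched as follows; for illustration, entailment is simplified to an exact match between description and precondition, whereas a real implementation would reason over expressions of \({\mathcal {L}}\):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Norm:
    """A norm <psi, prh> (Definition 2)."""
    precondition: str   # psi, expressed in the agent language L
    prohibitions: dict  # prh: role -> set of prohibited actions

    def prohibited(self, role):
        return self.prohibitions.get(role, frozenset())

    def applies_in(self, game_description):
        # Simplified phi |= psi: the game's description entails the
        # norm's precondition; here reduced to an exact string match.
        return game_description == self.precondition

PHI_A = ("player 1 perceives a car coming from its right and "
         "player 2 perceives a car coming from its left")

# norm3, the "give way to the left" norm for game G_A: it prohibits
# nothing to role 1 and prohibits "go" to role 2.
norm3 = Norm(precondition=PHI_A,
             prohibitions={1: frozenset(), 2: frozenset({"go"})})

print(norm3.applies_in(PHI_A), sorted(norm3.prohibited(2)))  # True ['go']
```

The norm space \(N_{G}\) of a game is then simply the subset of synthesised norms whose `applies_in` test succeeds on the game’s description.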
Agents in a MAS may engage in multiple, different games. Henceforth, we shall denote the set of games that agents can play as \({\mathcal {G}} = \{G_1,\ldots ,G_s\}\). A normative system is a set of norms (one per game in \({\mathcal {G}}\)) that provides an agent with the means to coordinate in each game in \({\mathcal {G}}\). Following our example, each car will have one norm out of norm space \(N_{G_A}\), and so on for each game.
Definition 3
(Normative system) Let \({\mathcal {G}}\) be a (possibly empty) set of games. A normative system is a set of norms \(\varOmega \) that contains, for each \(G \in {\mathcal {G}}\), one norm \(n \in N_{G}\).
Each agent \(ag_j \in Ag\) counts on its own normative system \(\varOmega _j\). Thus, in general we assume that a MAS is composed of a heterogeneous population whose agents may have different normative systems.
Let \(Ag' \subseteq Ag\) be a group of agents engaged in a game \(G=\langle R, A, \varphi \rangle \) at a given time, each playing one role from R. Each agent will count on one norm out of its normative system that applies in G and prohibits it to perform some actions. We denote the combination of norms applicable to these agents at this given time as \(\mathbf {n}=\langle n_1, \ldots , n_m \rangle \), where \(n_i\) stands for the norm for G in the normative system of the agent playing role i. We assume that agents always comply with their applicable norms.^{7} Therefore, based on the norms in \(\mathbf {n}\), these agents will perform a tuple of actions \(\mathbf {a}=\langle a_1,\ldots ,a_m \rangle \), where \(a_i\) is an action performed by the agent enacting role i that is not prohibited by norm \(n_i\) for role i.^{8}
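The set of joint actions compatible with a norm combination can be sketched as follows (action sets as in game \(G_A\); for simplicity, each norm is represented directly by the set of actions it prohibits to its role):

```python
from itertools import product

def legal_joint_actions(action_sets, prohibitions):
    """All joint actions <a_1, ..., a_m> the agents may perform: each
    a_i is an action available to role i that is not prohibited by the
    norm applicable to the agent enacting role i."""
    allowed = [sorted(actions - prohibited)
               for actions, prohibited in zip(action_sets, prohibitions)]
    return list(product(*allowed))

# Roles 1 and 2 of G_A with actions {go, stop}; the norm combination
# prohibits nothing to role 1 and prohibits "go" to role 2 (norm3).
A = [{"go", "stop"}, {"go", "stop"}]
prh = [set(), {"go"}]
print(legal_joint_actions(A, prh))  # [('go', 'stop'), ('stop', 'stop')]
```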
As previously introduced, we assume that the payoffs to the players of a game cannot be assessed beforehand, as these might depend on the actions of the players of interdependent games simultaneously played at runtime. However, we can compute the rewards to the players of such a game at a given time, once they perform a joint action and lead the MAS to its next state. We obtain such a reward by means of a reward function.
Definition 4
(Reward) Given a MAS with a set of agents Ag, the reward to an agent \(ag \in Ag\) at a particular point in time \(t \in {\mathbb {N}}\) is represented by means of a reward function of the form \(r^t(ag) \in [0,1]\).
Say that cars 1 and 2 in Fig. 6a have normative systems \(\varOmega _1\) and \(\varOmega _2\), respectively, and that both normative systems have norm3 as the applicable norm in \(G_A\). Thus, at time t these cars play \(G_A\) with norm combination \(\mathbf {n}=\langle norm3, norm3 \rangle \). These cars will perform a joint action \(\mathbf {a}=\langle go,stop \rangle \), thus avoiding collisions at time \(t+1\) and getting reward 1 (i.e., \(r^{t+1}(1)= 1\) and \(r^{t+1}(2)= 1\)). At time \(t'\) (Fig. 7a), cars 3 and 4 play \(G_A\) with the same norm combination. Car 3 avoids collisions, but car 4 is hit from behind by car 5, which was engaged in game B. Thus, the rewards of cars 3 and 4 at time \(t'+1\) are 1 and 0, respectively, i.e., \(r^{t'+1}(3)= 1\) and \(r^{t'+1}(4)= 0\).
Notice that, in practice, given a game and the norms that apply in it, the agents will play a repeated one-shot game of norms against norms, namely a normative game, in which the norm combinations used by the agents to play the game over time will lead them to obtain a history of rewards. Thus, a normative game will consist of a game, the norms that apply in it, and a history of rewards obtained by the agents in the game.
Definition 5
(Normative game) A normative game is a tuple \(NG = \langle G, N_{G}, H, U \rangle \), where:

G is an mrole game, and \(N_{G}\) is the norm space of G.

\(H=\langle h_0,\ldots , h_w \rangle \) is the memory of the normative game over a time window \([0,t_w]\), where \(h_j=\langle \mathbf {n}^j, \mathbf {r}^j \rangle , j \in [0,w]\) such that \(\mathbf {n}^j \in N_G^{R}\) is the combination of norms that applied to the agents playing the game at time \(t_j\), and \(\mathbf {r}^j\) is the vector of rewards that these agents obtained at that time (one for each agent).

\(U=\langle u_1, \ldots , u_m \rangle \) is an m-tuple of utility functions of the form \(u_i: N_G^{R} \times H \rightarrow [0,1]\), which return the personal coordination utility to an agent enacting role i in the game once the players of the game have a certain combination of applicable norms. Such a utility is empirically computed based on the memory of the game, H.
Intuitively, the utility of a norm combination tells us how successful it has historically been at avoiding conflicts for each player of the game. Such a utility is computed based on the rewards obtained by the agents that have played the game within a time window. We provide equations to compute such a utility further on, in Sect. 3.3.2.
Note therefore that each game will have its equivalent normative game. If \({\mathcal {G}}=\{G_1, \ldots , G_s \}\) is a set of games with norm spaces \(N_{G_1}, \ldots , N_{G_s}\), we shall denote as \({{\mathcal {N}}}{{\mathcal {G}}}=\{NG_1, \ldots , NG_s\}\) the set of all possible normative games, where \(NG_i\) is the normative game resulting from \(G_i\) and the norms from \(N_{G_i}\).
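For concreteness, the structures defined above can be sketched as plain data classes. This is a hypothetical Python rendering, not code from the paper; the field and method names are ours:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Game:
    """An m-role game G = <R, A, phi> as described above."""
    roles: tuple      # R: the m roles of the game
    actions: tuple    # A: actions available to the players
    contexts: tuple   # phi: one context expression per role

@dataclass
class NormativeGame:
    """A normative game NG = <G, N_G, H, U>; the utilities U are computed
    from the memory H, so only H is stored explicitly here."""
    game: Game
    norm_space: list                            # N_G: candidate norms of the game
    memory: list = field(default_factory=list)  # H: list of (norm_combo, rewards)

    def record(self, norm_combo, rewards):
        # one entry h_j = <n^j, r^j> of the game's memory
        self.memory.append((tuple(norm_combo), tuple(rewards)))
```

Each time the game is played, one `record` call appends the norm combination used and the rewards obtained, building up the history H from which utilities are later estimated.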
At this point we import from EGT the concept of fitness introduced in Sect. 2.2. Given a normative game NG, the fitness of each one of its norms quantifies the average utility of an agent that uses the norm to play NG by enacting different roles and by playing against agents with different norms. Formally:
Definition 6
(Norm fitness) Given a normative game \(NG=\langle G, N_{G}, H, U\rangle \), the fitness of a norm \(n \in N_{G}\) is represented by means of a function of the form \(f(n,NG) \in [0,1]\).
Now we are ready to introduce the problem that we address in this paper. Consider a population of agents, and a collection of interdependent normative games \({{\mathcal {N}}}{{\mathcal {G}}}\). Our aim is to find a normative system \(\varOmega \) such that, once it is used by all the agents to coordinate in the games in \({{\mathcal {N}}}{{\mathcal {G}}}\), no agent can derive a greater global fitness by using an alternative normative system \(\varOmega '\). In terms of EGT, this amounts to saying that normative system \(\varOmega \) is evolutionarily stable, since no agent could ever be tempted to use an alternative normative system to coordinate in the normative games in \({{\mathcal {N}}}{{\mathcal {G}}}\).
Definition 7
(Evolutionarily stable normative system) Given a population of agents \(Ag\) and a set of normative games \({{\mathcal {N}}}{{\mathcal {G}}}\), a normative system \(\varOmega \) is evolutionarily stable if:
 1.
All agents adopt \(\varOmega \). That is, \(\varOmega _i = \varOmega \) for each agent \(ag_i \in Ag\).
 2.
There is no alternative normative system whose fitness outperforms that of \(\varOmega \). Formally, there is no alternative normative system \(\varOmega '\) such that \(f_{{\mathcal {G}}}(\varOmega ',{{\mathcal {N}}}{{\mathcal {G}}}) > f_{{\mathcal {G}}}(\varOmega ,{{\mathcal {N}}}{{\mathcal {G}}})\).
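Condition 2 of the definition reduces to a simple fitness comparison. A minimal sketch (function names are ours; `fitness` stands for the global fitness function \(f_{{\mathcal {G}}}\), here represented by toy values for illustration):

```python
def is_esns(omega, alternatives, fitness):
    """Definition 7, condition 2: omega is evolutionarily stable if no
    alternative normative system attains a strictly greater global fitness."""
    return all(fitness(alt) <= fitness(omega) for alt in alternatives)

# toy global-fitness values, for illustration only
toy_fitness = {"omega": 0.9, "alt1": 0.8, "alt2": 0.9}
```

Note that the comparison is strict in the definition, so an alternative that merely ties with \(\varOmega \) does not violate stability.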
3.3 Formal model for evolutionary norm synthesis
We now describe the tasks that sense performs to synthesise a normative system that solves the norm synthesis problem in Definition 7. That is, norm generation and utility learning (Fig. 3(2)), and norm replication (Fig. 3(3)). In particular, norm generation and utility learning are achieved by runtime observation of the agents’ activities during MAS simulations, while norm replication is subsequently performed outside simulations.
3.3.1 Generating new normative games from observation
As introduced in Sect. 3.1, sense takes the norm generation approach of iron (Sect. 2.1). It monitors agents’ interactions at regular time intervals within a simulation of the MAS for a given number of time steps. At each time step, sense tries to detect new, untracked games, and generates norms to coordinate the involved agents. Like iron, we assume that conflicts can be detected at runtime, and that the agents involved in a conflict are the ones responsible for it. Moreover, we assume that a conflict in a MAS state at a given time t is caused by the actions that the agents performed in the previous MAS state at time \(t-1\).

A conflict function of the form \( conflicts : S \rightarrow 2^{Ag}\), which returns groups of agents that are involved in a conflict in the state of the MAS at a given time.

A context function of the form \( context : Ag \times S \rightarrow 2^{{\mathcal {L}}}\), which returns the local context of an agent in a state of the MAS. As introduced in Sect. 3.2, this context is expressed in terms of an agent language \({\mathcal {L}}\).

An action function of the form \( action : Ag \times S \times S \rightarrow Ac\), which returns the action that an agent performed during the transition from a MAS state \(s_t\) at a given time t to the subsequent MAS state \(s_{t+1}\) at time \(t+1\).
 1.
Detecting a new conflict at time t. Formally, sense obtains the conflicts of the current state of the MAS \(s_t \in S\) at a given time t by invoking function \( conflicts (s_t)\).
 2.
Retrieving the contexts at time \(t-1\) of the m agents that are in conflict at time t. This amounts to generating, for each agent \(ag \in Ag\) involved in the conflict at time t, an expression \(\varphi \in {\mathcal {L}}\) that describes the context of this agent in the MAS state \(s_{t-1} \in S\) at time \(t-1\). Formally, \(context(ag,s_{t-1})\).
 3.
Creating a new m-role game as \(G=\langle R, A, \varphi \rangle \) with R as the set of roles played by the agents at time \(t-1\), A as the set of the actions available to these agents at time \(t-1\), and \(\varphi \) as the set of contexts of these agents at time \(t-1\).
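Steps 1 to 3 can be sketched as a small pipeline over the domain functions introduced above (signatures and names are hypothetical; actions are omitted for brevity):

```python
def detect_new_games(s_prev, s_curr, conflicts, context):
    """Build an m-role game for each group of agents in conflict at time t,
    using the agents' contexts at time t-1."""
    games = []
    for group in conflicts(s_curr):                        # step 1: conflicts at t
        ctxs = tuple(context(ag, s_prev) for ag in group)  # step 2: contexts at t-1
        roles = tuple(range(1, len(group) + 1))            # step 3: one role per agent
        games.append((roles, ctxs))
    return games
```

Here `conflicts` and `context` are supplied by the domain, as stated above, so the pipeline itself stays domain-independent.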
 1.
Identifying the actions that led to the conflict at time t. That is, retrieving the m actions that the m agents involved in the conflict performed in the transition from the MAS state \(s_{t-1}\) at time \(t-1\) to its subsequent state \(s_t\) at time t. sense obtains the action performed by an agent \(ag \in Ag\) in such a state transition as \( action (ag,s_{t-1},s_t)\).
 2.
Creating alternative norms that prohibit different roles of G from performing the actions that led to the conflict. Each one of these norms will prohibit agents from performing the conflicting actions whenever they perceive the contexts corresponding to different roles of the game.
At this point it is worth noting that sense is not guaranteed to solve a game on its first attempt, since a game may require norms that prohibit more than one action in order to coordinate the agents. For instance, if, once cars are prohibited to go forward in a given game, they decide to turn to one side and still collide, then sense will synthesise further norms that prohibit both going forward and turning to one side, adding them to the norm space of the game and sending the new norms to the cars in the scenario.
Going back to our example of \(G_A\) and cars 1–2 (Fig. 8), sense will now create its norm space, \(N_{G_A}\), by first identifying action “go” as the one performed by cars 1 and 2 during the transition from time t to time \(t+1\). That is, \( action (1, s_t, s_{t+1})=go\) and \( action (2, s_t, s_{t+1})=go\). Then, it will create norms to prohibit action “go” to: none of the roles (norm1 in the Table of Fig. 9), role 1 (norm2), role 2 (norm3), and both roles (norm4). sense will now create and track the corresponding normative game \(NG_A=\langle G_A, N_{G_A}, H, U \rangle \). Thereafter, it will deliver each norm to 25% of the agents.
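The enumeration of norm1 to norm4 generalises to prohibiting the conflicting action for every subset of roles. A sketch (a hypothetical rendering, assuming a single conflicting action per game; names are ours):

```python
from itertools import combinations

def norm_space(roles, conflicting_action):
    """Enumerate candidate norms: one norm per subset of roles that is
    prohibited from performing the conflicting action. Each norm maps a
    role to the tuple of actions prohibited for that role."""
    norms = []
    for k in range(len(roles) + 1):            # subset sizes 0..m
        for subset in combinations(roles, k):
            norms.append({role: (conflicting_action,) if role in subset else ()
                          for role in roles})
    return norms

# for the 2-role game G_A with conflicting action "go", this yields the four
# norms of Fig. 9: no prohibitions, role 1, role 2, and both roles
space = norm_space((1, 2), "go")
```

For an m-role game this produces \(2^m\) candidate norms, which matches the four norms generated for the 2-role game in the example.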
3.3.2 Computing norms’ utilities empirically
3.3.3 Replicating norms
Once utilities have been computed, sense replicates norms according to their fitness. The fitness of a norm in a normative game is computed in terms of:
 1.
the utility that an agent derives when using norm n to play the game by enacting different roles and by playing against other agents with possibly different norms in the game; and
 2.
the probability that the agent encounters these agents, which can be computed in terms of the frequencies of the norms applicable to these agents in the game.
As an example, let us consider a car that repeatedly plays game \(NG_A\) in Fig. 9 by using norm1. According to this norm, the car will never give way. This car will derive a high utility when playing against cars that have norm4 applicable, since this norm obliges a car to always give way. This occurs when the combination of norms used in the game is either \(\langle norm1, norm4 \rangle \) (with our car playing role 1), or \(\langle norm4,norm1 \rangle \) (with our car playing role 2). Conversely, this car will derive a low utility when it interacts with cars that have norm1 (since both will go forward and collide), namely when the combination of norms used in the game is \(\langle norm1,norm1 \rangle \). Now, say that the number of cars with norm4 is twice the number of cars with norm1. Then, our car will be twice as likely to play against cars that have norm4, and hence to obtain a higher fitness.

\(N_{G}^{R}\) is the set of all norm combinations that the agents playing the game can employ;

\(\mathbf {n}\) is a norm combination and \(\mathbf {n}_i = n\) is the norm employed by the agent playing role i;

\(u_i(\mathbf {n}, H)\) is the utility to role i when agents play with norm combination \(\mathbf {n}\), computed based on the game’s memory H at a given time; and

\(p(\mathbf {n},t)\) is the joint frequency of the norms in \(\mathbf {n}\) in the normative systems of the players.
Example of utility matrix of norm1 and norm3 after simulation
norm1  norm3  

norm1  (0,0)  (1,0.45) 
norm3  (0,0)  (1,0.66) 
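One plausible way these components combine, sketched in Python, is a frequency-weighted average utility over the (combination, role) pairs in which the norm takes part. This is an illustrative aggregation consistent with the components listed above, not the paper's exact formula; the utility matrix is taken from the table:

```python
def norm_fitness(n, combos, utility, freq):
    """Sketch: fitness of norm n as the average of u_i(combo, H) over all
    combinations in which n is the norm of role i, weighted by the joint
    frequency p(combo, t) of each combination."""
    total, weight = 0.0, 0.0
    for combo in combos:                 # combo = (n_1, ..., n_m)
        for i, n_i in enumerate(combo):
            if n_i == n:
                total += utility(combo, i) * freq(combo)
                weight += freq(combo)
    return total / weight if weight else 0.0

# utility matrix from the table above; cell (row, col) = (u_role1, u_role2)
U = {("norm1", "norm1"): (0.0, 0.0), ("norm1", "norm3"): (1.0, 0.45),
     ("norm3", "norm1"): (0.0, 0.0), ("norm3", "norm3"): (1.0, 0.66)}
combos = list(U)
f1 = norm_fitness("norm1", combos, lambda c, i: U[c][i], lambda c: 0.25)
f3 = norm_fitness("norm3", combos, lambda c, i: U[c][i], lambda c: 0.25)
# under uniform frequencies, norm3 comes out fitter than norm1
```

Under uniform combination frequencies, norm3 (which makes role 2 give way) obtains a higher fitness than norm1, in line with the convergence results reported in Sect. 4.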
4 Empirical analysis and results
In this section we empirically evaluate our approach in a simulated traffic scenario. We explore several dimensions. Firstly, we analyse its convergence, showing that it manages to converge up to 100% of the time to an evolutionarily stable normative system (ESNS) that fulfils the coordination needs of a given agent population. Secondly, we perform an interdependencies analysis in which we analyse the effects of the interdependencies between games on the final normative systems that sense converges to. Thirdly, we test the adaptiveness of our approach, that is, its capability to adapt the normative systems it synthesises to the coordination needs of the population. Finally, we study the stability of the normative systems synthesised by our approach upon convergence. We demonstrate that, once all cars abide by an ESNS synthesised by sense, there is no alternative normative system that the agents can be tempted to switch to.
4.1 Empirical settings
Our experiments consider a discrete simulator of a traffic scenario publicly available in [16], in which agents are cars and the coordination task is to ensure that cars reach their destinations as soon as possible without colliding. The implementation of sense used in our experiments is publicly available in [15]. Figure 11a illustrates our scenario, composed of two orthogonal roads represented by a \(7 \times 7\) grid. At each time step, new cars may enter the scenario from four different entry points (labelled as “in”), and travel towards one of four exit points (labelled as “out”).
Thus, a car gets the best possible reward (reward 1) once it plays a normative game at time t and avoids collisions by going forward at time \(t+1\) (hence not delaying). A car gets a less positive reward (reward 0.7) once it has to stop in order to not collide (which is detrimental to the goal of reaching its destination as soon as possible). When a car is required to suddenly accelerate in order to not collide, it gets half the reward it gets when stopping (0.35). Finally, a car gets the worst possible reward (reward 0) once it plays a normative game and collides, not being able to progress any more. Note that the rewards for not colliding (either by moving forward, stopping or accelerating) are significantly higher than the reward for colliding. Thus, cars will give a higher importance to avoiding collisions at the expense of travelling time. In other words, we say that the cars will be highly averse to colliding.
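The reward scheme above can be written down directly as a small lookup (the outcome labels are ours):

```python
def reward(outcome):
    """Rewards of the collision-averse population described above."""
    return {"go": 1.0,           # goes forward and avoids collisions
            "stop": 0.7,         # stops to avoid a collision (delayed)
            "accelerate": 0.35,  # sudden acceleration: half the stopping reward
            "collide": 0.0}[outcome]  # collides and cannot progress any more
```

The large gap between the collision reward (0) and every collision-avoiding reward (0.35 or above) is what encodes the population's high aversion to colliding.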
Each car has a limited perception of the MAS and perceives the four cells immediately next to and in front of it: one cell on its left, two consecutive cells in front, and one cell on its right. For instance, in Fig. 11b the car in cell e can perceive cells a, b, c and d. A cell can contain a car (heading in any direction), nothing, or a collision. While cars can perceive the position and direction of other cars in their contexts, they cannot assess whether they are moving (going forward or accelerating) or stopped.
A game is described by means of the contexts perceived by its players. Figure 11b graphically illustrates the description of a 2-role game played by two of the cars in Fig. 11a. Two cars can avoid collisions in this game once one of them stops for one time step (giving way to the other car), or if it accelerates for one time step, anticipating the other car and rapidly going through the junction. Conversely, if both cars go forward or accelerate at the same time, they will inevitably collide.
Norms specify the actions that cars are prohibited to perform once they enact each possible role in a game. Figure 12 graphically illustrates an example of a “give way to your right” norm to coordinate cars in the game depicted in Fig. 11b. Whenever a car perceives the context of role 1 in Fig. 11b (the car in cell b), then this car knows that it is enacting role 1 in the game and hence is prohibited to “go”. Conversely, a car has no established prohibitions whenever it perceives the context of player 2 (the one in cell e in Fig. 11b).
Each experiment consists of a set of executions of sense that start with a population of agents that have no norms to coordinate (that is, each agent \(ag_j \in Ag\) has an empty normative system \(\varOmega _j=\emptyset \)). Simulations run in rounds of 200 time steps. In each round, cars interact in the scenario and collisions occur as the simulation goes on.^{10} sense monitors the simulation, detecting games and creating norms as detailed in Sect. 3.3.1. At the end of each simulation, sense computes norm utilities and replicates norms as detailed in Sects. 3.3.2 and 3.3.3. We consider that sense has converged to an ESNS once all the agents have adopted the same norm in each possible game, and this condition holds for 100 generations (\({\mathcal {I}}=100\), see Sect. 3.1).
4.2 Convergence analysis
We first analyse the capability of our approach to synthesise an ESNS that avoids collisions by adapting to the coordination needs of our population of prudent drivers.
Out of 1,000 executions of sense, the system takes an average of 54 rounds to converge. On average, sense detected 89 different games that can be grouped into the four categories illustrated in Fig. 13. The first category (label a), which we call single-stop games (SSG), stands for 2-role games in which the best strategy to avoid collisions is that one of the cars goes forward while the other car either: (i) stops, giving way to the first car, or (ii) accelerates, going through the junction before the first car does. Two examples of SSG are the game illustrated in Fig. 13a and the game depicted in Fig. 11b, which is very similar to the former one but also considers a third and a fourth car in cells a and d. In general, any variation in cells a, c and d of a 2-role game is considered as a different game.^{11}
The third category (label c), called double-stop games (DSG), stands for 2-role games in which both players need to stop in order to avoid collisions because the junction is blocked. Figure 13c shows an example of DSG, in which two cars are waiting for a collision to be removed. The fourth category (label d), which we call traffic-jam games (TJG), stands for 1-role games that are interdependent with DSGs. TJGs are similar to PGs, but in this case the road is blocked in front of the two cars and hence the car in front is very likely to stop. Thus, player 1 has no choice but to stop in order to avoid collisions. TJGs also include games in which the only player perceives a collision in the cell immediately in front of it, in which case it is also required to stop in order to avoid collisions.
It is worth noticing that SSGs and DSGs are only played at the junction (once the trajectories of two cars cross orthogonally), while PGs and TJGs are played before arriving at the junction, once cars are waiting behind other cars playing SSGs and DSGs in the junction.
Norms to coordinate a population of prudent drivers in interdependent SSGs and PGs, along with the percentage of times that cars converged to adopting each type of norm in these games
Norms  “no prohibitions”  “role 1: prh(go)”  “role 2: prh(go)”  “both roles: prh(go)” 

SSG  0%  49%  51%  0% 
(strategy)  (go, go)  (stop, go)  (go, stop)  (stop, stop) 
PG  –  100%  –  – 
(strategy)  (go)  (stop)  –  – 
Norms to coordinate a population of prudent drivers in interdependent DSGs and TJGs, along with the percentage of times that cars converged to adopting each type of norm in these games
Norms  “no prohibitions”  “role 1: prh(go)”  “role 2: prh(go)”  “both roles: prh(go)” 

DSGs  0%  0%  0%  100% 
(strategy)  (go, go)  (stop, go)  (go, stop)  (stop, stop) 
TJGs  0%  100%  –  – 
(strategy)  (go)  (stop)  –  – 
Cars converge in 100% of executions to an ESNS that avoids collisions. In SSGs (Table 3), cars adopt norms that prohibit role 1 to go 49% of the time, hence adopting a “give way to the right” strategy. That is, they stop when playing role 1 because action “stop” is the next one in order of preference once action “go” is forbidden (see Equation 12). Since role 2 has no prohibitions, cars go forward when playing this role. The resulting strategy is denoted as (stop, go). The remaining 51% of the time, cars converge to norms that prohibit role 2 to go, hence adopting a “give way to the left” strategy, denoted as (go, stop). As for PGs (Table 3), 100% of executions converge to norms that prohibit going forward, and cars adopt a “stop” strategy.
Figure 14 shows the evolutionary dynamics of norm adoption for SSGs (a), PGs (b), DSGs (c) and TJGs (d). Each square represents the possible frequency distributions of the norms from Tables 3 and 4. For instance, the top-left corner of each square represents a population in which 100% of cars adopt a “no prohibitions established” norm, and the middle point of the square represents a population in which each of the four norms is 25% frequent. Arrows represent the gradient of norm adoption for each norm distribution, i.e., the most likely trajectory in terms of norm adoption that a population with a given norm distribution will follow.
In SSGs, cars always tend to adopt norms that either prohibit role 1 to go (“give way to the right”) or prohibit role 2 to go (“give way to the left”). Both norms are attractor points of the norm evolution process. If the mass of cars giving way to the right is larger than the mass of cars giving way to the left, then all agents will tend to give way to the right in order to synchronise. As for PGs, the only attractor norm is the one that prohibits role 1 to go. Regarding DSGs, cars always tend to adopt norms that prohibit both roles to go. Thus, no matter what the initial norm distribution is, as long as at least one car adopts such a norm, its fitness will be higher than that of any other car, and the whole population will eventually adopt its norm. Consequently, cars tend to adopt norms that prohibit role 1 to go in TJGs.
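The two attractors of the SSG dynamics can be illustrated with a minimal replicator-dynamics sketch. This is a deliberate simplification of sense's replication rule: the two conventions are modelled as a pure coordination game in which meeting a car with the same convention avoids the collision (payoff 1) and a mismatch does not (payoff 0):

```python
def replicate(x, steps=200):
    """Discrete replicator dynamics for two competing SSG conventions,
    'give way to the right' (population share x) vs 'give way to the left'
    (share 1 - x), under a pure-coordination payoff model."""
    for _ in range(steps):
        f_right, f_left = x, 1.0 - x              # expected payoff of each convention
        avg = x * f_right + (1.0 - x) * f_left    # population-average payoff
        x = x * f_right / avg                     # replicator update
    return x
```

Whichever convention starts with more than half the population takes over entirely, while an exact 50/50 split is an unstable fixed point, matching the two attractor points described above.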
4.3 Interdependencies analysis
Norms to coordinate prudent drivers in SSGs with aggressive drivers in PGs, along with the percentage of times that cars converged to adopting each type of norm in these games
Norms  “no prohibitions”  “role 1:prh(go,stop)”  “role 2: prh(go,stop)”  “both roles:prh(go,stop)” 

SSGs  0%  50%  50%  0% 
(strategy)  (go, go)  (acc, go)  (go, acc)  (acc, acc) 
PGs  100%  0%  –  – 
(strategy)  (go)  –  –  – 
Norms to coordinate prudent drivers in interdependent DSGs, along with the percentage of times that cars converged to adopting each type of norm in these games
Norms  “no prohibitions”  “role 1: prh(go)”  “role 2: prh(go)”  “both roles: prh(go)” 

DSGs  0%  0%  0%  100% 
(strategy)  (go, go)  (stop, go)  (go, stop)  (stop, stop) 
Norms to coordinate aggressive drivers in TJGs, along with the percentage of times that cars converged to adopting each type of norm in these games
Norms  “no prohibitions”  “role 1: prh(go,acc)” 

TJGs  0%  100% 
(strategy)  (go)  (stop) 
As for DSGs (Table 6), we observe similar convergence results to the ones illustrated in Sect. 4.2. Despite the aggressiveness of cars in TJGs, the best strategy in DSGs is still, 100% of the time, to “stop always” in order not to collide with whatever is blocking the road. This leads cars to converge 100% of the time to norms that prohibit both going forward and accelerating when playing a TJG (Table 7), hence adopting a “stop” strategy. Figure 15 shows the effects of game interdependencies on the dynamics of norm adoption. Because cars always tend to adopt a “go” strategy in PGs, they also tend to adopt strategies to accelerate from either the left or the right in SSGs (label a). The dynamics of DSGs and TJGs are similar to the ones observed in Sect. 4.2.
Notice therefore that SSGs and PGs cannot be resolved in isolation (separately), for the resulting normative system would not be evolutionarily stable. The rationale is as follows. If we disregard PGs, we can evolve norms in an SSG until the process converges to a norm whereby cars give way to one side. Such a norm will successfully avoid collisions when cars play the SSG in isolation, for no other cars could affect the outcome of the game. Similarly, by disregarding SSGs we can evolve norms in a PG until converging to a “no prohibitions” norm, for the car in front would never be playing an SSG and hence would never stop. But if we provide the cars with these norms and have them simultaneously play SSGs and PGs at runtime, they will not avoid collisions: cars will go forward in PGs, hitting the players of SSGs from behind. By coevolving norms from interdependent games together, sense can synthesise a normative system that coordinates cars when playing SSGs and PGs both in isolation and simultaneously.
4.4 Adaptiveness analysis
Reward functions to model populations with different degrees of collision aversion. The lower rewards (e.g., \(r^t_0, r^t_1\)) represent populations with lower aversion to colliding. The higher rewards (e.g., \(r^t_9, r^t_{10}\)) represent populations with higher aversion to colliding
Outcome  \(r^t_0\)  \(r^t_1\)  \(r^t_2\)  \(r^t_3\)  \(r^t_4\)  \(r^t_5\)  \(r^t_6\)  \(r^t_7\)  \(r^t_8\)  \(r^t_9\)  \(r^t_{10}\) 

Goes and avoids collisions  1  1  1  1  1  1  1  1  1  1  1 
Stops and avoids collisions  0  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1 
Accelerates and avoids collisions  0  0.05  0.1  0.15  0.2  0.25  0.3  0.35  0.4  0.45  0.5 
Collides  0  0  0  0  0  0  0  0  0  0  0 

the number of rounds that sense requires to converge, normalised to a range from 0 to 100 (where 100 represents the average maximum number of rounds required to converge out of all simulations).

the average frequency with which cars optimally converge in the games that achieved 100% successful convergence in Sect. 4.2: SSGs, DSGs and TJGs. That is, the frequency with which they converge to a norm like “give way to the right” or a norm like “give way to the left” in SSGs, and to a norm like “always stop” in DSGs and TJGs.

the average frequency with which cars optimally converge in PGs. That is, the frequency with which they converge to a norm that prohibits to go. In fact, this norm is the only one that allows avoiding 100% of collisions in PGs.

the collision avoidance rate during the last round of the simulation (once the simulation has converged and the cars have adopted an ESNS).
As the collision aversion increases, simulations take fewer rounds to converge and cars more frequently adopt norms that prohibit to go. For low collision aversions (\(r^t_1\) to \(r^t_3\)), simulations still take a high number of rounds (269, 262 and 185 rounds, respectively, which are normalised as 96, 94 and 66.5), but cars adopt ESNSs that avoid up to 82% of collisions. Specifically, cars converge optimally up to 72% of the time in SSGs, DSGs and TJGs, and up to 64% of the time in PGs. The reason that this frequency is slightly lower in PGs is that, in this type of game, cars do not always collide once they choose to go forward. Hence, cars occasionally converge to norms establishing no prohibitions, which cannot fully avoid collisions. For middle and high collision aversion degrees (\(r^t_4\) to \(r^t_9\)), the number of rounds necessary to converge decreases significantly. The best results are given by functions \(r^t_7\) and \(r^t_8\), with which convergence is achieved in an average of up to 54 rounds (normalised as 19 in the plot), and cars converge optimally in 100% of SSGs, DSGs and TJGs, while converging optimally up to 91% of the time in PGs, hence avoiding up to 93% of collisions.
With total collision aversion (\(r^t_{10}\)), the number of rounds necessary to converge increases again (up to 237, normalised as 85). This happens because the reward for stopping and not colliding and the reward for going forward and not colliding are equal. Hence, the fitness of all the norms that avoid collisions in SSGs (either by prohibiting one role or both roles to go) is similar. In consequence, cars need extra time to decide which norm to adopt. Upon convergence, cars adopt ESNSs containing only norms that prohibit both roles to go in SSGs and DSGs, hence converging optimally for DSGs, but not for SSGs. As a result, cars remain stopped indefinitely and 100% of collisions are avoided. It turns out that cars are so afraid of colliding that they do not mind staying in place indefinitely in order to avoid collisions.
4.5 Stability analysis
Finally, we analyse the stability of the normative systems synthesised by our approach upon convergence. With this aim, we perform 100 different executions of sense that consider a population of 100 agents, all of which abide by an ESNS synthesised in the experiment of Sect. 4.2, which we will call \(\varOmega ^*\). Each execution of sense lasts 400 rounds, i.e., 400 iterations of the evolutionary process illustrated in Fig. 3, composed of MAS simulation (Fig. 3(2)) and norm replication (Fig. 3(3, 4)). Each MAS simulation lasts 200 time steps, during which cars interact in the junction, playing different games and coordinating by means of their ESNS. In each round, during norm replication, sense randomly chooses 10% of the agents (i.e., 10 agents) to be mutated. For each of these agents, sense randomly chooses 10% of the norms from its normative system (8.9 on average, since each ESNS contains 89 norms on average), and replaces them with random norms from the norm spaces of their games. Thus, in each round sense mutates an average of 89 norms in the agents’ normative systems.
In 100% of sense’s executions, cars ultimately adopted normative system \(\varOmega ^*\). Figure 17 illustrates the dynamics of one of these simulations. The x-axis shows the different rounds of the simulation, and the y-axis depicts the ids of the normative systems created over time. Black dots represent mutant normative systems created in each round (with which \(\varOmega ^*\) has to compete), and the red line indicates the id of the most frequent normative system. For the sake of clarity, we represent \(\varOmega ^*\) as the normative system with id 1,000. After 200 rounds, the simulation had created 2,500 different mutant normative systems. Throughout the 400 rounds, normative system \(\varOmega ^*\) remained stable most of the time. In occasional rounds, sense generated a high number of mutant normative systems, causing the frequency of \(\varOmega ^*\) to fall below the stability level. This happened because at certain rounds there were more mutant agents than agents with \(\varOmega ^*\). But, after a few rounds, \(\varOmega ^*\) replicated and mutant agents ended up adopting \(\varOmega ^*\), which thus became again the most frequent normative system. Upon round 400, the cars had converged to adopting \(\varOmega ^*\), thus demonstrating that \(\varOmega ^*\) is a best choice for the agents.
Finally, let us detail why \(\varOmega ^*\) remains stable over time with an example. Consider the game of our running example (Figure 13a, pp. 22) in which two cars approach each other in a junction. Say that \(\varOmega ^*\), which is initially adopted by all the agents, contains a “give way to your right” norm, i.e., cars are prohibited to go when enacting role 1, and allowed to go when enacting role 2. Destabilising this norm would require at least 50% of the agents to mutate to an alternative “give way to the left” norm. Intuitively, even if 49% of the agents were mutants that give way to the left, cars would still be more likely to encounter other cars that give way to the right (with probability 51%). Therefore, giving way to the right would still be the best choice for the agents, and hence would remain stable. However, should these probabilities be equal (once 50% of the cars are mutants), then the agents could end up adopting either of these norms. In our scenario, this applies to every Single-Stop Game (Fig. 13a) and Double-Stop Game (Fig. 13c). Since these conditions never occur during our stability experiments, \(\varOmega ^*\) remains stable until the end.
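The 50% threshold in this example can be checked with a one-line expected-utility comparison, again under the pure-coordination simplification that meeting a car with the same convention avoids the collision (function name is ours):

```python
def best_norm(mutant_share):
    """Expected utility of each convention when a fraction `mutant_share`
    of the population has mutated to 'give way to the left'."""
    u_right = 1.0 - mutant_share   # probability of meeting a right-yielding car
    u_left = mutant_share          # probability of meeting a left-yielding car
    if u_right > u_left:
        return "right"
    if u_left > u_right:
        return "left"
    return "tie"
```

Below a 50% mutant share, giving way to the right remains the unique best response, which is why the mutation rates used in the stability experiments never destabilise \(\varOmega ^*\).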
5 Related work
Broadly speaking, research on norm synthesis can be classified into two main strands of work: offline design, and online synthesis. Pioneered by Shoham and Tennenholtz [30], offline design aims at designing the norms that will coordinate the agents before they start to do their work [8, 11, 30, 35]. Alternatively, online synthesis studies how norms can come to exist while the agents interact at runtime. Most online approaches focus on investigating how norms emerge from agent societies through an iterated process whereby agents tend to adopt best-performing strategies [2, 3, 10, 29, 36, 37]. For an extensive survey of works on norm emergence, the reader may refer to [23].
Alternatively, recent work by Morales et al. approached online synthesis by employing designated agents that observe agents’ interactions at runtime and generate norms aimed at resolving conflict situations [18, 19]. Later on, Mashayekhi et al. [14] extended this work by proposing a hybrid mechanism in which norms are synthesised by designated agents, and the agent society iteratively selects and spreads best-performing norms, ultimately adopting the most successful ones.
The closest approach to our work in the literature is that of norm emergence, and particularly the study of norm evolution and stability. One of the pioneering works in this line is that of Axelrod [3, 4], which studies how norms evolve and emerge as stable patterns of behaviour in agent societies. Axelrod considers a game-theoretic setting in which agents repeatedly play a single two-player game by employing different strategies. The strategies that allow the agents to achieve better personal results prosper and spread. A (stable) norm is said to have emerged once a majority of agents abide by the same strategy and this is sustained over time.
Subsequently, many researchers have studied norm emergence by employing the game-theoretic approach. In [27], Sethi extended the work of Axelrod and studied how social norms of vengeance and cooperation emerge within agent societies. With this aim, Sethi incorporated the solution concept of evolutionarily stable strategy (ESS) and the principle of replicator dynamics from evolutionary game theory (EGT) [32]. Again, this work considers that agents play a single two-player game, and hence one norm can be synthesised. Shoham and Tennenholtz [31] introduced a framework for the emergence of social conventions as points of (Nash) equilibria in stochastic settings. They introduced a natural strategy-selection rule whereby the agents eventually converge to rationally acceptable social conventions.
Later, Sen and Airiau proposed in [26] a social learning model whereby agents can learn their policies and norms can emerge over repeated interactions between the agents in two-player games. Many works have adopted this model to study further criteria that affect the emergence of norms. Of these, the closest to our work is perhaps the one by Sugawara et al. [33, 34], in which conflict situations are characterised as Markov games, and a model is introduced to evolve norms that successfully coordinate the agents in these games.
More recent work (such as that by Santos et al. [21]) studies how cooperation norms can emerge when the agents can explore alternative strategies, i.e. they have arbitrary exploration rates. They show that the emergence of cooperation depends on both the exploration rate of the agents and the underlying norms at work. Similarly, De et al. [9] introduce an EGT-based model to study how norms change in agent societies in terms of the need for coordination and the agents’ exploration rate. They show that societies with high coordination needs tend towards lower exploration rates and higher norm compliance, while societies with lower coordination needs exhibit higher exploration rates. Also, Lorini et al. [13] introduce a model for the evolution and emergence of fairness norms in relation to the degree of sensitivity (internalisation) of the agents to these norms. They show that, in the long term, the higher the sensitivity of the agents to norms, the greater the benefit to social welfare.
From Axelrod [3, 4] to Lorini [13], our approach differs from all the aforementioned works for several reasons. Most previous works consider that the agents play a single game whose payoffs are known beforehand. Unlike them, our framework considers a setting in which agents can play multiple, interdependent games whose outcomes may depend on each other. Our framework performs runtime detection of interdependent games, and automatically creates norms whose coordination utility is computed from the rewards that the agents obtain as they repeatedly play each game over time. To the best of our knowledge, our framework is the first to introduce the analytical concept of an evolutionarily stable normative system (ESNS) as a set of norms that, together, successfully coordinate the agents in multiple interdependent games.
6 Conclusions and future work
In this work we introduced sense, a framework for the offline synthesis of evolutionarily stable normative systems (ESNSs), whose compliance forms a rational choice for the agents. sense synthesises sets of codependent norms that, together, successfully coordinate the agents in multiple, interdependent coordination situations that cannot be easily modelled and resolved separately beforehand. ESNSs are synthesised by carrying out a natural selection process inspired by evolutionary game theory (EGT), whereby the agents tend to adopt the norms that are most successful at coordinating them in strategic situations.
sense assumes no previous knowledge about the potential situations in which agents may need coordination, nor about their outcomes. Instead, it learns these by running MAS simulations, detecting situations that lead to undesirable outcomes, and modelling them as interdependent one-shot games. sense automatically synthesises norms for each game, and makes norms compete with each other in repeated game plays, iteratively learning their utilities in an empirical manner. Norms that are more useful for coordinating the agents in games prosper and spread, and are ultimately adopted by the agents. The outputs of such an evolutionary process are normative systems whose norms’ coordination utilities are codependent. Together, these norms are evolutionarily stable and effectively coordinate the agents in a variety of interdependent situations. Once the agents are provided with these norms, no agent can benefit from either violating them or switching to alternative ones.
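The evolutionary loop described above can be sketched as follows. This is a minimal illustration under our own assumptions (the function names, the representation of norms as plain strings, and the toy rewards are all ours), not sense's actual implementation:

```python
import random

def evolve_norms(norm_space, play_game, generations=500, plays=50, dt=0.1):
    """Illustrative evolutionary loop for the norms of one game.

    norm_space : list of candidate norms for the game
    play_game  : function mapping a norm to the reward obtained by
                 applying it in one play of the game
    Returns the norms' final frequencies.
    """
    freq = {n: 1.0 / len(norm_space) for n in norm_space}
    for _ in range(generations):
        rewards = {n: [] for n in norm_space}
        # 1. Repeated game plays: a norm is chosen with probability
        #    proportional to its current frequency.
        for _ in range(plays):
            norm = random.choices(norm_space,
                                  weights=[freq[n] for n in norm_space])[0]
            rewards[norm].append(play_game(norm))
        # 2. Empirical utility: average reward accumulated per norm.
        utility = {n: (sum(r) / len(r) if r else 0.0)
                   for n, r in rewards.items()}
        # 3. Replication: above-average norms grow, below-average shrink.
        avg = sum(freq[n] * utility[n] for n in norm_space)
        for n in norm_space:
            freq[n] = max(0.0, freq[n] + dt * freq[n] * (utility[n] - avg))
        total = sum(freq.values())
        freq = {n: f / total for n, f in freq.items()}
    return freq

# Toy game: "give_way" avoids collisions more reliably than "go".
result = evolve_norms(["give_way", "go"],
                      lambda n: 1.0 if n == "give_way" else 0.2)
```

The loop is effectively a discrete replicator dynamics driven by sampled rewards rather than known payoffs: the higher-utility norm accumulates frequency until it is adopted almost surely.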
We provided evidence of the quality and the relevance of our approach through an empirical evaluation in a simulated traffic scenario. We showed that our framework converges in 100% of runs to ESNSs that satisfy the coordination needs of car populations, avoiding collisions in numerous (up to 89) interdependent traffic situations. We showed that the ESNSs synthesised by sense can only be obtained by co-evolving the norms of interdependent games together; otherwise, the resulting normative systems might be ineffective and unstable. We also illustrated the capability of our approach to adapt norm synthesis to the preferences of the agent population, showing that different types of normative systems can be synthesised as one considers agent populations with different preferences and behaviours.
As future work, there are multiple opportunities for research. First, we plan to enhance sense to synthesise essential norms [36]. sense’s model in this paper is described in terms of the individual goals of the agents, and hence sense is capable of synthesising norms that are evolutionarily stable when they are useful for the agents to satisfy their own goals. Nevertheless, sense cannot be employed to synthesise essential norms that achieve a global, system-level goal established by a system designer, which might be unaligned with the agents’ goals. With this aim, we plan to add a sanction mechanism to sense in order to synthesise normative systems that achieve such system-level goals while being evolutionarily stable.
Second, and more importantly, we plan to improve sense to synthesise ESNSs without the need for simulation. In this improved version of sense, a system designer would only need to provide domain information in the form of small sets of two-player games, along with their potential interdependencies represented as relationships in a games network. Based on this information, sense would carry out a co-evolutionary process that mathematically simulates the co-evolution of agents’ strategies together with norms. For instance, in the traffic scenario considered in this paper, sense could synthesise sets of essential norms that are evolutionarily stable by only considering a few (up to five) two-player games describing basic car interactions (e.g., two cars at a junction, two cars queuing, and so on) along with their interdependency relationships.
7 Discussion
In what follows we provide guidelines about potential uses of sense. First, we describe sense’s limiting assumptions and the conditions under which sense can be employed. Then, we detail how some of these assumptions could be lifted in order to apply sense to a wider variety of problems.

Agents must be rational. sense assumes that agents always choose the strategies (norms) that allow them to maximise their individual payoffs (fitnesses). Rationality is a common and sensible assumption in norm synthesis research that has allowed researchers to use ideas from game theory and evolutionary game theory to study how autonomous agents might interact and, in particular, how agents could reach successful conventions [1, 3, 22, 24, 25, 26, 31, 37].

Agents’ interactions can be simulated. sense employs simulation of a MAS in order to learn its potential games along with the codependent utilities of their norms. Therefore, the MAS at hand must be amenable to simulation.

Agents’ preferences can be modelled at design time. sense assumes some domain information that must be provided by a system designer together with the MAS simulator, such as the agents’ action preferences and rewards. For example, in Sect. 4 sense considers a reward function (Eq. 16 on page 26) that returns the individual reward of a car when it collides in some situation, when it is able to progress safely, and so on.
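As an illustration, such a designer-provided reward function might take the following shape. The outcome labels and numeric values below are hypothetical; the actual function used by sense is Eq. 16 in the paper and is not reproduced here:

```python
def car_reward(outcome):
    """Hypothetical reward function for a car in one traffic situation.

    The shape mirrors the description above: collisions are the worst
    outcome, stopping is safe but unproductive, and safe progress is best.
    """
    rewards = {
        "collision": 0.0,   # worst outcome: the car crashes
        "stopped":   0.5,   # safe, but the car makes no progress
        "progress":  1.0,   # the car advances safely
    }
    return rewards[outcome]
```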

MAS conflicts must be identifiable. Since sense’s game detection is based on the detection of conflicts (e.g., car collisions), it must be possible to identify when a group of agents is engaged in a conflict.

The consequences of agents’ actions arise immediately. sense’s one-shot game detection is based on the assumption that, when a conflict arises at a given time t, the cause of the conflict can be identified from the actions of the agents at the previous time step, \(t-1\).

Agents’ preferences and behaviours do not change over time. sense is intended for offline norm design, and hence it synthesises normative systems that are evolutionarily stable as long as the agents’ preferences and actions do not change over time. However, should these conditions change at runtime, an ESNS may no longer be stable.
In order to synthesise ESNSs online, each agent should be embedded with a modified version of sense, illustrated in Fig. 18. Each agent would proceed by continuously interacting in the MAS and detecting when it engages in conflicts (Fig. 18(1)). Based on detected conflicts, the agent could continuously learn (keep track of) the potential games it can play, creating different norms to coordinate in each possible game (norm generation, Fig. 18(2)). Game detection and norm generation would be performed as illustrated in Sect. 3.3.1. Over time, the agent would play different games by enacting different roles and by playing against different agents. Unlike in the original offline version of sense, in this online approach the agent would always have available all the norms of a game’s norm space whenever it plays the game. In each game play, the agent would choose one norm to apply from the norm space of the game according to the norm’s frequency (probability). Thus, if a norm is highly frequent, the agent would be very likely to apply it every time it plays the game. This would allow the agent to accumulate evidence about the rewards it obtains by applying different norms in the game, so as to compute the utility of each norm as described in Sect. 3.3.2 (utility learning, Fig. 18(3)). Once the agent had enough evidence about the utility of each norm, it could internally simulate norm evolution by performing multiple norm replication steps as described in Sect. 3.3.3 (Fig. 18(4)). The output of norm evolution would be a prediction about the set of norms that might be evolutionarily stable given the current MAS conditions, i.e., an ESNS. The agent would then immediately adopt these norms in order to try to achieve stable coordination. If the adopted norms turned out not to be stable, the agent could keep on accumulating evidence about the norms’ utilities and carry out norm evolution again in order to make more accurate predictions.
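The per-agent online procedure sketched above could be organised as follows. The class, its methods, and all numeric choices are our own illustrative assumptions rather than part of sense:

```python
import random

class OnlineNormLearner:
    """Illustrative per-agent online learner for one game's norm space:
    frequency-based norm selection, incremental utility learning, and
    internal norm replication."""

    def __init__(self, norm_space):
        self.freq = {n: 1.0 / len(norm_space) for n in norm_space}
        self.utility = {n: 0.0 for n in norm_space}
        self.count = {n: 0 for n in norm_space}

    def choose_norm(self):
        # Pick a norm with probability equal to its current frequency.
        norms = list(self.freq)
        return random.choices(norms, weights=[self.freq[n] for n in norms])[0]

    def observe_reward(self, norm, reward):
        # Incremental running average of the rewards seen for this norm.
        self.count[norm] += 1
        self.utility[norm] += (reward - self.utility[norm]) / self.count[norm]

    def replicate(self, dt=0.1):
        # One internal replication step: above-average norms gain frequency.
        avg = sum(self.freq[n] * self.utility[n] for n in self.freq)
        for n in self.freq:
            self.freq[n] = max(0.0,
                self.freq[n] + dt * self.freq[n] * (self.utility[n] - avg))
        total = sum(self.freq.values())
        self.freq = {n: f / total for n, f in self.freq.items()}
```

After enough observed rewards, repeated `replicate()` calls concentrate the frequencies on the norms predicted to be evolutionarily stable, which the agent would then adopt.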
Notice, though, that applying sense online requires making some assumptions about the agents’ capabilities. In particular, agents need to be endowed with capabilities to (1) detect conflicts; (2) assess the contexts and observe the actions of other agents in games; (3) explicitly create games along with their norms; (4) empirically evaluate norms’ utilities; and (5) replicate norms’ probabilities.
Finally, notice that by employing sense in an online manner the agents could adapt their normative systems to the changing conditions of the MAS at runtime. That is, once the agents have adopted an ESNS, should the MAS conditions change (such as the agents’ rewards, or the types of conflicts), the agents could gather new evidence in order to evolve norms again, thus predicting new ESNSs that adapt to the new system conditions.
Footnotes
 1.
iron also considers the effects of the agents not complying with norms in order to assess whether the norms are necessary. However, such necessity is not relevant for the purposes of this paper.
 2.
Action “go” may include turns.
 3.
This function has to be characterised for each particular domain in which iron is to be employed.
 4.
With this aim, iron needs to monitor the MAS by means of a window of at least two time steps (that is, the current MAS state at time t and the previous MAS state at time \(t-1\)).
 5.
Provided that the disturbance is not too large. For example, a small number of mutant strategists join the scenario, and after some time they are “eliminated” by dominant strategists.
 6.
This value must be specified by a system designer.
 7.
sense simulates 100% norm compliance in order to be able to evaluate norms’ utilities when the agents comply with them. However, 100% compliance is not actually required in real-time MAS execution. Once the agents are provided with an ESNS, this normative system establishes a convention for all the agents in each possible game. Such a convention has the property that no agent can benefit from violating it or from adopting different norms, and hence agents would voluntarily comply with the norms.
 8.
In principle, it is not possible to assume the composition of \(\mathbf {a}\), i.e., the actions that these agents will perform. However, we assume that the actions in \(\mathbf {a}\) comply with the prohibitions established by their respective norms in \(\mathbf {n}\).
 9.
Initially, each history function \(h_j \in H\) will return an empty sequence of rewards. Consequently, each empirical utility function \(u_i \in U\) will return an undefined value.
 10.
Whenever two or more cars collide, they remain in the scenario for 5 time steps until they are removed. With this delay we aim to simulate the time that the emergency services require to remove collided cars.
 11.
This is because sense assumes no previous knowledge about the potential games of the MAS, and hence it cannot assume that two similar situations will in fact correspond to the same game.
Notes
Acknowledgements
Research supported by the H2020-MSCA-IF project number 707688 (EVNSMAS). Michael Wooldridge was supported by the European Research Council under Grant 291528 (RACE). Juan A. Rodríguez-Aguilar and Maite López-Sánchez were funded by projects TASSAT3: TIN2016-76573-C2-2-P and Collectiveware TIN2015-66863-C2-1-R (MINECO/FEDER).
References
 1. Ågotnes, T., van der Hoek, W., & Wooldridge, M. (2007). Normative system games. In Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems, p. 129. ACM.
 2. Airiau, S., Sen, S., & Villatoro, D. (2014). Emergence of conventions through social learning. Autonomous Agents and Multi-Agent Systems, 28(5), 779–804.
 3. Axelrod, R. (1986). An evolutionary approach to norms. American Political Science Review, 80(04), 1095–1111.
 4. Axelrod, R. M. (1997). The complexity of cooperation: Agent-based models of competition and collaboration. Princeton: Princeton University Press.
 5. Azar, O. H. (2004). What sustains social norms and how they evolve? The case of tipping. Journal of Economic Behavior & Organization, 54(1), 49–64.
 6. Björnerstedt, J., & Weibull, J. W. (1994). Nash equilibrium and evolution by imitation. Working Paper Series 407, Research Institute of Industrial Economics.
 7. Boella, G., Van Der Torre, L., & Verhagen, H. (2006). Introduction to normative multiagent systems. Computational & Mathematical Organization Theory, 12(2–3), 71–79.
 8. Corapi, D., Russo, A., de Vos, M., Padget, J., & Satoh, K. (2011). Normative design using inductive learning. Theory and Practice of Logic Programming, 11(4–5), 783–799.
 9. De, S., Nau, D. S., & Gelfand, M. J. (2017). Understanding norm change: An evolutionary game-theoretic approach. In Proceedings of the 16th Conference on Autonomous Agents and Multi-Agent Systems, AAMAS '17, pp. 1433–1441. Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.
10. Delgado, J. (2002). Emergence of social conventions in complex networks. Artificial Intelligence, 141(1–2), 171–185.
11. Fitoussi, D., & Tennenholtz, M. (2000). Choosing social laws for multi-agent systems: Minimality and simplicity. Artificial Intelligence, 119(1–2), 61–101.
12. Iwai, K. (1984). Schumpeterian dynamics: An evolutionary model of innovation and imitation. Journal of Economic Behavior & Organization, 5(2), 159–190.
13. Lorini, E., & Mühlenbernd, R. (2015). The long-term benefits of following fairness norms: A game-theoretic analysis. In International Conference on Principles and Practice of Multi-Agent Systems, pp. 301–318. Springer.
14. Mashayekhi, M., Du, H., List, G. F., & Singh, M. P. (2016). Silk: A simulation study of regulating open normative multiagent systems. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI'16, pp. 373–379. AAAI Press.
15. Morales, J. (2016–2017). System for evolutionary norm synthesis (sense), a framework to synthesise evolutionarily stable normative systems for multiagent systems. https://github.com/NormSynthesis/SENSE.
16. Morales, J. (2016–2017). A traffic simulator for the system for evolutionary norm synthesis (sense). https://github.com/NormSynthesis/SENSE_Simulators.
17. Morales, J., López-Sánchez, M., Rodríguez-Aguilar, J. A., Vasconcelos, W., & Wooldridge, M. (2015). Online automated synthesis of compact normative systems. ACM Transactions on Autonomous and Adaptive Systems (TAAS), 10(1), 2:1–2:33.
18. Morales, J., López-Sánchez, M., Rodríguez-Aguilar, J. A., Wooldridge, M., & Vasconcelos, W. (2013). Automated synthesis of normative systems. In Proceedings of AAMAS'13, pp. 483–490. IFAAMAS.
19. Morales, J., López-Sánchez, M., Rodríguez-Aguilar, J. A., Wooldridge, M., & Vasconcelos, W. (2015). Synthesising liberal normative systems. In Proceedings of the 14th International Conference on Autonomous Agents and Multiagent Systems, AAMAS '15, pp. 433–441. IFAAMAS.
20. Rapoport, A., & Chammah, A. M. (1965). Prisoner's dilemma: A study in conflict and cooperation (Vol. 165). Ann Arbor: University of Michigan Press.
21. Santos, F. P., Pacheco, J. M., & Santos, E. C. (2016). Evolution of cooperation under indirect reciprocity and arbitrary exploration rates. Scientific Reports, 6, 37517.
22. Savarimuthu, B. T. R. (2011). Mechanisms for norm emergence and norm identification in multi-agent societies. Ph.D. thesis, University of Otago.
23. Savarimuthu, B. T. R., & Cranefield, S. (2011). Norm creation, spreading and emergence: A survey of simulation models of norms in multi-agent systems. Multiagent and Grid Systems, 7(1), 21–54.
24. Savarimuthu, B. T. R., Purvis, M., Purvis, M., & Cranefield, S. (2009). Social norm emergence in virtual agent societies. In M. Baldoni, T. C. Son, M. B. van Riemsdijk, & M. Winikoff (Eds.), Declarative Agent Languages and Technologies VI (pp. 18–28). Berlin: Springer.
25. Sen, O., & Sen, S. (2010). Effects of social network topology and options on norm emergence. In Proceedings of COIN, pp. 211–222.
26. Sen, S., & Airiau, S. (2007). Emergence of norms through social learning. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, IJCAI'07, pp. 1507–1512. San Francisco, CA: Morgan Kaufmann.
27. Sethi, R. (1996). Evolutionary stability and social norms. Journal of Economic Behavior & Organization, 29(1), 113–140.
28. Shoham, Y., & Leyton-Brown, K. (2009). Multiagent systems: Algorithmic, game-theoretic, and logical foundations. New York: Cambridge University Press.
29. Shoham, Y., & Tennenholtz, M. (1992). Emergent conventions in multi-agent systems: Initial experimental results and observations (preliminary report). In B. Nebel, C. Rich, & W. R. Swartout (Eds.), KR, pp. 225–231. Morgan Kaufmann.
30. Shoham, Y., & Tennenholtz, M. (1995). On social laws for artificial agent societies: Off-line design. Artificial Intelligence, 73(1–2), 231–252.
31. Shoham, Y., & Tennenholtz, M. (1997). On the emergence of social conventions: Modeling, analysis, and simulations. Artificial Intelligence, 94(1), 139–166.
32. Smith, J. M. (1988). Evolution and the theory of games (pp. 202–215). Boston, MA: Springer.
33. Sugawara, T. (2011). Emergence and stability of social conventions in conflict situations. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Volume One, pp. 371–378. AAAI Press.
34. Sugawara, T. (2014). Emergence of conventions for efficiently resolving conflicts in complex networks. In Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), Volume 03, WI-IAT '14, pp. 222–229. Washington, DC: IEEE Computer Society.
35. van der Hoek, W., Roberts, M., & Wooldridge, M. (2007). Social laws in alternating time: Effectiveness, feasibility, and synthesis. Synthese, 156(1), 1–19.
36. Villatoro, D. (2011). Self-organization in decentralized agent societies through social norms. In The 10th International Conference on Autonomous Agents and Multiagent Systems, Volume 3, pp. 1373–1374. IFAAMAS.
37. Walker, A., & Wooldridge, M. (1995). Understanding the emergence of conventions in multi-agent systems. In International Conference on Multi-Agent Systems, ICMAS'95, pp. 384–389.
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.