Handbook of Dynamic Game Theory pp 147
Trends and Applications in Stackelberg Security Games
Abstract
Security is a critical concern around the world, whether it is the challenge of protecting ports, airports, and other critical infrastructure; interdicting the illegal flow of drugs, weapons, and money; protecting endangered wildlife, forests, and fisheries; or suppressing urban crime and ensuring security in cyberspace. Unfortunately, limited security resources prevent full security coverage at all times; instead, we must optimize the use of limited security resources. To that end, we founded a new “security games” framework that has led to the building of decision aids for security agencies around the world. Security games are a novel area of research that is based on computational and behavioral game theory while also incorporating elements of AI planning under uncertainty and machine learning. Today security-games-based decision aids for infrastructure security are deployed in the US and internationally; examples include deployments at ports and ferry traffic with the US Coast Guard, for security of air traffic with the US Federal Air Marshals, and for security of university campuses, airports, and metro trains with police agencies in the US and other countries. Moreover, recent work on “green security games” has led our decision aids to be deployed, assisting NGOs in protection of wildlife; and “opportunistic crime security games” have focused on suppressing urban crime. In the cybersecurity domain, the interaction between the defender and adversary is quite complicated, with a high degree of incomplete information and uncertainty. Recently, applications of game theory that provide quantitative and analytical tools to network administrators, through defensive algorithm development and adversary behavior prediction to protect cyber infrastructures, have also received significant attention.
This chapter provides an overview of use-inspired research in security games including algorithms for scaling up security games to real-world-sized problems, handling multiple types of uncertainty, and dealing with bounded rationality and bounded surveillance of human adversaries.
Keywords
Security games; Scalability; Uncertainty; Bounded rationality; Bounded surveillance; Adaptive adversary; Infrastructure security; Wildlife protection
1 Introduction
Security is a critical concern around the world that manifests in problems such as protecting our ports, airports, public transportation, and other critical national infrastructure from terrorists, protecting our wildlife and forests from poachers and smugglers, and curtailing the illegal flow of weapons, drugs, and money across international borders. In all of these problems, limited security resources prevent security coverage of all the targets at all times; instead, security resources must be deployed intelligently, taking into account differences in the importance of targets, the responses of the attackers to the security posture, and potential uncertainty over the types, capabilities, knowledge, and priorities of the attackers faced.
To address these challenges in adversarial reasoning and security resource allocation, a new “security games” framework has been developed (Tambe 2011); this framework has led to the building of decision aids for security agencies around the world. Security games are based on computational and behavioral game theory while also incorporating elements of AI planning under uncertainty and machine learning. Security games algorithms have led to successes and advances over previous human-designed approaches in security scheduling and allocation by addressing the key weakness of predictability in human-designed schedules. These algorithms are now deployed in multiple applications. The first application was ARMOR, which was deployed at the Los Angeles International Airport (LAX) in 2007 to randomize checkpoints on the roadways entering the airport and canine patrol routes within the airport terminals (Jain et al. 2010b). Following that came several other applications: IRIS, a game-theoretic scheduler for randomized deployment of the US Federal Air Marshals (FAMS), has been in use since 2009 (Jain et al. 2010b); PROTECT, which schedules the US Coast Guard’s randomized patrolling of ports, has been deployed in the port of Boston since April 2011, has been in use at the port of New York since February 2012 (Shieh et al. 2012), and has spread to other ports such as Los Angeles/Long Beach and Houston; another application for deploying escort boats to protect ferries has been deployed by the US Coast Guard since April 2013 (Fang et al. 2013); and TRUSTS (Yin et al. 2012) has been evaluated in field trials by the Los Angeles Sheriff’s Department (LASD) in the LA Metro system. Most recently, PAWS, another game-theoretic application, was tested by rangers in Uganda for protecting wildlife in Queen Elizabeth National Park in April 2014 (Yang et al. 2014), and MIDAS was tested by the US Coast Guard for protecting fisheries (Haskell et al. 2014).
These initial successes point the way to major future applications in a wide range of security domains.
Researchers have recently started to explore the use of such security game models in tackling security issues in the cyber world. In Vanek et al. (2012), the authors study the problem of optimal resource allocation for packet selection and inspection to detect potential threats in large computer networks with multiple computers of differing importance; that is, they apply security games to deep packet inspection as a countermeasure in intrusion detection. In a recent paper (Durkota et al. 2015), the authors use a security game framework to study the optimal number of honeypots to be placed in a network. Another interesting line of work, called audit games (Blocki et al. 2013, 2015), enhances the security games model with a choice of punishments in order to capture scenarios of security and privacy policy enforcement in large organizations.
Given the many game-theoretic applications for solving real-world security problems, this chapter provides an overview of the models and algorithms, key research challenges, and a description of our successful deployments. Overall, the work in security games has produced numerous decision aids that are in daily use by security agencies to optimize their limited security resources. The implementation of these applications required addressing fundamental research challenges. We categorize the research challenges associated with security games into four broad categories: (1) addressing scalability across a number of dimensions of the game, (2) tackling different forms of uncertainty that may be present in the game, (3) addressing human adversaries’ bounded rationality and bounded surveillance (limited capabilities in surveillance), and (4) evaluation of the framework in the field. Given the success in providing solutions for many security domains involving the protection of critical infrastructure, the topic of security games has evolved and expanded to include new types of security domains, for example, wildlife and environmental protection.
The rest of the chapter is organized as follows: Sect. 2 introduces the general security games model, Sect. 3 discusses three different types of security games, Sect. 4 describes the approaches used to tackle scalability issues, Sect. 5 describes the approaches to deal with uncertainty, Sect. 6 focuses on bounded rationality and bounded surveillance, and Sect. 7 provides details of field evaluation of the science of security games.
2 Stackelberg Security Games
Stackelberg games were first introduced to model leadership and commitment (von Stackelberg 1934). A Stackelberg game is a game played sequentially between two players: the first player is the leader, who commits to a strategy first, and then the second player, called the follower, observes the strategy of the leader and commits to his own strategy. The term Stackelberg security games (SSG) was first introduced by Kiekintveld et al. (2009) to describe specializations of a particular type of Stackelberg game for security as discussed below. This section provides details on this use of Stackelberg games for modeling security domains. We first give a generic description of security domains followed by security games, the model by which security domains are formulated in the Stackelberg game framework.^{1}
2.1 Stackelberg Security Game
In Stackelberg security games, a defender must perpetually defend a set of targets T using a limited number of resources, whereas the attacker is able to surveil and learn the defender’s strategy and attack after careful planning. An action, or pure strategy, for the defender represents deploying a set of resources R on patrols or checkpoints, e.g., scheduling checkpoints at the LAX airport or assigning federal air marshals to protect flight tours. The pure strategy for an attacker represents an attack at a target, e.g., a flight. The mixed strategy of the defender is a probability distribution over the pure strategies. Additionally, each target is also associated with a set of payoff values that define the utilities for both the defender and the attacker in case of a successful or a failed attack.
A key assumption of Stackelberg security games (we will sometimes refer to them as simply security games) is that the payoff of an outcome depends only on the target attacked and whether or not it is covered (protected) by the defender (Kiekintveld et al. 2009). The payoffs do not depend on the remaining aspects of the defender allocation. For example, if an adversary succeeds in attacking target t_{1}, the penalty for the defender is the same whether the defender was guarding target t_{2} or not.
Table 1 Example of a security game with two targets
Target | Defender (covered) | Defender (uncovered) | Attacker (covered) | Attacker (uncovered)
t_{1}  | 10                 | 0                    | −1                 | 1
t_{2}  | 0                  | −10                  | −1                 | 1
This allows us to compactly represent the payoffs of a security game. Specifically, a set of four payoffs is associated with each target. These four payoffs are the rewards and penalties to both the defender and the attacker in case of a successful or an unsuccessful attack and are sufficient to define the utilities for both players for all possible outcomes in the security domain. More formally, if target t is attacked, the defender’s utility is U_{d}^{c}(t) if t is covered or U_{d}^{u}(t) if t is not covered. The attacker’s utility is U_{a}^{c}(t) if t is covered or U_{a}^{u}(t) if t is not covered. Table 1 shows an example security game with two targets, t_{1} and t_{2}. In this example game, if the defender was covering target t_{1} and the attacker attacked t_{1}, the defender would get 10 units of reward, whereas the attacker would receive −1 units. We make the assumption that in a security game, it is always better for the defender to cover a target as compared to leaving it uncovered, whereas it is always better for the attacker to attack an uncovered target. This assumption is consistent with the payoff trends in the real world. A special case is zero-sum games, in which for each outcome the sum of utilities for the defender and attacker is zero, although general security games are not necessarily zero-sum.
2.2 Solution Concept: Strong Stackelberg Equilibrium
The solution to a security game is a mixed strategy^{2} for the defender that maximizes the expected utility of the defender, given that the attacker learns the mixed strategy of the defender and chooses a best response for himself. The defender’s mixed strategy is a probability distribution over all pure strategies, where a pure strategy is an assignment of the defender’s limited security resources to targets. This solution concept is known as a Stackelberg equilibrium (Leitmann 1978).
The most commonly adopted version of this concept in related literature is called Strong Stackelberg Equilibrium (SSE) (Breton et al. 1988; Conitzer and Sandholm 2006; Paruchuri et al. 2008; von Stengel and Zamir 2004). In security games, the mixed strategy of the defender is equivalent to the probabilities that each target t is covered by the defender, denoted by C = {c_{t}} (Korzhyk et al. 2010). Furthermore, it is enough to consider a pure strategy of the rational adversary (Conitzer and Sandholm 2006), which is to attack a target t. The expected utility for the defender for a strategy profile (C, t) is defined as \(U_{d}(t,C) = c_{t}U_{d}^{c}(t) + (1 - c_{t})U_{d}^{u}(t)\), with a similar form for the adversary. An SSE for the basic security games (non-Bayesian, rational adversary) is defined as follows:
Definition 1. A pair of strategies (C^{∗}, t^{∗}) forms a Strong Stackelberg Equilibrium (SSE) if it satisfies the following:
 1. The defender plays a best response: U_{d}(t^{∗}, C^{∗}) ≥ U_{d}(t(C), C) for every defender strategy C, where t(C) is the attacker’s response against the defender strategy C.
 2. The attacker plays a best response: U_{a}(t^{∗}, C^{∗}) ≥ U_{a}(t, C^{∗}) for every target t.
 3. The attacker breaks ties in favor of the defender: U_{d}(t^{∗}, C^{∗}) ≥ U_{d}(t′, C^{∗}) for every target t′ such that t′ ∈ argmax_{t} U_{a}(t, C^{∗}).
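Conditions 2 and 3 of the definition can be made concrete with a few lines of code. The following is a minimal sketch using the Table 1 payoffs; the function names and the even coverage vector are illustrative assumptions, not part of the original formulation.

```python
from fractions import Fraction

# Compact payoffs from Table 1: target -> (U_d^c, U_d^u, U_a^c, U_a^u).
PAYOFFS = {
    "t1": (10, 0, -1, 1),
    "t2": (0, -10, -1, 1),
}


def defender_utility(target, coverage):
    """U_d(t, C) = c_t * U_d^c(t) + (1 - c_t) * U_d^u(t)."""
    ud_c, ud_u, _, _ = PAYOFFS[target]
    c = coverage[target]
    return c * ud_c + (1 - c) * ud_u


def attacker_utility(target, coverage):
    """U_a(t, C) = c_t * U_a^c(t) + (1 - c_t) * U_a^u(t)."""
    _, _, ua_c, ua_u = PAYOFFS[target]
    c = coverage[target]
    return c * ua_c + (1 - c) * ua_u


def sse_attacker_response(coverage):
    """Attacker best response, with ties broken in the defender's favor."""
    best = max(attacker_utility(t, coverage) for t in PAYOFFS)
    tied = [t for t in PAYOFFS if attacker_utility(t, coverage) == best]
    return max(tied, key=lambda t: defender_utility(t, coverage))


# Assumed coverage vector: one resource split evenly over both targets.
coverage = {"t1": Fraction(1, 2), "t2": Fraction(1, 2)}
target = sse_attacker_response(coverage)
print(target, defender_utility(target, coverage))
```

Exact fractions are used so that the attacker's indifference between targets is detected reliably; with floating-point coverage values, the equality test for ties would need a tolerance.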
The assumption that the follower will always break ties in favor of the leader in cases of indifference is reasonable because in most cases the leader can induce the favorable strong equilibrium by selecting a strategy arbitrarily close to the equilibrium that causes the follower to strictly prefer the desired strategy (von Stengel and Zamir 2004). Furthermore an SSE exists in all Stackelberg games, which makes it an attractive solution concept compared to versions of Stackelberg equilibrium with other tiebreaking rules. Finally, although initial applications relied on the SSE solution concept, we have since proposed new solution concepts that are more robust against various uncertainties in the model (An et al. 2011; Pita et al. 2012; Yin et al. 2011) and have used these robust solution concepts in some of the later applications.
For simple examples of security games, such as the one shown above, the Strong Stackelberg Equilibrium can be calculated by hand. However, as the size of the game increases, hand calculation is no longer feasible, and an algorithmic approach for generating the SSE becomes necessary. Conitzer and Sandholm (2006) provided the first complexity results and algorithms for computing optimal commitment strategies in Stackelberg games, including both pure and mixed-strategy commitments. An improved algorithm for solving Stackelberg games, DOBSS (Paruchuri et al. 2008), is central to the fielded application ARMOR that was in use at the Los Angeles International Airport (Jain et al. 2010b).
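As an illustration of such a hand calculation, consider the game of Table 1 under the added assumption (not stated in the text) that the defender has a single resource, so that c_{1} + c_{2} ≤ 1. A sketch of the case analysis:

```latex
\begin{aligned}
U_a(t_i, C) &= -c_i + (1 - c_i) = 1 - 2c_i, \qquad i \in \{1, 2\},\\
U_d(t_1, C) &= 10\,c_1, \qquad U_d(t_2, C) = -10\,(1 - c_2),\\
c_1 < c_2 &\;\Rightarrow\; t(C) = t_1,\quad U_d = 10\,c_1 < 5,\\
c_1 > c_2 &\;\Rightarrow\; t(C) = t_2,\quad U_d = -10\,(1 - c_2) < -5,\\
c_1 = c_2 = c &\;\Rightarrow\; \text{tie, broken toward } t_1,\quad U_d = 10\,c \le 5 .
\end{aligned}
```

The attacker always prefers the less-covered target, and the resource constraint forces the smaller of the two coverage values to be at most 1/2. The defender's utility is therefore maximized at C^{∗} = (1/2, 1/2) with t^{∗} = t_{1}, giving the defender 5 and the attacker 0 under this assumption.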
Decomposed Optimal Bayesian Stackelberg Solver (DOBSS):
We now describe DOBSS^{3} in detail as it provides a starting point for the algorithms we develop in the next section. We first present DOBSS in its most intuitive form as a mixed-integer quadratic program (MIQP); we then present a linearized equivalent mixed-integer linear program (MILP). The DOBSS model explicitly represents the actions by the leader and the optimal actions for the follower in the problem solved by the leader. Note that we need to consider only the reward-maximizing pure strategies of the follower, since for a given fixed mixed strategy x of the leader, each follower faces a problem with fixed linear rewards. If a mixed strategy is optimal for the follower, then so are all the pure strategies in the support of that mixed strategy.
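A sketch of a single-follower-type MIQP that is consistent with the line-by-line description that follows is given here. The symbol R_{ij} for the leader's reward when the leader plays i and the follower plays j is an assumption introduced for this sketch; x, q, a, C_{ij}, M, X, and Q are used as in the text, and a is a free real variable.

```latex
\begin{aligned}
\max_{x,\,q,\,a}\quad & \sum_{i \in X}\sum_{j \in Q} R_{ij}\, x_i\, q_j && \text{(Line 1)}\\
\text{s.t.}\quad & \sum_{i \in X} x_i = 1 && \text{(Line 2)}\\
& \sum_{j \in Q} q_j = 1 && \text{(Line 3)}\\
& 0 \le a - \sum_{i \in X} C_{ij}\, x_i \le (1 - q_j)\,M \quad \forall j \in Q && \text{(Line 4)}\\
& x_i \in [0, 1] \quad \forall i \in X && \text{(Line 5)}\\
& q_j \in \{0, 1\} \quad \forall j \in Q && \text{(Line 6)}
\end{aligned}
```

The objective is quadratic because it multiplies the leader variables x_i by the follower variables q_j; every constraint is linear.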
Here, for a leader strategy x and a follower strategy q, the objective (Line 1) represents the expected reward for the leader. The first (Line 2) and the fourth (Line 5) constraints define the set of feasible solutions x ∈ X as a probability distribution over the set of strategies σ_{i} ∈ Σ_{Θ}. The second (Line 3) and fifth (Line 6) constraints limit the vector of strategies, q, to be a pure strategy over the set Q (that is, each q has exactly one coordinate equal to one and the rest equal to zero). The two inequalities in the third constraint (Line 4) ensure that q_{j} = 1 only for a strategy j that is optimal for the follower. Indeed, this is a linearized form of the optimality conditions for the linear programming problem solved by each follower: the leftmost inequality enforces dual feasibility of the follower’s problem, and the rightmost inequality enforces complementary slackness for an optimal pure strategy q of the follower. The leftmost inequality ensures that a ≥ ∑_{i ∈ X} C_{ij} x_{i} for all j ∈ Q; that is, given the leader’s policy x, a is an upper bound on the follower’s reward for any strategy. The rightmost inequality is inactive for every strategy with q_{j} = 0, since M is a large positive quantity, so it imposes no additional constraint on those pure strategies. Because the follower selects exactly one pure strategy, for the strategy with q_{j} = 1 this inequality states a ≤ ∑_{i ∈ X} C_{ij} x_{i}, which combined with the leftmost inequality enforces a = ∑_{i ∈ X} C_{ij} x_{i}, showing that this strategy must be optimal for the follower.
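The effect of the two inequalities in the third constraint can be checked numerically. The sketch below is not DOBSS itself: it fixes a leader strategy x, tries each follower pure strategy in turn, and tests whether the pair of inequalities can be satisfied. The payoff matrix and the value of M are made-up illustrative values.

```python
# Hypothetical follower payoff matrix C[i][j]: leader pure strategy i,
# follower pure strategy j. The numbers are made up for illustration.
C = [
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 1.0],
]
M = 1e6  # the "large positive quantity"; must exceed any payoff gap


def follower_values(x):
    """Expected follower reward sum_i C[i][j] * x[i] for each pure strategy j."""
    return [sum(C[i][j] * x[i] for i in range(len(x))) for j in range(len(C[0]))]


def satisfies_line4(x, q, a, tol=1e-9):
    """Check 0 <= a - sum_i C_ij x_i <= (1 - q_j) * M for every j."""
    return all(
        -tol <= a - v <= (1 - qj) * M + tol
        for v, qj in zip(follower_values(x), q)
    )


def feasible_pure_responses(x):
    """Follower pure strategies j for which some value of a satisfies the constraint."""
    values = follower_values(x)
    feasible = []
    for j, vj in enumerate(values):
        q = [1 if k == j else 0 for k in range(len(values))]
        # With q_j = 1 the two inequalities force a = values[j].
        if satisfies_line4(x, q, a=vj):
            feasible.append(j)
    return feasible


x = [0.25, 0.75]
print(follower_values(x))          # expected reward of each follower response
print(feasible_pure_responses(x))  # only follower best responses remain
```

For this x the follower's expected rewards are 0.25, 2.25, and 1.25, so only the second pure strategy passes the check, matching the argument that q_{j} = 1 is possible only for an optimal follower strategy.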
3 Categorizing Security Games
With progress in the security games research and the expanding set of applications, it is valuable to consider categorizing this work into three separate areas. These categories are driven by applications, but they also impact the types of games (e.g., single-shot vs. repeated games) considered and the research issues that arise. Specifically, the three categories are (i) infrastructure security games, (ii) green security games, and (iii) opportunistic crime security games. We discuss each category below.
3.1 Infrastructure Security Games

Application characteristics: These games are focused on applications of protecting infrastructure, such as ports, airports, trains, flights, and so on; the goal is often assisting agencies engaged in counterterrorism. Notice that the infrastructure being protected tends to be static and changes little over a few months; e.g., an airport being protected may have new construction once in 2–3 years. The activities in the infrastructure are regulated by well-established schedules of movement of people or goods. Furthermore, the targets being protected often have a discrete structure, e.g., terminals at an airport, individual flights, individual trains, etc.

Overall characteristics of the defender and adversary play: These games are single-shot games. The defender does play her strategy repeatedly, i.e., the defender commits to a mixed strategy in this security game. This mixed strategy may get played for months at a time. However, a single attack by an adversary ends the game. The game could potentially restart after such an attack, but it is not set up as a repeated game as in the game categories described below.

Adversary characteristics: The games assume that the adversaries are highly strategic and may attack after careful planning and surveillance. These carefully planned attacks have high consequences. Furthermore, since these attacks are a result of careful planning with the anticipation of high consequences, attackers commit to these plans of attack and are not considered to opportunistically move from target to target.

Defender characteristics: The defender does not repeatedly update her strategies. In these domains, a few attacks may occur, but they tend to be rare; there is not a very large number of attacks occurring repeatedly. As a result, traditionally, no machine learning is used in this work for the defender to update her strategies over time.
3.2 Green Security Games

Application characteristics: These games are focused on applications of protecting the environment, including forests, fish, and wildlife. The goal is thus often to assist security agencies against poachers, illegal fishermen, or those illegally cutting trees in national parks in countries around the world. Unlike in infrastructure security games, the animals or fish being protected may move around in geographical space, introducing new dimensions of complexity. Finally, the targets being protected are spread out over vast open geographical spaces, e.g., large forest regions in which trees must be protected from illegal cutting.

Overall characteristics of the defender and adversary play: These games are not single-shot games. Unfortunately, the adversaries often conduct multiple repeated “attacks,” e.g., poaching animals repeatedly. Thus, a single illegal activity does not end the game. Instead, usually, after obtaining reports, e.g., over a month, of illegal activities, the defender often replans her security activities. In other words, these are repeated security games where the defender plays a mixed strategy while the attacker attacks multiple times, and then the defender replans and plays a new mixed strategy and the cycle repeats. Notice also that the illegal activities of concern here may be conducted by multiple individuals, and thus there are multiple adversaries that are active at any one point.

Adversary characteristics: As mentioned earlier, the adversaries are engaged in repeated illegal activities; and the consequences of failure or success are not as severe as in the case of counterterrorism. As a result, every single attack (illegal action) cannot be carried out with the most detailed surveillance and planning; the adversaries will hence exhibit more of a bounded rationality and bounded surveillance in these domains.
Nonetheless, these domains are not ones where illegal activities can be conducted opportunistically (as in the opportunistic crime security games discussed below). This is because in these green security games, the adversaries often have to act in extremely dangerous places (e.g., deep in forests, protecting themselves from wild animals), and thus given the risks involved, they cannot take an entirely opportunistic approach.

Defender characteristics: Since this is a repeated game setting, the defender repeatedly updates her strategies. Machine learning can now be used in this work for the defender to update her strategies over time, given that attack data is available over time. The presence of large amounts of such attack data is very unfortunate in that it reflects the very large numbers of crimes against the environment recorded in real life, but the silver lining is that the defender can improve her strategy by exploiting this data.
3.3 Opportunistic Crime Security Games

Application characteristics: These games are focused on applications involving protecting the public against opportunistic crime. The goal is thus often to assist security agencies in protecting the public’s property such as cell phones, laptops, or other valuables. Here, human crowds may move around based on scheduled activities, e.g., office hours in downtown settings or class timings on a university campus, and thus the focus of what needs to be protected may shift on a regular schedule. At least in urban settings, these games focus on specific limited geographical areas as opposed to the vast open spaces involved in “green security games.”

Overall characteristics of the defender and adversary play: While these games are not explicitly formulated as repeated games, the adversary may conduct or attempt to conduct multiple “attacks” (thefts) in any one round of the game. Thus, the defender commits to a mixed strategy, but a single attack by a single attacker does not end the game. Instead multiple attackers may be active at a time, conducting multiple thefts while the defender attempts to stop these thefts from taking place.

Adversary characteristics: Once again, the adversaries are engaged in repeated illegal activities; and the consequences of failure or success are not as severe as in the case of counterterrorism. As a result, once again, given that every single attack (illegal action) cannot be carried out with the most detailed surveillance and planning, the adversaries may thus act even less strategically and exhibit more of a bounded rationality and bounded surveillance in these domains. Furthermore, the adversaries are not as committed to detailed plans and are flexible in their execution of their plans, as targets of opportunity present themselves.

Defender characteristics: How to update defender strategies in these games from crime data is still an open research challenge.
3.4 Cybersecurity Games

Application characteristics: These games are focused on applications involving protecting network assets against cyber attacks. The goal is thus often to assist network administrators in protecting computer systems such as data servers, switches, etc., from data theft or damage to hardware, software, or information, as well as preventing disruption of services.

Overall characteristics of the defender and adversary play: Depending on the problem at hand, the attacker (or the intruder) may want to gain control over (or to disable) a valuable computer in the network by scanning the network, compromising a more vulnerable system, and/or gaining access to further devices on the computer network. The ultimate goal could be to use the compromised systems to launch further attacks or to steal data, etc. The broader goal of the defender (a human network administrator, or a detection system) could be formulated as preventing the adversary from gaining control over systems in the network by detecting malicious attacks.

Adversary characteristics: The adversary’s characteristics vary from one application domain to another. In some application scenarios, the intruder may simply want to gain control over (or to disable) a valuable computer in the network to launch other attacks, by scanning the network and thus compromising a more vulnerable system and/or gaining access to further devices on the computer network. The actions of the attacker can therefore be seen as sending malicious packets from a controlled computer (termed source) to a single or multiple vulnerable computers (termed targets). In other scenarios, the attacker may be interested in stealing valuable information from a particular data server and therefore takes necessary actions to compromise the desired system, possibly through a series of disruptions as studied in the advanced persistent threat (APT) literature.

Defender characteristics: Although this is a new and open problem, there has been recent literature that studies the problem of optimal defender resource allocation for packet selection and inspection to detect potential threats in large computer networks with multiple computers of differing importance. Therefore, the objective of the defender in such problems is to prevent the intruder from succeeding by selecting the packets for inspection, identifying the attacker, and subsequently thwarting the attack.
Even though we have categorized the research and applications of security games in these categories, not everything is very cleanly divided in this fashion. Further research may reveal other categories or the need to generate subcategories of the above categories.
4 Addressing Scalability in Real-World Problems
The early works in Stackelberg security games such as DOBSS (Paruchuri et al. 2008) required that the full set of pure strategies for both players be considered when modeling and solving Stackelberg security games. However, many real-world problems feature billions of pure strategies for the defender, the attacker, or both. Such large problem instances cannot even be represented in modern computers, let alone solved using previous techniques.
In addition to large strategy spaces, there are other scalability challenges presented by different real-world security domains. There are domains where, rather than being static, the targets are moving and thus the security resources need to be mobile and move in a continuous space to provide protection. There are also domains where the attacker may not conduct the careful surveillance and planning that is assumed for a Strong Stackelberg Equilibrium, and thus it is important to model the bounded rationality and bounded surveillance of the attacker in order to predict their behavior. In the former case, both the defender’s and the attacker’s strategy spaces are infinite. In the latter case, computing the optimal strategy for the defender given an attacker behavioral (bounded rationality and/or bounded surveillance) model is computationally expensive. Furthermore, in certain domains, it is important to incorporate fine-grained topographical information to generate realistic patrol strategies for the defender. However, in doing so, existing techniques face a significant challenge in scalability, especially when scheduling constraints need to be satisfied. In this section, we thus highlight the critical scalability challenges faced in bringing Stackelberg security games to the real world and the research contributions that served to address these challenges.
4.1 Scale Up with Large Defender Strategy Spaces
This section provides an example of a research challenge in security games where the number of defender strategies is too large to be enumerated in computer memory. In this section, as in others that will follow, we will first provide a domain example motivating the challenge and then the algorithmic solution for the challenge.
Domain Example – IRIS for US Federal Air Marshals Service. The US Federal Air Marshals Service (FAMS) allocates air marshals to flights departing from and arriving in the USA to dissuade potential aggressors and prevent an attack should one occur. Flights are of different importance based on a variety of factors such as the numbers of passengers, the population of source and destination cities, and international flights from different countries. Security resource allocation in this domain is significantly more challenging than for ARMOR: a limited number of air marshals need to be scheduled to cover thousands of commercial flights each day. Furthermore, these air marshals must be scheduled on tours of flights that obey various logistical constraints (e.g., the time required to board, fly, and disembark); an example of a schedule is an air marshal assigned to a round trip from New York to London and back. Simply finding schedules for the marshals that meet all of these constraints is a computational challenge. For an example scenario with 1000 flights and 20 marshals, there are over 10^{41} possible schedules that could be considered. Yet there are currently tens of thousands of commercial flights flying each day, and public estimates state that there are thousands of air marshals that are scheduled daily by the FAMS (Keteyian 2010).
Against this background, the IRIS system (Intelligent Randomization In Scheduling) has been developed and deployed by FAMS since 2009 to randomize schedules of air marshals on international flights. In IRIS, the targets are the set of n flights, and the attacker could potentially choose to attack one of these flights. The FAMS has m < n air marshals that may be assigned to protect these flights.
Since the number of possible schedules increases exponentially with the number of flights and resources, DOBSS is no longer applicable to the FAMS domain. Instead, IRIS uses the much faster ASPEN algorithm (Jain et al. 2010a) to generate the schedule for thousands of commercial flights per day.
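The order of magnitude quoted for the 1000-flight, 20-marshal scenario can be sanity-checked with a one-line count. The sketch below assumes the figure counts only which 20 of the 1000 flights receive a marshal, ignoring tours and logistical constraints; the text does not state how the figure was derived, so this is one plausible reading.

```python
import math

flights, marshals = 1000, 20

# Assumption: count only which `marshals` flights out of `flights` are covered,
# with each marshal on a single flight and no tour constraints.
n_allocations = math.comb(flights, marshals)

print(f"{n_allocations:.3e}")
```

Under this assumption the count lies between 10^{41} and 10^{42}, which is consistent with the "over 10^{41}" figure.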
Algorithmic Solution – Incremental Strategy Generation (ASPEN).
In this section, we describe one particular algorithm, ASPEN, that computes strong Stackelberg equilibria (SSE) in domains with a very large number of pure strategies (up to billions of actions) for the defender (Jain et al. 2010a). ASPEN builds on the insight that in many real-world security problems, there exist solutions with small support sizes, which are mixed strategies in which only a small set of pure strategies are played with positive probability (Lipton et al. 2003). ASPEN exploits this by using an incremental strategy generation approach for the defender, in which defender pure strategies are iteratively generated and added to the optimization formulation.
In ASPEN’s security game, the attacker can choose any of the flights to attack, and each air marshal can cover one schedule. Each schedule here is a feasible set of targets that can be covered together; for the FAMS, each schedule would represent a flight tour that an air marshal could fly and that satisfies all the logistical constraints. For example, \(\{t_1, t_2\}\) would be a flight schedule, where \(t_1\) is an outbound flight and \(t_2\) is an inbound flight for one air marshal. A joint schedule then would assign every air marshal to a flight tour, and there could be exponentially many joint schedules in the domain. A pure strategy for the defender in this security game is a joint schedule. Thus, for example, if there are two air marshals, one possible joint schedule would be \(\{\{t_1, t_2\}, \{t_3, t_4\}\}\), where the first air marshal covers flights \(t_1\) and \(t_2\) and the second covers flights \(t_3\) and \(t_4\). As mentioned previously, ASPEN employs incremental strategy generation since the defender pure strategies cannot all be enumerated for such a massive problem. ASPEN decomposes the problem into a master problem and a slave problem, which are then solved iteratively. Given a number of pure strategies, the master solves for the defender and the attacker optimization constraints, while the slave is used to generate a new pure strategy for the defender in every iteration. This incremental, iterative strategy generation process allows ASPEN to avoid generating the entire set of pure strategies. In other words, by exploiting the small support size mentioned above, only a few pure strategies are generated via the iterative process, and yet we are guaranteed to reach the optimal solution.
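To make the strategy representation concrete, the following minimal sketch computes the marginal coverage induced by a mixed strategy over joint schedules and the attacker's best response to it. The flights, probabilities, and payoffs are hypothetical, and the code illustrates only the representation, not ASPEN's master or slave formulations.

```python
from collections import defaultdict

# Hypothetical instance: six flights, two marshals.
# A joint schedule (pure strategy) holds one flight tour per marshal.
mixed_strategy = {
    (frozenset({"t1", "t2"}), frozenset({"t3", "t4"})): 0.6,
    (frozenset({"t1", "t2"}), frozenset({"t5", "t6"})): 0.4,
}
# Hypothetical attacker payoffs: reward if the flight is uncovered,
# penalty if it is covered.
reward = {"t1": 5, "t2": 5, "t3": 4, "t4": 3, "t5": 2, "t6": 1}
penalty = {t: -1 for t in reward}

def coverage(strategy):
    """Marginal probability that each flight is covered by some marshal."""
    cov = defaultdict(float)
    for joint_schedule, prob in strategy.items():
        for flight in set().union(*joint_schedule):
            cov[flight] += prob
    return dict(cov)

def attacker_best_response(cov):
    """Flight maximizing the attacker's expected utility, and that utility."""
    def expected_utility(t):
        c = cov.get(t, 0.0)
        return c * penalty[t] + (1 - c) * reward[t]
    target = max(reward, key=expected_utility)
    return target, expected_utility(target)

cov = coverage(mixed_strategy)
target, value = attacker_best_response(cov)
print(cov, target, value)
```

Only two of the many possible joint schedules carry positive probability here, which is the small-support property that incremental strategy generation relies on.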
Employing incremental strategy generation for large optimization problems is not an “out-of-the-box” approach; the problem has to be formulated in a way that allows for domain properties to be exploited. The novel contribution of ASPEN is to provide a linear formulation for the master and a minimum-cost integer flow formulation for the slave, which enables the application of strategy generation techniques.
4.2 Scale Up with Large Defender and Attacker Strategy Spaces
Whereas the previous section focused on domains where only the defender’s strategy was difficult to enumerate, we now turn to domains where both defender and attacker strategies are difficult to enumerate. Once again we provide a domain example and then an algorithmic solution.
Domain Example – Road Network Security. One area of great importance is securing urban city networks, transportation networks, computer networks, and other network-centric security domains. For example, after the terrorist attacks in Mumbai of 2008 (Chandran and Beitchman 2008), the Mumbai police started setting up vehicular checkpoints on roads. We can model the problem faced by the Mumbai police as a security game between the Mumbai police and an attacker. In this urban security game, the pure strategies of the defender correspond to allocations of resources to edges in the network, for example, an allocation of police checkpoints to roads in the city. The pure strategies of the attacker correspond to paths from any source node to any target node, for example, a path from a landing spot on the coast to the airport.
The strategy space of the defender grows exponentially with the number of available resources, whereas the strategy space of the attacker grows exponentially with the size of the network. For example, in a fully connected graph with 20 nodes and 190 edges, the number of defender pure strategies for only five defender resources is \({190\choose 5}\), or almost 2 billion, while the number of attacker pure strategies (i.e., paths without cycles) is on the order of \(10^{18}\). Real-world networks are significantly larger; e.g., the entire road network of the city of Mumbai has 9,503 nodes (intersections) and 20,416 edges (streets), and the security forces can deploy dozens of resources (though far fewer than the number of edges). In addressing this computational challenge, novel algorithms based on incremental strategy generation have been able to generate randomized defender strategies that scale up to the entire road network of Mumbai (Jain et al. 2013).
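The figures for the 20-node example can be reproduced with a few lines of arithmetic. The path count below assumes that every ordered pair of distinct nodes may serve as source and target; the chapter does not spell out its counting convention, so this is one plausible reading rather than the authors' calculation.

```python
import math

nodes = 20
edges = math.comb(nodes, 2)                # edges of the complete graph
defender_pure = math.comb(edges, 5)        # five checkpoints on distinct edges

# Cycle-free paths between one fixed source and one fixed target:
# choose and order k of the remaining 18 nodes as intermediate stops.
paths_one_pair = sum(math.perm(nodes - 2, k) for k in range(nodes - 1))
# Assumption: any ordered (source, target) pair of distinct nodes is allowed.
paths_all_pairs = nodes * (nodes - 1) * paths_one_pair

print(edges, defender_pure, f"{paths_all_pairs:.1e}")
```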
Algorithmic Solution – Double Oracle Incremental Strategy Generation (RUGGED)
In domains such as the urban network security setting, the number of pure strategies of both the defender and the attacker is exponentially large. In this section, we describe the RUGGED algorithm (Jain et al. 2011), which generates pure strategies for both the defender and the attacker.
In Fig. 3, we show that RUGGED iterates over two oracles: the defender best response and the attacker best response oracles. In this case, the defender best response oracle has added a strategy \(X_2\), and the attacker best response oracle then adds a strategy \(A_3\). The algorithm stops when neither of the generated best responses improves on the current minimax strategies.
The contribution of RUGGED is to provide the mixed-integer formulations for the best response modules, which enable the application of such a strategy generation approach. The key once again is that RUGGED is able to converge to the optimal solution without enumerating the entire space of defender and attacker strategies. However, originally RUGGED could only compute the optimal solution for deploying up to four resources in a real city network with 250 nodes within a time frame of 10 h (the complexity of this problem can be estimated by observing that both of the best response problems are themselves NP-hard (Jain et al. 2011)). More recent work (Jain et al. 2013) builds on RUGGED and proposes SNARES, which allows scale-up to the entire city of Mumbai, with 10–15 checkpoints.
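The double-oracle loop can be sketched on a toy network. Everything below is a hypothetical illustration: the five-edge graph is invented, the oracles are brute-force searches rather than RUGGED's mixed-integer programs, and the restricted game is solved approximately by fictitious play instead of an exact linear program.

```python
# Toy network (hypothetical): attacker travels from s to t,
# defender places ONE checkpoint on an edge.
EDGES = ["sa", "sb", "at", "bt", "ab"]
PATHS = [("sa", "at"), ("sb", "bt"), ("sa", "ab", "bt")]

def payoff(edge, path):
    """Attacker payoff: 1 if the path avoids the checkpoint, else 0 (zero-sum)."""
    return 0.0 if edge in path else 1.0

def solve_restricted(D, A, rounds=2000):
    """Approximate equilibrium of the restricted game by fictitious play."""
    d_cnt, a_cnt = [0] * len(D), [0] * len(A)
    d_cnt[0] = a_cnt[0] = 1
    for _ in range(rounds):
        # Each player best-responds to the opponent's empirical mixture.
        d_br = min(range(len(D)), key=lambda i: sum(
            a_cnt[j] * payoff(D[i], A[j]) for j in range(len(A))))
        a_br = max(range(len(A)), key=lambda j: sum(
            d_cnt[i] * payoff(D[i], A[j]) for i in range(len(D))))
        d_cnt[d_br] += 1
        a_cnt[a_br] += 1
    return [c / sum(d_cnt) for c in d_cnt], [c / sum(a_cnt) for c in a_cnt]

def double_oracle():
    D, A = [EDGES[0]], [PATHS[0]]          # start from one strategy each
    while True:
        x, y = solve_restricted(D, A)
        # Oracles: best responses searched over the FULL strategy sets.
        d_new = min(EDGES, key=lambda e: sum(
            q * payoff(e, p) for p, q in zip(A, y)))
        a_new = max(PATHS, key=lambda p: sum(
            w * payoff(e, p) for e, w in zip(D, x)))
        added = False
        if d_new not in D:
            D.append(d_new)
            added = True
        if a_new not in A:
            A.append(a_new)
            added = True
        if not added:                      # neither oracle found a new strategy
            return D, x, A, y

D, x, A, y = double_oracle()
# Attacker's best payoff against the final defender mixture, over ALL paths.
worst_case = max(sum(w * payoff(e, p) for e, w in zip(D, x)) for p in PATHS)
print(D, A, round(worst_case, 2))
```

In this toy instance the loop stops after generating only a subset of the defender's edges and the attacker's paths, which is the behavior that makes the approach attractive when full enumeration is impossible.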
4.3 Scale-Up with Mobile Resources and Moving Targets
Whereas the previous two sections focused on incremental strategy generation as an approach for scaleup, this section introduces another approach: use of compact marginal probability representations. This alternative approach is shown in use in the context of a new application of protecting ferries.
Domain Example – Ferry Protection for the US Coast Guard
The US Coast Guard is responsible for protecting domestic ferries, including the Staten Island Ferry in New York, from potential terrorist attacks. There are a number of ferries carrying hundreds of passengers in many waterside cities. These ferries are attractive targets for an attacker who can approach the ferries with a small boat packed with explosives at any time; this attacker’s boat may only be detected when it comes close to the ferries. Small, fast, and well-armed patrol boats can provide protection to such ferries by detecting the attacker within a certain distance and stopping him from attacking. However, the number of patrol boats is often limited; thus, the defender cannot protect the ferries at all times and locations. We thus developed a game-theoretic system for scheduling escort boat patrols to protect ferries, and this has been deployed at the Staten Island Ferry since 2013 (Fang et al. 2013).
Algorithmic Solution – Compact Strategy Representation (CASS).
In this section, we describe the CASS (Solver for Continuous Attacker Strategy) algorithm (Fang et al. 2013) for solving security problems where the defender has mobile patrollers to protect a set of mobile targets against an attacker who can attack these moving targets at any time during their movement. In these security problems, the sets of pure strategies for both the defender and attacker are continuous w.r.t. the continuous spatial and time components of the problem domain. The CASS algorithm attempts to compute the optimal mixed strategy for the defender without discretizing the attacker’s continuous strategy set; it exactly models this set using subinterval analysis, which exploits the piecewise-linear structure of the attacker’s expected utility function. The insight of CASS is to compactly represent the defender’s mixed strategies as a marginal probability distribution, overcoming the shortcoming of an exponential number of pure strategies for the defender.
CASS casts problems such as the ferry protection problem mentioned above as a zero-sum security game in which targets move along a one-dimensional domain, i.e., a straight line segment connecting two terminal points. This one-dimensional assumption is valid because, in real-world domains such as ferry protection around the world, ferries normally move back and forth in a straight line between two terminals (i.e., ports). Although the targets’ locations vary over time, these targets have a fixed daily schedule, meaning that determining the locations of the targets at a certain time is straightforward. The defender has mobile patrollers (i.e., boats) that can move between the two terminals to protect the targets. While the defender is trying to protect the targets, the attacker will decide to attack a certain target at a certain time. The probability that the attacker successfully attacks depends on the positions of the patrollers at that time. Specifically, each patroller possesses a protective circle of a given radius within which she can detect and try to intercept any attack, whereas she is incapable of detecting the attacker outside that radius.
A pure strategy for the defender is defined as a trajectory through the transition graph, whose nodes are (location, time) pairs; e.g., the trajectory including (A, 5 min), (B, 10 min), and (C, 15 min) indicates a pure strategy for the defender. One key challenge of this representation for the defender’s pure strategies is that the transition graph contains an exponential number of trajectories, i.e., \(O(N^{T})\), where N is the number of location points and T is the number of time steps. To address this challenge, CASS proposes a compact representation of the defender’s mixed strategy. Instead of directly computing a probability distribution over pure strategies for the defender, CASS attempts to compute the marginal probability that the defender will follow a certain edge of the transition graph, e.g., the probability of being at the node (A, 5 min) and moving to the node (B, 10 min). We show that, given a discretized strategy space for the defender, any strategy in the full representation can be mapped into a compact representation and that the compact representation does not lead to any loss in solution quality compared to the full representation (see Theorem 1 in ?). This compact representation allows CASS to reformulate the resource allocation problem as computing the optimal marginal coverage of the defender over the \(O(NT)\) edges of the transition graph.
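The mapping from a full mixed strategy to the compact edge-marginal representation can be illustrated as follows. The locations, time steps, and probabilities are hypothetical, and the sketch shows only the mapping and a flow-conservation check, not the CASS optimization itself.

```python
from collections import defaultdict

# Hypothetical discretization: locations A, B, C at 5, 10, and 15 minutes.
times = [5, 10, 15]
# Full representation: a probability for each trajectory (pure strategy).
mixed_strategy = {
    ("A", "B", "C"): 0.5,
    ("A", "A", "B"): 0.3,
    ("B", "B", "C"): 0.2,
}

def edge_marginals(strategy):
    """Probability that the patroller traverses each transition-graph edge."""
    f = defaultdict(float)
    for trajectory, prob in strategy.items():
        for k in range(len(times) - 1):
            src = (trajectory[k], times[k])
            dst = (trajectory[k + 1], times[k + 1])
            f[(src, dst)] += prob
    return dict(f)

def inflow(f, node):
    return sum(p for (src, dst), p in f.items() if dst == node)

def outflow(f, node):
    return sum(p for (src, dst), p in f.items() if src == node)

f = edge_marginals(mixed_strategy)
print(f[(("A", 5), ("B", 10))])
# Flow conservation at an intermediate node: probability in equals probability out.
print(inflow(f, ("B", 10)), outflow(f, ("B", 10)))
```

The conservation property checked at the end is what allows edge marginals to be treated as flow variables in an optimization over the transition graph.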
4.4 Scale-Up with Boundedly Rational Attackers
One key challenge of real-world security problems is that the attacker is boundedly rational; the attacker’s target choice is non-optimal. In SSGs, attacker bounded rationality is often modeled via behavior models such as quantal response (QR) (McFadden 1972; McKelvey and Palfrey 1995). In general, QR attempts to predict the probability that the attacker will choose each target, with the intuition that the higher the expected utility at a target is, the more likely the adversary is to attack that target. Another behavioral model that was recently shown to predict the attacker’s behavior more accurately than QR is subjective utility quantal response (SUQR) (Nguyen et al. 2013). SUQR is motivated by the lens model, which suggests that adversaries evaluate targets based on a linear combination of multiple observable features (Brunswik 1952). We provide a detailed discussion on modeling and learning the attacker’s behavioral model in Sect. 6. However, even when the attacker’s bounded rationality is modeled and those models are learned efficiently, handling multiple attackers with these behavioral models in the context of a large defender strategy space is a computational challenge. Therefore, in this section, we mainly focus on handling the scalability problem given behavioral models of the attacker.
To handle the problem of a large defender strategy space given behavioral models of attackers, we introduce yet another scale-up technique, which is similar to incremental strategy generation but instead uses incremental marginal space refinement. We use the compact marginal representation discussed earlier but refine that space incrementally if the solution produced violates the necessary constraints.
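The two behavioral models can be written as a softmax over target scores. In the sketch below, QR scores each target by the attacker's expected utility scaled by a rationality parameter, and SUQR scores it by a weighted sum of coverage, reward, and penalty. All numeric values, including the parameter `lam` and the weights `w`, are hypothetical placeholders rather than learned values.

```python
import math

def softmax(scores):
    """Normalize exponentiated scores into a probability distribution."""
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def quantal_response(coverage, reward, penalty, lam):
    """QR: attack probability grows with the attacker's expected utility."""
    eu = [c * p + (1 - c) * r for c, r, p in zip(coverage, reward, penalty)]
    return softmax([lam * u for u in eu])

def suqr(coverage, reward, penalty, w):
    """SUQR: the score is a linear combination of observable target features."""
    su = [w[0] * c + w[1] * r + w[2] * p
          for c, r, p in zip(coverage, reward, penalty)]
    return softmax(su)

# Hypothetical three-target game.
coverage = [0.5, 0.2, 0.3]
reward = [5.0, 3.0, 1.0]       # attacker reward if the target is uncovered
penalty = [-2.0, -1.0, -1.0]   # attacker penalty if the target is covered

qr_probs = quantal_response(coverage, reward, penalty, lam=1.0)
suqr_probs = suqr(coverage, reward, penalty, w=(-9.0, 0.4, 0.2))
uniform = quantal_response(coverage, reward, penalty, lam=0.0)
print(qr_probs, suqr_probs, uniform)
```

Setting `lam` to zero yields a uniformly random attacker, while larger values concentrate probability on the highest-utility target, which is how QR interpolates between random and perfectly rational behavior.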
Domain Example – Fishery Protection for US Coast Guard
Fisheries are a vital natural resource from both an ecological and economic standpoint. However, fish stocks around the world are threatened with collapse due to illegal, unreported, and unregulated (IUU) fishing. The US Coast Guard (USCG) is tasked with the responsibility of protecting and maintaining the nation’s fisheries. To this end, the USCG deploys resources (both air and surface assets) to conduct patrols over fishery areas in order to deter and mitigate IUU fishing. Due to the large size of these patrol areas and the limited patrolling resources available, it is impossible to protect an entire fishery from IUU fishing at all times. Thus, an intelligent allocation of patrolling resources is critical for security agencies like the USCG.
Natural resource conservation domains such as fishery protection raise a number of new research challenges. In stark contrast to counterterrorism settings, there is frequent interaction between the defender and attacker in these resource conservation domains. This distinction is important for three reasons. First, due to the comparatively low stakes of the interactions, rather than a handful of persons or groups, the defender must protect against numerous adversaries (potentially hundreds or even more), each of which may behave differently. Second, frequent interactions make it possible to collect data on the adversaries’ actions over time. Third, the adversaries are less strategic given the short planning windows between actions.
Algorithmic Solution – Incremental Constraint Generation (MIDAS).
Generating effective strategies for domains such as fishery protection requires an algorithmic approach which is both scalable and robust. For scalability, the defender is responsible for protecting a large patrol area and therefore must consider a large strategy space. Even if the patrol area is discretized into a grid or graph structure, the defender must still reason over an exponential number of patrol strategies. For robustness, the defender must protect against multiple boundedly rational adversaries. Bounded rationality models, such as the quantal response (QR) model (McKelvey and Palfrey 1995) and the subjective utility quantal response (SUQR) model (Nguyen et al. 2013), introduce stochastic actions, relaxing the strong assumption in classical game theory that all players are perfectly rational and utility maximizing. These models are able to better predict the actions of human adversaries and thus lead the defender to choose strategies that perform better in practice. However, both QR and SUQR are nonlinear models resulting in a computationally difficult optimization problem for the defender. Combining these factors, MIDAS models a population of boundedly rational adversaries and utilizes available data to learn the behavior models of the adversaries using the subjective utility quantal response (SUQR) model in order to improve the way the defender allocates its patrolling resources.
Due to the relaxations in the master, solving the master produces a marginal strategy x, which is a probability distribution over targets. However, the defender ultimately needs a probability distribution over patrols. Additionally, since not all of the spatiotemporal constraints are considered in the master, the relaxed solution x may not be a feasible solution to the original problem. Therefore, the slave checks whether the marginal strategy x can be expressed as a linear combination, i.e., a probability distribution, of patrols. If it cannot, the marginal distribution is infeasible for the original problem. However, given the exponential number of patrol strategies, even performing this optimality check is intractable. Thus, column generation is used within the slave, where only a small set of patrols is considered initially in the optimality check and the set is expanded over time. Much like previous examples of column generation in security games, e.g., (Jain et al. 2010a), new patrols are added by solving a minimum-cost network flow problem using reduced cost information from the optimality check. If the optimality check fails, then the slave generates a cut, which is returned to refine and constrain the master, incrementally bringing it closer to the original problem. The entire process is repeated until an optimal solution is found. Finally, MIDAS has been successfully deployed and evaluated by the USCG in the Gulf of Mexico.
4.5 Scale-Up with Fine-Grained Spatial Information
Discretization is a standard way to convert a continuous problem to a discrete problem. Therefore, a grid map is often used to describe a large area. However, when fine-grained spatial information needs to be considered, each cell in the grid map must be small and the total number of cells becomes large, which leads to a significant scalability challenge in security games, especially when scheduling constraints need to be satisfied. In this section, we introduce a hierarchical modeling approach for problems with fine-grained spatial information, which is used in the context of designing foot patrols in areas with complex terrain (Fang et al. 2016).
Domain Example – Wildlife Protection for Areas with Complex Terrain
There is an urgent need to protect wildlife from poaching. Indeed, poaching can lead to extinction of species and destruction of ecosystems. For example, poaching is considered a major driver (Chapron et al. 2008) of why tigers are now found in less than 7% of their historical range (Sanderson et al. 2006), with three out of nine tiger subspecies already extinct (IUCN 2015). As a result, efforts have been made by law enforcement agencies in many countries to protect endangered animals; the most commonly used approach is conducting foot patrols. However, given their limited human resources, improving the efficiency of patrols to combat poaching remains a major challenge.
Algorithmic Solution – Hierarchical Modeling Approach
The hierarchical modeling approach allows us to attain a good compromise between scaling up and providing detailed guidance. This approach would be applicable in many other domains involving patrols of large open areas where security games are applicable, including not only other green security game applications but also the patrolling of large warehouse areas or large open campuses via robots or UAVs.
We leverage insights from hierarchical abstraction for heuristic search such as path planning (Botea et al. 2004) and apply two levels of discretization to the area of interest. We first discretize the area into large grid cells and treat every grid cell as a target. We further discretize the grid cells into small raster pieces and describe the spatial information for each raster piece. The defender actions are patrol routes defined over a virtual “street map,” which is built in terms of raster pieces while aided by the grid cells in this abstraction, as described below. With this hierarchical modeling, the model keeps a small number of targets and reduces the number of patrol routes while allowing for details at a fine-grained scale. The street map is a graph consisting of nodes and edges, where the set of nodes is a small subset of the raster pieces and edges are sequences of raster pieces linking the nodes. We denote nodes as key access points (KAPs) and edges as route segments. When designing foot patrols in areas with complex terrain, the street map not only helps scalability but also allows us to focus patrolling on preferred terrain features such as ridgelines, which patrollers find easier to move along and which are important conduits for certain mammal species such as tigers.
The last step is to find route segments to connect the KAPs. Instead of inefficiently finding route segments to connect each pair of KAPs on the map globally, we find route segments locally for each pair of KAPs within the same grid cell, which is sufficient to connect all the KAPs. When finding the route segment, we design a distance measure which estimates the actual patrol effort according to the accessibility type of the raster pieces. Given the distance measure, the route segment is defined as the shortest distance path linking two KAPs within the grid cell.
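A route segment of this kind can be sketched as a least-effort path over the raster pieces of one grid cell. The accessibility types, effort values, and 3x3 raster below are hypothetical, and Dijkstra's algorithm is used here as a generic shortest-path method; the chapter does not specify which algorithm or distance values were actually used.

```python
import heapq

# Hypothetical effort of entering a raster piece, by accessibility type.
EFFORT = {"ridge": 1.0, "forest": 3.0, "steep": 8.0}

def route_segment(raster, start, goal):
    """Least-effort path between two KAPs inside one grid cell (Dijkstra)."""
    rows, cols = len(raster), len(raster[0])
    dist, prev = {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            break
        if d > dist[(r, c)]:
            continue                      # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + EFFORT[raster[nr][nc]]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Walk back from the goal to recover the sequence of raster pieces.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]

# Hypothetical raster for one grid cell; KAPs at the top-left and bottom-left.
raster = [
    ["ridge",  "ridge", "ridge"],
    ["steep",  "steep", "ridge"],
    ["forest", "ridge", "ridge"],
]
effort, path = route_segment(raster, (0, 0), (2, 0))
print(effort, path)
```

In this example the least-effort segment follows the ridgeline around the steep pieces even though the direct route crosses fewer raster pieces, which mirrors the preference for ridgelines described above.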
The defender’s pure strategy is defined as a patrol route on the street map, starting from the base node, walking along route segments, and ending at the base node, with its total distance satisfying the patrol distance limit. The defender’s goal is to find an optimal mixed patrol strategy, i.e., a probability distribution over patrol routes. Based on the street map concept, we use a cutting-plane approach (Yang et al. 2013b) that is similar to MIDAS. Specifically, in the master component, we use the ARROW algorithm (Nguyen et al. 2015) to handle payoff uncertainty using the concept of minimax regret; in the slave component, we also use an optimality check and column generation; and to generate a new column (a new patrol), we use a random selection approach over the street map. This framework is the core of the PAWS (Protection Assistant for Wildlife Security) application. In collaboration with two NGOs (Panthera and Rimba), PAWS has been deployed in Malaysia for tiger conservation.
5 Addressing Uncertainty in Real-World Problems
The standard security game model features a number of strong assumptions, including that the defender has perfect information about the game payoff matrix as well as the attacker’s behavioral model. Additionally, the defender is assumed to be capable of exactly executing the computed patrolling strategy. However, uncertainty is endemic in real-world security domains, and thus it may be impossible or impractical for the defender to accurately estimate various aspects of the game. Also, there are any number of practicalities and unforeseen events that may force the defender to change their patrolling strategy. These types of uncertainty can significantly reduce the effectiveness of the defender’s strategy, and thus addressing uncertainty when generating strategies is a key challenge of solving real-world security problems. This section describes several approaches for dealing with various types of uncertainty in SSGs.
These approaches fall into two categories. The first applies robust optimization techniques that use uncertainty intervals to represent uncertainty in SSGs. For example, BRASS (Pita et al. 2009b) is a robust algorithm that only addresses attacker-payoff uncertainty, RECON (Yin et al. 2011) is another robust algorithm that focuses on addressing defender-strategy uncertainty, and monotonic maximin (Jiang et al. 2013b) handles the uncertainty in the attacker’s bounded rationality. Finally, URAC (Nguyen et al. 2014) is a unified robust algorithm that handles all types of uncertainty. The second follows a Bayesian Stackelberg game model with dynamic execution uncertainty, in which the uncertainty is represented using a Markov decision process (MDP) that incorporates the time factor.
In the following, we present two algorithmic solutions that are representative of these two approaches: URAC, a unified robust algorithm that handles all types of uncertainty with uncertainty intervals, and the MDP-based algorithm, which handles execution uncertainty with an MDP representation of uncertainty.
5.1 Security Patrolling with Unified Uncertainty Space
Domain Example – Security in Los Angeles International Airport.
The ARMOR system (Assistant for Randomized Monitoring over Routes) focuses on two of the security measures at LAX (checkpoints and canine patrols) and optimizes security resource allocation using Bayesian Stackelberg games. Take the vehicle checkpoints model as an example. Assuming that there are n roads, the police’s strategy is placing m < n checkpoints on these roads where m is the maximum number of checkpoints. ARMOR randomizes allocation of checkpoints to roads. The adversary may conduct surveillance of this mixed strategy and may potentially choose to attack through one of these roads. ARMOR models different types of attackers with different payoff functions, representing different capabilities and preferences for the attacker. ARMOR has been successfully deployed since August 2007 at LAX (Jain et al. 2010b).
Although standard SSG-based solutions (i.e., DOBSS) have been demonstrated to improve the defender’s patrolling effectiveness significantly, there remain potential improvements that can further enhance the quality of such solutions, such as taking into account uncertainties in payoff values, in the attacker’s rationality, and in the defender’s execution. Therefore, we propose the unified robust algorithm, URAC, to handle these types of uncertainties by maximizing the defender’s utility against the worst-case scenario resulting from these uncertainties.
Algorithmic Solution – Uncertainty Dimension Reduction (URAC).
In this section, we present the robust URAC (Unified Robust Algorithmic framework for addressing unCertainties) algorithm for addressing a combination of all uncertainty types (Nguyen et al. 2014). Consider an SSG where there is uncertainty in the attacker’s payoff, the defender’s strategy (including the defender’s execution and the attacker’s observation), and the attacker’s behavior. URAC represents all these uncertainty types (except for the attacker’s behavior) using uncertainty intervals. Instead of knowing the exact values of these game attributes, the defender only has prior information about the upper and lower bounds of these attributes. For example, the attacker’s reward for successfully attacking a target t is known to lie within the interval [1, 3]. Furthermore, URAC assumes the attacker responds monotonically to the defender’s strategy. In other words, the higher the expected utility of a target, the more likely the attacker is to attack that target; however, the precise attack probability is unknown to the defender. This monotonicity assumption is motivated by the quantal response model, a well-known human behavioral model for capturing the attacker’s decision-making (McKelvey and Palfrey 1995).
Based on these uncertainty assumptions, URAC attempts to compute the optimal strategy for the defender by maximizing her utility against the worst-case scenario of uncertainty. The key challenge of this optimization problem is that it involves several types of uncertainty, resulting in multiple minimization steps for determining the worst-case scenario. Nevertheless, URAC introduces a unified representation of all these uncertainty types as an uncertainty set of the attacker’s responses. Intuitively, regardless of the type of uncertainty mentioned above, what finally affects the defender’s utility is the attacker’s response, which is unknown to the defender due to uncertainty. As a result, URAC can represent the robust optimization problem as a single maximin problem.
However, the infinite uncertainty set of the attacker’s responses depends on the planned mixed strategy for the defender, making this maximin problem difficult to solve by directly applying the traditional method (i.e., taking the dual maximization of the inner minimization of the maximin and merging it with the outer maximization, so that the maximin can be represented as a single maximization problem). Therefore, URAC proposes a divide-and-conquer method in which the defender’s strategy set is divided into subsets such that the uncertainty set of the attacker’s responses is the same for every defender strategy within each subset. This division leads to multiple sub-maximin problems, which can be solved by using the traditional method. The optimal solution of the original maximin problem can now be computed as a maximum over all the sub-maximin problems.
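In symbols, the single maximin problem and its decomposition can be sketched as follows. The notation is ours and is meant only as an illustration: \(x\) is the defender’s mixed (coverage) strategy in the feasible set \(X\), \(q\) is a distribution over targets describing the attacker’s response, \(Q(x)\) is the uncertainty set of responses consistent with the intervals and the monotonicity assumption, and \(U^d_t(x)\) is the defender’s expected utility when target \(t\) is attacked.

\[
\max_{x \in X} \; \min_{q \in Q(x)} \; \sum_{t} q_t \, U^d_t(x)
\]

If \(X\) is partitioned into subsets \(X_1, \dots, X_K\) such that \(Q(x) = Q_k\) for every \(x \in X_k\), the problem becomes

\[
\max_{k = 1, \dots, K} \; \Big[ \max_{x \in X_k} \; \min_{q \in Q_k} \; \sum_{t} q_t \, U^d_t(x) \Big],
\]

where each inner problem has a fixed uncertainty set and can therefore be handled by the dualization step described above.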
5.2 Security Patrolling with Dynamic Execution Uncertainty
Domain Example – TRUSTS for Security in Transit Systems.
The TRUSTS system (Tactical Randomization for Urban Security in Transit Systems) models the patrolling problem as a leader-follower Stackelberg game (Yin et al. 2012). The leader (the Los Angeles Sheriff’s Department, LASD) precommits to a mixed patrol strategy (a probability distribution over all pure strategies), and riders observe this mixed strategy before deciding whether or not to buy a ticket. Both ticket sales and fines issued for fare evasion translate into revenue for the government. Therefore, the utility for the leader is the total revenue (total ticket sales plus penalties). The main computational challenge is the exponentially many possible patrol strategies, each subject to both the spatial and temporal constraints of travel within the transit network under consideration. To overcome this challenge, TRUSTS uses a compact representation of the strategy space which captures the spatiotemporal structure of the domain.
The LASD conducted field tests of the TRUSTS system in the LA Metro in 2012, and one of the feedback comments from the officers was that patrols are often interrupted due to execution uncertainty such as emergencies and arrests.
Algorithmic Solution – Marginal MDP Strategy Representation
Utilizing techniques from planning under uncertainty (in particular Markov decision processes), we proposed a general approach to dynamic patrolling games in uncertain environments, which provides patrol strategies with contingency plans (Jiang et al. 2013a). This led to schedules being loaded onto smartphones and given to officers. If interruptions occur, the schedules are then automatically updated on the smartphone app. The LASD has conducted successful field evaluations using the smartphone app, and the TSA is currently evaluating it toward nationwide deployment. We now describe the solution approach in more detail. Note that the targets, e.g., trains, normally follow predetermined schedules; thus, timing is an important aspect that determines the effectiveness of the defender’s patrolling schedules (the defender needs to be at the right location at a specific time in order to protect these moving targets). However, as a result of execution uncertainty (e.g., emergencies or errors), the defender may be unable to carry out her planned patrolling schedule in later time steps. For example, in real-world trials for TRUSTS carried out by the Los Angeles Sheriff’s Department (LASD), there were interruptions (due to writing citations, felony arrests, and handling emergencies) in a significant fraction of the executions, causing the officers to miss the train they were supposed to catch according to the pre-generated patrolling schedule.
In essence, the transition graph as represented above is augmented to indicate the possibility that there are multiple uncertain outcomes possible from a given state. Solving this transition graph results in marginals over MDP policies. When a sample MDP policy is obtained and loaded on to a smartphone, it provides a patroller not only the current action but contingency actions should the current action fail or succeed. So the MDP policy provides options for the patroller, allowing the system to handle execution uncertainty. A key challenge of computing the SSE for this type of security problem is that the dimension of the space of mixed strategies for the defender is exponential in the number of states in terms of the defender’s times and locations. Therefore, instead of directly computing the mixed strategy, the defender attempts to compute the marginal probabilities of each patrolling unit reaching a state s = (t, l) and taking action a which have dimensions polynomial in the sizes of the MDPs (the details of this approach are provided in Jiang et al. 2013a).
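The marginal representation can be illustrated with a forward pass over a tiny patrol MDP. The instance below is hypothetical: the success probability, the two actions, and the fixed randomized policy are invented for illustration. In the approach described above the marginals are the optimization variables; here they are computed from a given policy simply to show what they represent.

```python
from collections import defaultdict

P_SUCCESS = 0.9  # hypothetical probability that a planned move is not interrupted

def transition(state, action):
    """Outcome distribution for a state (time, location) and an action."""
    t, loc = state
    if action == "stay":
        return {(t + 1, loc): 1.0}
    # "move": reach the next station, or be delayed and remain in place.
    return {(t + 1, loc + 1): P_SUCCESS, (t + 1, loc): 1.0 - P_SUCCESS}

def policy(state):
    """Hypothetical randomized policy: probability of each action in a state."""
    return {"move": 0.5, "stay": 0.5}

def marginals(start, horizon):
    """Probability w(s, a) of reaching state s and taking action a."""
    reach = defaultdict(float)
    reach[start] = 1.0
    w = {}
    for t in range(horizon):
        for state in [s for s in list(reach) if s[0] == t]:
            for action, p_action in policy(state).items():
                w[(state, action)] = reach[state] * p_action
                for nxt, p_next in transition(state, action).items():
                    reach[nxt] += w[(state, action)] * p_next
    return w, dict(reach)

w, reach = marginals(start=(0, 0), horizon=2)
print(reach[(1, 0)], reach[(1, 1)])
print(sum(p for (state, action), p in w.items() if state[0] == 1))
```

The number of marginals grows with the number of state-action pairs rather than with the number of full contingency plans, and a policy can be recovered from them by normalizing \(w(s, a)\) over the actions available in each state.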
6 Addressing Bounded Rationality and Bounded Surveillance in Real-World Problems
In addition to bounded rationality, attackers’ bounded surveillance also needs to be considered in real-world domains. In previous sections, a one-shot Stackelberg security game model is used, and it is assumed that the adversaries will conduct extensive surveillance to get a perfect understanding of the defender’s strategy before an attack. However, this assumption does not apply to real-world domains involving frequent and repeated attacks. In carrying out frequent attacks, the attackers generally do not conduct extensive surveillance before performing an attack, and therefore the attackers’ understanding of the defender strategy may not be up to date. As will be shown later in this section, if the bounded surveillance of attackers is known to the defender, the defender can exploit it to improve her average expected utility by carefully planning changes in her strategy. The improvement may depend on the level of bounded surveillance and on how accurately the defender understands it. Therefore, addressing the human adversaries’ bounded rationality and bounded surveillance is a fundamental challenge for applying security games to a wide variety of domains.
Domain Example – Green Security Domains.
As mentioned earlier, endangered species poaching is reaching critical levels as the populations of these species plummet to unsustainable numbers. The global tiger population, for example, has dropped by over 95% since the start of the 1900s, and three of the nine tiger subspecies have gone extinct. Depending on the area and animals poached, motivations for poaching range from profit to sustenance, with the former being more common when profitable species such as tigers, elephants, and rhinos are the targets. To counter poaching efforts and to rebuild the species’ populations, countries have set up protected wildlife reserves and conservation agencies tasked with defending these large reserves. Because of the size of the reserves and the common lack of law enforcement resources, conservation agencies are at a significant disadvantage when it comes to deterring and capturing poachers. Agencies use patrolling as a primary method of securing the park. Due to their limited resources, however, patrol managers must carefully create patrols that account for many different variables (e.g., limited patrol units to send out, multiple locations that poachers can attack at varying distances to the outpost).
6.1 Bounded Rationality Modeling and Learning
Wildlife Poaching Game:
In our game, human subjects play the role of poachers looking to place a snare to hunt a hippopotamus in a protected wildlife park. The portion of the park shown in the map is a Google Maps view of a portion of the Queen Elizabeth National Park (QENP) in Uganda. The region shown is divided into a 5 × 5 grid, i.e., 25 distinct cells. Overlaid on the Google Maps view of the park is a heat map, which represents the rangers’ mixed strategy x: a cell i with higher coverage probability x_i is shown more in red, while a cell with lower coverage probability is shown more in green. As the subjects play the game and click on a particular region on the map, they are given detailed information about the poacher’s reward, penalty, and coverage probability at that region: R_i^a, P_i^a, and x_i for each target i. However, the participants are unaware of the exact location of the rangers while playing the game, i.e., they do not know the pure strategy that will be played by the rangers, which is drawn randomly from the mixed strategy x shown on the game interface. Thus, we model the real-world situation in which poachers have knowledge of past patterns of ranger deployment but not the exact location of ranger patrols when they set out to lay snares. In our game, there are nine rangers protecting this park, with each ranger protecting one grid cell. Therefore, at any point in time, only 9 out of the 25 distinct regions in the park are protected. A player succeeds if he places a snare in a region that is not protected by a ranger; otherwise he is unsuccessful.
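The mechanics of one round can be sketched as follows. The expected-utility expression is the standard security-game form (the poacher is caught with probability x_i and succeeds otherwise). How the game interface actually draws the rangers’ pure strategy is not stated here, so the systematic ("comb") sampling routine below is an assumption chosen because it selects exactly nine cells while matching the marginals x_i; the function names and the uniform coverage vector are hypothetical.

```python
import math
import random

def attacker_expected_utility(x_i, reward, penalty):
    """Expected utility of snaring cell i: caught w.p. x_i, succeeds otherwise."""
    return x_i * penalty + (1.0 - x_i) * reward

def sample_patrol(coverage, rng):
    """Draw one pure strategy (list of covered cells) whose marginals match
    `coverage`, via systematic ("comb") sampling. Assumes 0 <= x_i <= 1 and
    that the coverage values sum to the number of rangers."""
    offset = rng.random()          # random position of the comb's first tooth
    covered, cum = [], 0.0
    for cell, x in enumerate(coverage):
        # The cell owns the interval (cum, cum + x]; it is covered when a tooth
        # (offset, offset + 1, offset + 2, ...) lands inside that interval.
        if math.floor(cum + x - offset) > math.floor(cum - offset):
            covered.append(cell)
        cum += x
    return covered

coverage = [9 / 25] * 25           # hypothetical uniform coverage: 9 rangers, 25 cells
patrol = sample_patrol(coverage, random.Random(0))
snare_cell = 12                    # the cell a subject clicks on
success = snare_cell not in patrol
```

A subject deciding where to place a snare would compare `attacker_expected_utility` across the 25 cells; a perfectly rational subject would pick the maximum, whereas the behavioral models discussed next relax that assumption.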
Similar to Nguyen et al. (2013), we again recruited human subjects on AMT and asked them to play this game repeatedly for a set of rounds, with the defender strategy changing in each round based on the behavioral model being used to learn the adversary’s behavior. Before discussing the experiments further, we first give a brief overview of the bounded rationality models used in our experiments to learn adversary behavior.
Bounded Rationality Models:
While behavioral models like QR (McFadden 1976) and SUQR (Nguyen et al. 2013) assume that there is a homogeneous population of adversaries, in the real world, we face heterogeneous populations of adversaries. Therefore, Bayesian SUQR was proposed to learn the behavioral model for each attack (Yang et al. 2014). Protection Assistant for Wildlife Security (PAWS) is an application that was originally created using Bayesian SUQR. However, in real-world security domains, we may have very limited data or may only have some limited information on the biases displayed by adversaries. An alternative approach is based on robust optimization: instead of assuming a particular model of human decision-making, try to achieve good defender expected utility against a range of possible models. One instance of this approach is MATCH (Pita et al. 2012), which guarantees that the defender’s loss is bounded within a constant factor of the adversary’s loss if the adversary responds nonoptimally. Another robust solution concept is monotonic maximin (Jiang et al. 2013b), which tries to optimize defender utility against the worst-case monotonic adversary behavior, where monotonicity is the property that actions with higher expected utility are played with higher probability. Recently, there have been attempts to combine such robust-optimization approaches with available behavior data (Haskell et al. 2014) for RSSGs, resulting in a new human behavior model called Robust SUQR. However, one open research question is how these proposed models and algorithms will fare against human subjects in RSSGs. This has been explored in recent research (Kar et al. 2015) in “first-of-its-kind” human subjects experiments in RSSGs over a period of 46 weeks with the “Wildlife Poaching” game. A brief description of our experimental observations from the RSSG human subject experiments is presented below.
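As a point of reference for the models named above, the following is a minimal sketch of the QR and SUQR attack distributions in their usual softmax forms: QR weights the attacker’s expected utility by a rationality parameter, while SUQR uses a weighted sum of coverage, reward, and penalty. The function names and every numeric value are hypothetical; in practice the parameters are learned from attack data.

```python
import math

def _softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def qr_attack_distribution(coverage, rewards, penalties, lam):
    """Quantal response: q_i proportional to exp(lam * U_i^a), where
    U_i^a = x_i * P_i^a + (1 - x_i) * R_i^a is the attacker's expected utility."""
    utils = [x * p + (1.0 - x) * r
             for x, r, p in zip(coverage, rewards, penalties)]
    return _softmax([lam * u for u in utils])

def suqr_attack_distribution(coverage, rewards, penalties, weights):
    """SUQR: q_i proportional to exp(w1 * x_i + w2 * R_i^a + w3 * P_i^a)."""
    w_cov, w_rew, w_pen = weights
    scores = [w_cov * x + w_rew * r + w_pen * p
              for x, r, p in zip(coverage, rewards, penalties)]
    return _softmax(scores)

# Hypothetical two-target example with identical payoffs but unequal coverage.
q = suqr_attack_distribution([0.2, 0.6], [4.0, 4.0], [-2.0, -2.0],
                             weights=(-9.0, 0.8, 0.6))
```

With a negative coverage weight, the less-protected target receives the higher attack probability, while setting `lam = 0` in the QR model yields a uniformly random attacker.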
Results in RSSG Experiments – An Overview:
In our human subject experiments in RSSGs, we observe that (i) existing approaches (QR, SUQR, Bayesian SUQR) (Haskell et al. 2014; Nguyen et al. 2013; Yang et al. 2014) perform poorly in initial rounds, while Bayesian SUQR, which is the basis for PAWS (Yang et al. 2014), performs poorly throughout all rounds; and (ii) surprisingly, simpler models like SUQR, which were originally proposed for single-shot games, performed better than recent advances like Bayesian SUQR and Robust SUQR, which are geared specifically toward addressing repeated SSGs. These results are shown in Fig. 16a–d. Therefore, we proposed a new model called SHARP (Stochastic Human behavior model with AttRactiveness and Probability weighting) (Kar et al. 2015), which is specifically suited for dynamic settings such as RSSGs. SHARP addresses the limitations of the existing models in the following ways: (i) to model the adversary’s adaptive decision-making process in repeated SSGs, SHARP reasons based on the success or failure of the adversary’s past actions on exposed portions of the attack surface, where the attack surface is defined as the n-dimensional space of the features used to model adversary behavior; (ii) to address limited exposure to significant portions of the attack surface in initial rounds, SHARP reasons about similarity between exposed and unexposed areas of the attack surface and also incorporates a discounting parameter to mitigate the adversary’s lack of exposure to enough of the attack surface; (iii) to address the limitation that existing models do not account for the adversary’s weighting of probabilities, SHARP incorporates a two-parameter probability weighting function. We discuss these three modeling aspects of SHARP.
SHARP – Probability Weighting:
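As a hedged illustration of what a two-parameter probability weighting function looks like, the sketch below uses the form of Gonzalez and Wu (1999), f(p) = δp^γ / (δp^γ + (1 − p)^γ). Whether SHARP uses exactly this parameterization is an assumption here, and the function name and parameter values are hypothetical. In this form γ controls the curvature of the function and δ its elevation.

```python
def weight_probability(p, delta, gamma):
    """Two-parameter weighting f(p) = delta*p^gamma / (delta*p^gamma + (1-p)^gamma).

    Assumes delta > 0 and gamma > 0; p is an objective probability in [0, 1].
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    num = delta * p ** gamma
    return num / (num + (1.0 - p) ** gamma)

# delta = gamma = 1 recovers the objective probability (no distortion).
undistorted = weight_probability(0.3, delta=1.0, gamma=1.0)
# With gamma < 1 and delta = 1, small probabilities are overweighted.
overweighted = weight_probability(0.1, delta=1.0, gamma=0.5)
```

In a behavioral model, the weighted value f(x_i) would replace the raw coverage probability x_i wherever the adversary’s perception of being caught enters the utility computation.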
SHARP – Adaptive Utility Function:
A second major innovation in SHARP is modeling the adaptive nature of the adversary and addressing the issue of attack surface exposure, where the attack surface α is defined as the n-dimensional space of the features used to model adversary behavior. A target profile β_k ∈ α is defined as a point on the attack surface α and can be associated with a target. Exposing the adversary to many different target profiles would therefore mean exposing the adversary to more of the attack surface and gathering valuable information about their behavior. While a particular target location, defined as a distinct cell in the 2D space, can only be associated with one target profile in a particular round, more than one target may be associated with the same target profile in the same round. β_k^i denotes that target profile β_k is associated with target i in a particular round. Below is an observation from our human subjects data that reveals an interesting trend in attacker behavior in RSSGs.
Observation 1.
Consider two sets of adversaries: (i) those who have succeeded in attacking a target associated with a particular target profile in one round and (ii) those who have failed in attacking a target associated with a particular target profile in the same round. In the subsequent round, adversaries in the first set are significantly more likely than those in the second set to attack a target with a target profile that is “similar” to the one they attacked in the earlier round.
6.2 Bounded Surveillance Modeling and Planning
We have discussed above some of the bounded rationality models applied to RSSGs. However, the adversaries may also be bounded by their surveillance capabilities. To account for adversaries’ bounded surveillance, more recent work has generalized the perfect Stackelberg assumption: it assumes that the adversaries’ understanding of the defender strategy may not be up to date and can instead be approximated as a convex combination of the defender strategies used in recent rounds (Fang et al. 2015). The RSSG framework, which assumes that the attackers always have up-to-date information, can be seen as a special case of this more general Green Security Game (GSG) model.
More specifically, a GSG model considers a repeated game between a defender and multiple attackers. Each round corresponds to a period of time, which can be a time interval (e.g., a month) after which the defender (e.g., a warden) communicates with local guards to assign them a new strategy. In each round, the defender chooses a mixed strategy at the beginning of the round. Different from RSSG, an attacker in GSG is characterized by his memory length and weights on recent rounds in addition to his SUQR model parameters. The attacker is assumed to respond to a weighted sum of the defender strategies used in recent rounds (within his memory length). The defender aims to maximize her total expected utility over all the rounds.
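The attacker’s perceived defender strategy can be sketched as a convex combination of recent coverage vectors. The function name and the example numbers are hypothetical; the memory length is simply the number of rounds passed in.

```python
def perceived_coverage(history, weights):
    """Coverage vector the attacker responds to under bounded surveillance.

    history: recent defender coverage vectors, oldest first, one per
             remembered round (its length is the attacker's memory length)
    weights: matching non-negative weights that sum to 1
    """
    if len(history) != len(weights):
        raise ValueError("need exactly one weight per remembered round")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n_targets = len(history[0])
    return [sum(w * cov[i] for w, cov in zip(weights, history))
            for i in range(n_targets)]

# Hypothetical attacker with memory length 2 who trusts the latest round more.
belief = perceived_coverage([[0.2, 0.8], [0.6, 0.4]], [0.25, 0.75])
```

Putting all weight on the current round recovers the up-to-date attacker of the RSSG setting, while putting all weight on the previous round gives the one-round-memory case.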
Due to the bounded surveillance of attackers, the defender can potentially improve her average expected utility by carefully planning changes in her strategy from round to round in a GSG. Based on the GSG model, we provide two algorithms that plan ahead – the generalization of the Stackelberg assumption introduces a need to plan ahead and take into account the effect of defender strategy on future attacker decisions. While the first algorithm plans a fixed number of steps ahead, the second one designs a short sequence of strategies for repeated execution.
For clarity of exposition, we first focus on the case where the attackers have one-round memory and have no information about the defender strategy in the current round, i.e., the attackers respond to the defender strategy in the last round. To maximize her average expected utility, the defender could optimize over all rounds simultaneously. However, this approach is computationally expensive when the game has many rounds: it needs to solve a nonconvex optimization problem with at least NT variables, where N is the number of targets considered and T is the length of the game. An alternative is the myopic strategy, i.e., the defender can always protect the targets with the highest expected utility in the current round. However, this myopic choice may lead to significant quality degradation, as it ignores the impact of the current strategy on future rounds.
Therefore, we propose an algorithm named PlanAhead-M (or PA-M) in Fang et al. (2015) that looks ahead a few steps. PA-M finds an optimal strategy for the current round as if it were the Mth-to-last round of the game. If M = 2, the defender chooses a strategy assuming she will play a myopic strategy in the next round and end the game. PA-T corresponds to the optimal solution, and PA-1 is the myopic strategy. Choosing 1 < M < T can balance the solution quality and the computational complexity.
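The look-ahead idea can be sketched by brute force for attackers with one-round memory. This is not the optimization formulation of Fang et al. (2015): it assumes SUQR attackers, restricts the defender to a small finite grid of candidate coverage vectors, and enumerates all continuations recursively. The game payoffs, SUQR weights, and function names are all hypothetical.

```python
import math

# Hypothetical two-target game; payoffs and SUQR weights are illustrative only.
GAME = {
    "att_reward": [5.0, 3.0],
    "att_penalty": [-2.0, -1.0],
    "def_reward": [2.0, 1.0],
    "def_penalty": [-5.0, -3.0],
    "suqr_w": (-9.0, 0.8, 0.6),   # (coverage, reward, penalty) weights
}

def attack_dist(perceived_cov, game):
    """SUQR attack probabilities given the coverage the attacker believes in."""
    w_c, w_r, w_p = game["suqr_w"]
    scores = [w_c * x + w_r * r + w_p * p
              for x, r, p in zip(perceived_cov, game["att_reward"],
                                 game["att_penalty"])]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def defender_utility(cov_now, cov_last, game):
    """Defender's expected utility this round; attackers react to last round."""
    q = attack_dist(cov_last, game)
    return sum(q_i * (c * rd + (1.0 - c) * pd)
               for q_i, c, rd, pd in zip(q, cov_now, game["def_reward"],
                                         game["def_penalty"]))

def plan_ahead(cov_last, candidates, m, game):
    """Return (value, strategy) for the current round as if m rounds remain."""
    if m == 0:
        return 0.0, None
    best_value, best_cov = -math.inf, None
    for cov in candidates:
        future, _ = plan_ahead(cov, candidates, m - 1, game)
        value = defender_utility(cov, cov_last, game) + future
        if value > best_value:
            best_value, best_cov = value, cov
    return best_value, best_cov

# Candidate coverage vectors for one resource over two targets (coarse grid).
CANDIDATES = [(k / 10, 1 - k / 10) for k in range(11)]

value, strategy = plan_ahead((0.5, 0.5), CANDIDATES, 3, GAME)
```

With `m = 1` the recursion reduces to the myopic choice, and the enumeration cost grows exponentially in `m`, which mirrors the trade-off between solution quality and computation noted above.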
While PA-M presents an effective way to design sequential defender strategies, we provide another algorithm called FixedSequence-M (FS-M) for GSGs in Fang et al. (2015). FS-M not only has provable theoretical guarantees but may also be easier to implement in practice. The idea of FS-M is to find a short sequence of strategies with fixed length M and require the defender to execute this sequence repeatedly. If M = 2, the defender will alternate between two strategies, and she can exploit the attackers’ delayed response. It can be easier to communicate with local guards to implement FS-M in green security domains, as the guards only need to alternate between several types of maneuvers.
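The fixed-sequence idea can likewise be sketched by enumeration. The sketch assumes SUQR attackers with one-round memory, a finite grid of candidate coverage vectors, and a long game, so that only the average utility over one cycle of the repeated sequence matters. It does not reproduce the formulation or the guarantees of Fang et al. (2015); all payoffs, weights, and names are hypothetical.

```python
import math
from itertools import product

# Hypothetical two-target game; payoffs and SUQR weights are illustrative only.
GAME = {
    "att_reward": [5.0, 3.0],
    "att_penalty": [-2.0, -1.0],
    "def_reward": [2.0, 1.0],
    "def_penalty": [-5.0, -3.0],
    "suqr_w": (-9.0, 0.8, 0.6),   # (coverage, reward, penalty) weights
}

def attack_dist(perceived_cov, game):
    """SUQR attack probabilities given the coverage the attacker believes in."""
    w_c, w_r, w_p = game["suqr_w"]
    scores = [w_c * x + w_r * r + w_p * p
              for x, r, p in zip(perceived_cov, game["att_reward"],
                                 game["att_penalty"])]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def defender_utility(cov_now, cov_last, game):
    """Defender's expected utility this round; attackers react to last round."""
    q = attack_dist(cov_last, game)
    return sum(q_i * (c * rd + (1.0 - c) * pd)
               for q_i, c, rd, pd in zip(q, cov_now, game["def_reward"],
                                         game["def_penalty"]))

def average_utility(sequence, game):
    """Average per-round utility when `sequence` is repeated cyclically.

    Index t - 1 wraps around, so the first strategy of a cycle faces
    attackers who observed the last strategy of the previous cycle."""
    m = len(sequence)
    return sum(defender_utility(sequence[t], sequence[t - 1], game)
               for t in range(m)) / m

def fixed_sequence(candidates, m, game):
    """Return (average utility, best length-m sequence) by enumeration."""
    best = max(product(candidates, repeat=m),
               key=lambda seq: average_utility(seq, game))
    return average_utility(best, game), best

# Candidate coverage vectors for one resource over two targets (coarse grid).
CANDIDATES = [(k / 10, 1 - k / 10) for k in range(11)]

value, sequence = fixed_sequence(CANDIDATES, 2, GAME)
```

Because repeating a single strategy is itself a valid length-2 sequence, the best alternating pair can never do worse on average than the best stationary strategy over the same candidate set.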
7 Addressing Field Evaluation in Real-World Problems
Providing evidence of the benefits of the algorithms discussed in the previous sections is an important issue that we must address. Unlike conceptual ideas, where we can run thousands of careful simulations under controlled conditions, it is not possible to conduct such experiments in the real world with our deployed applications. Nor is it possible to provide a proof of 100% security; there is no such thing.
Instead, we focus on the specific question of whether our game-theoretic algorithms are better at security resource optimization or security allocation than the methods used previously, which typically relied on human schedulers or a simple dice roll for security scheduling (a simple dice roll is often the other “automation” that is used or offered as an alternative to our methods). We have used the following methods to address this question. These methods range from simulations to actual field tests.
1. Simulations (including using a “machine learning” attacker): We provide simulations of security schedules (e.g., randomized patrols and assignments), comparing our approach to earlier approaches based on techniques used by human schedulers. We have a machine learning-based attacker who learns any patterns and then chooses to attack the facility being protected. Game-theoretic schedulers are seen to perform significantly better in providing higher levels of protection (Jain et al. 2010b; Pita et al. 2008). This is also shown in Fig. 17.
2. Human adversaries in the lab: We have worked with a large number of human subjects and security experts (security officials) to have them attempt to get through randomized security schedules, where some are schedules generated by our algorithms and some are baseline approaches for comparison. Human subjects are paid money based on the reward they collect by successfully intruding through our security schedules; again our game-theoretic schedulers perform significantly better (Pita et al. 2009a).
3. Actual security schedules before and after: For some security applications, we have data on how scheduling was done by humans (before our algorithms were deployed) and how schedules are generated after deployment of our algorithms. For measures of interest to security agencies, e.g., predictability in schedules, it is possible to compare the actual human-generated schedules vs our algorithmic schedules. Again, game-theoretic schedulers are seen to perform significantly better by avoiding predictability and yet ensuring that more important targets are covered with higher frequency of patrols. Some of this data is published (Shieh et al. 2012) and is also shown in Fig. 18.
4. “Adversary” teams simulate attack: In some cases, security agencies have deployed adversary perspective teams or “mock attacker teams” that attempt to conduct surveillance to plan attacks; this is done before and after our algorithms have been deployed to check which security deployments worked better. This was done by the US Coast Guard, and the results indicated that the game-theoretic scheduler provided higher levels of deterrence (Shieh et al. 2012).
5. Real-time comparison, human vs algorithm: This is a test we ran on the metro trains in Los Angeles. For a day of patrol scheduling, we provided a head-to-head comparison of human schedulers trying to schedule 90 officers on patrols vs an automated game-theoretic scheduler. External evaluators then provided an evaluation of these patrols; the evaluators did not know who had generated each of the schedules. The results show that human schedulers required significant effort to generate even one schedule (almost a day), whereas the game-theoretic scheduler ran quickly; moreover, the external evaluators rated the game-theoretic schedules higher (with statistical significance) (Fave et al. 2014a).
6. Actual data from deployment: This is another test run on the metro trains in LA. We compared a game-theoretic scheduler with an alternative (in this case a uniform random scheduler augmented with real-time human intelligence) for checking fare evaders. In 21 days of patrols, the game-theoretic scheduler led to significantly higher numbers of fare evaders captured than the alternative (Fave et al. 2014a,b).
7. Domain expert evaluation (internal and external): There have of course been a significant number of evaluations done by domain experts comparing their own scheduling method with game-theoretic schedulers, and repeatedly the game-theoretic schedulers have come out ahead. The fact that our software has now been in use for several years at several different important airports, ports, air traffic, and so on is an indicator to us that the domain experts must consider this software of some value.
8 Conclusions
Security is recognized as a worldwide challenge, and game theory is an increasingly important paradigm for reasoning about complex security resource allocation. We have shown that the general model of security games is applicable (with appropriate variations) to varied security scenarios. There are applications deployed in the real world that have led to a measurable improvement in security. We presented approaches to address four significant challenges: scalability, uncertainty, bounded rationality, and field evaluation in security games.
In short, we introduced specific techniques to handle each of these challenges. For scalability, we introduced five approaches: (i) incremental strategy generation for addressing the problem of large defender strategy spaces, (ii) double oracle incremental strategy generation w.r.t. large defender and attacker strategy spaces, (iii) compact representation of strategies for the case of mobile resources and moving targets, (iv) cutting plane (incremental constraint generation) for handling multiple boundedly rational attackers, and (v) a hierarchical approach for incorporating fine-grained spatial information. For handling uncertainty, we introduced two approaches: (i) dimensionality reduction in uncertainty space for addressing a unification of uncertainties and (ii) Markov decision processes with marginal strategy representation w.r.t. dynamic execution uncertainty. In terms of handling attacker bounded rationality and bounded surveillance, we proposed different behavioral models to capture the attackers’ behaviors and introduced human subject experiments with game simulation to learn such behavioral models. Finally, for addressing field evaluation in real-world problems, we discussed several approaches, including (i) data from deployment and (ii) mock attacker teams.
While the deployed game-theoretic applications have provided a promising start, a significant amount of research remains to be done. These are large-scale interdisciplinary research challenges that call upon multiagent researchers to work with researchers in other disciplines, be “on the ground” with domain experts, and examine real-world constraints and challenges that cannot be abstracted away.
Footnotes
1. Note that not all security games in the literature are Stackelberg security games (see Alpcan and Başar 2010).
2. Note that mixed strategy solutions apply beyond Stackelberg games.
3. DOBSS addresses Bayesian Stackelberg games with multiple follower types, but for simplicity we do not introduce Bayesian Stackelberg games here.
4. We use the term green security games also to avoid any confusion that may come about given that terms related to the environment and security have been adopted for other uses. For example, the term “environmental security” broadly speaking refers to threats posed to humans due to environmental issues, e.g., climate change or shortage of food. The term “environmental criminology” on the other hand refers to analysis and understanding of how different environments affect crime.
References
 Alarie Y, Dionne G (2001) Lottery decisions and probability weighting function. J Risk Uncertain 22(1):21–33CrossRefzbMATHGoogle Scholar
 Alpcan T, Başar T (2010) Network security: a decision and gametheoretic approach. Cambridge University Press, Cambridge/New YorkCrossRefzbMATHGoogle Scholar
 An B, Tambe M, Ordonez F, Shieh E, Kiekintveld C (2011) Refinement of strong Stackelberg equilibria in security games. In: Proceedings of the 25th conference on artificial intelligence, pp 587–593Google Scholar
 Blocki J, Christin N, Datta A, Procaccia AD, Sinha A (2013) Audit games. In: Proceedings of the 23rd international joint conference on artificial intelligenceGoogle Scholar
 Blocki J, Christin N, Datta A, Procaccia AD, Sinha A (2015) Audit games with multiple defender resources. In: AAAI conference on artificial intelligence (AAAI)Google Scholar
 Botea A, Müller M, Schaeffer J (2004) Near optimal hierarchical pathfinding. J Game Dev 1:7–28Google Scholar
 Breton M, Alg A, Haurie A (1988) Sequential Stackelberg equilibria in twoperson games. Optim Theory Appl 59(1):71–97MathSciNetCrossRefzbMATHGoogle Scholar
 Brunswik E (1952) The conceptual framework of psychology, vol 1. University of Chicago Press, Chicago/LondonGoogle Scholar
 Chandran R, Beitchman G (2008) Battle for Mumbai ends, death toll rises to 195. Times of India. http://articles.timesofindia.indiatimes.com/20081129/india/27930171_1_tajhotelthreeterroristsnarimanhouse Google Scholar
 Chapron G, Miquelle DG, Lambert A, Goodrich JM, Legendre S, Clobert J (2008) The impact on tigers of poaching versus prey depletion. J Appl Ecol 45:1667–1674CrossRefGoogle Scholar
 Conitzer V, Sandholm T (2006) Computing the optimal strategy to commit to. In: Proceedings of the ACM conference on electronic commerce (ACMEC), pp 82–90Google Scholar
 Durkota K, Lisy V, Kiekintveld C, Bosansky B (2015) Gametheoretic algorithms for optimal network security hardening using attack graphs. In: Proceedings of the 2015 intemational conference on autonomous agents and multiagent systems (AAMAS’15)Google Scholar
 EtchartVincent N (2009) Probability weighting and the level and spacing of outcomes: an experimental study over losses. J Risk Uncertain 39(1):45–63CrossRefzbMATHGoogle Scholar
 Fang F, Jiang AX, Tambe M (2013) Protecting moving targets with multiple mobile resources. J Artif Intell Res 48:583–634MathSciNetzbMATHGoogle Scholar
 Fang F, Stone P, Tambe M (2015) When security games go green: designing defender strategies to prevent poaching and illegal fishing. In: International joint conference on artificial intelligence (IJCAI)Google Scholar
 Fang F, Nguyen TH, Pickles R, Lam WY, Clements GR, An B, Singh A, Tambe M, Lemieux A (2016) Deploying paws: field optimization of the protection assistant for wildlife security. In: Proceedings of the twentyeighth innovative applications of artificial intelligence conference (IAAI 2016)Google Scholar
 Fave FMD, Brown M, Zhang C, Shieh E, Jiang AX, Rosoff H, Tambe M, Sullivan J (2014a) Security games in the field: an initial study on a transit system (extended abstract). In: International conference on autonomous agents and multiagent systems (AAMAS) [Short paper]Google Scholar
 Fave FMD, Jiang AX, Yin Z, Zhang C, Tambe M, Kraus S, Sullivan J (2014b) Gametheoretic security patrolling with dynamic execution uncertainty and a case study on a real transit system. J Artif Intell Res 50:321–367MathSciNetzbMATHGoogle Scholar
 Gonzalez R, Wu G (1999) On the shape of the probability weighting function. Cogn Psychol 38:129–166CrossRefGoogle Scholar
 Hamilton BA (2007) Faregating analysis. Report commissioned by the LA Metro. http://boardarchives.metro.net/Items/2007/11_November/20071115EMACItem2%7.pdfGoogle Scholar
 Haskell WB, Kar D, Fang F, Tambe M, Cheung S, Denicola LE (2014) Robust protection of fisheries with compass. In: Innovative applications of artificial intelligence (IAAI)Google Scholar
 Humphrey SJ, Verschoor A (2004) The probability weighting function: experimental evidence from Uganda, India and Ethiopia. Econ Lett 84(3):419–425CrossRefGoogle Scholar
 IUCN (2015) IUCN red list of threatened species. version 2015.2. http://www.iucnredlist.org
 Jain M, Kardes E, Kiekintveld C, Ordonez F, Tambe M (2010a) Security games with arbitrary schedules: a branch and price approach. In: Proceedings of the 24th AAAI conference on artificial intelligence, pp 792–797Google Scholar
 Jain M, Tsai J, Pita J, Kiekintveld C, Rathi S, Tambe M, Ordonez F (2010b) Software assistants for randomized patrol planning for the LAX airport police and the federal air marshal service. Interfaces 40:267–290CrossRefGoogle Scholar
 Jain M, Korzhyk D, Vanek O, Pechoucek M, Conitzer V, Tambe M (2011) A double oracle algorithm for zerosum security games on graphs. In: Proceedings of the 10th international conference on autonomous agents and multiagent systems (AAMAS)Google Scholar
 Jain M, Tambe M, Conitzer V (2013) Security scheduling for realworld networks. In: AAMASGoogle Scholar
 Jiang A, Yin Z, Kraus S, Zhang C, Tambe M (2013a) Gametheoretic randomization for security patrolling with dynamic execution uncertainty. In: AAMASGoogle Scholar
 Jiang AX, Nguyen TH, Tambe M, Procaccia AD (2013b) Monotonic maximin: a robust Stackelberg solution against boundedly rational followers. In: Conference on decision and game theory for security (GameSec)Google Scholar
 Johnson M, Fang F, Yang R, Tambe M, Albers H (2012) Patrolling to maximize pristine forest area. In: Proceedings of the AAAI spring symposium on game theory for security, sustainability and healthGoogle Scholar
 Kahneman D, Tversky A (1979) Prospect theory: an analysis of decision under risk. Econometrica 47(2):263–291CrossRefzbMATHGoogle Scholar
 Kar D, Fang F, Fave FD, Sintov N, Tambe M (2015) “a game of thrones”: when human behavior models compete in repeated Stackelberg security games. In: International conference on autonomous agents and multiagent systems (AAMAS 2015)Google Scholar
 Keteyian A (2010) TSA: federal air marshals. http://www.cbsnews.com/stories/2010/02/01/earlyshow/main6162291.shtml. Retrieved 1 Feb 2011
 Kiekintveld C, Jain M, Tsai J, Pita J, Tambe M, Ordonez F (2009) Computing optimal randomized resource allocations for massive security games. In: Proceedings of the 8th international conference on autonomous agents and multiagent systems (AAMAS), pp 689–696Google Scholar
 Korzhyk D, Conitzer V, Parr R (2010) Complexity of computing optimal Stackelberg strategies in security resource allocation games. In: Proceedings of the 24th AAAI conference on artificial intelligence, pp 805–810Google Scholar
 Leitmann G (1978) On generalized Stackelberg strategies. Optim Theory Appl 26(4):637–643MathSciNetCrossRefzbMATHGoogle Scholar
 Lipton R, Markakis E, Mehta A (2003) Playing large games using simple strategies. In: EC: proceedings of the ACM conference on electronic commerce. ACM, New York, pp 36–41Google Scholar
 McFadden D (1972) Conditional logit analysis of qualitative choice behavior. Technical reportGoogle Scholar
 McFadden D (1976) Quantal choice analysis: a survey. Ann Econ Soc Meas 5(4):363–390Google Scholar
 McKelvey RD, Palfrey TR (1995) Quantal response equilibria for normal form games. Games Econ Behav 10(1):6–38MathSciNetCrossRefzbMATHGoogle Scholar
 Nguyen TH, Yang R, Azaria A, Kraus S, Tambe M (2013) Analyzing the effectiveness of adversary modeling in security games. In: Conference on artificial intelligence (AAAI)Google Scholar
 Nguyen T, Jiang A, Tambe M (2014) Stop the compartmentalization: Unified robust algorithms for handling uncertainties in security games. In: International conference on autonomous agents and multiagent systems (AAMAS)Google Scholar
 Nguyen TH, Fave FMD, Kar D, Lakshminarayanan AS, Yadav A, Tambe M, Agmon N, Plumptre AJ, Driciru M, Wanyama F, Rwetsiba A (2015) Making the most of our regrets: regretbased solutions to handle payoff uncertainty and elicitation in green security games. In: Conference on decision and game theory for securityzbMATHGoogle Scholar
 Paruchuri P, Pearce JP, Marecki J, Tambe M, Ordonez F, Kraus S (2008) Playing games with security: an efficient exact algorithm for Bayesian Stackelberg games. In: Proceedings of the 7th international conference on autonomous agents and multiagent systems (AAMAS), pp 895–902Google Scholar
 Pita J, Jain M, Western C, Portway C, Tambe M, Ordonez F, Kraus S, Parachuri P (2008) Deployed ARMOR protection: the application of a gametheoretic model for security at the Los Angeles International Airport. In: Proceedings of the 7th international conference on autonomous agents and multiagent systems (AAMAS), pp 125–132Google Scholar
 Pita J, Bellamane H, Jain M, Kiekintveld C, Tsai J, Ordóñez F, Tambe M (2009a) Security applications: lessons of realworld deployment. ACM SIGecom Exchanges 8(2):5CrossRefGoogle Scholar
 Pita J, Jain M, Ordóñez F, Tambe M, Kraus S, MagoriCohen R (2009b) Effective solutions for realworld Stackelberg games: when agents must deal with human uncertainties. In: The eighth international conference on autonomous agents and multiagent systemszbMATHGoogle Scholar
 Pita J, John R, Maheswaran R, Tambe M, Kraus S (2012) A robust approach to addressing human adversaries in security games. In: European conference on artificial intelligence (ECAI)Google Scholar
 Sanderson E, Forrest J, Loucks C, Ginsberg J, Dinerstein E, Seidensticker J, Leimgruber P, Songer M, Heydlauff A, O’Brien T, Bryja G, Klenzendorf S, Wikramanayake E (2006) Setting priorities for the conservation and recovery of wild tigers: 2005–2015. The technical assessment. Technical report, WCS, WWF, Smithsonian, and NFWFSTF, New York/Washington, DCGoogle Scholar
 Shieh E, An B, Yang R, Tambe M, Baldwin C, DiRenzo J, Maule B, Meyer G (2012) PROTECT: a deployed game theoretic system to protect the ports of the United States. In: Proceedings of the 11th international conference on autonomous agents and multiagent systems (AAMAS)Google Scholar
 von Stackelberg H (1934) Marktform und Gleichgewicht. Springer, ViennaGoogle Scholar
 von Stengel B, Zamir S (2004) Leadership with commitment to mixed strategies. Technical report, LSECDAM200401, CDAM research reportGoogle Scholar
 Tambe M (2011) Security and game theory: algorithms, deployed systems, lessons learned. Cambridge University Press, Cambridge
 Tversky A, Kahneman D (1992) Advances in prospect theory: cumulative representation of uncertainty. J Risk Uncertain 5(4):297–323
 Vanek O, Yin Z, Jain M, Bosansky B, Tambe M, Pechoucek M (2012) Game-theoretic resource allocation for malicious packet detection in computer networks. In: Proceedings of the 11th international conference on autonomous agents and multiagent systems (AAMAS)
 Yang R, Kiekintveld C, Ordonez F, Tambe M, John R (2011) Improving resource allocation strategy against human adversaries in security games. In: IJCAI
 Yang R, Ordonez F, Tambe M (2012) Computing optimal strategy against quantal response in security games. In: Proceedings of the 11th international conference on autonomous agents and multiagent systems (AAMAS)
 Yang R, Jiang AX, Tambe M, Ordóñez F (2013a) Scaling-up security games with boundedly rational adversaries: a cutting-plane approach. In: Proceedings of the twenty-third international joint conference on artificial intelligence. AAAI Press, pp 404–410
 Yang R, Jiang AX, Tambe M, Ordóñez F (2013b) Scaling-up security games with boundedly rational adversaries: a cutting-plane approach. In: IJCAI
 Yang R, Ford B, Tambe M, Lemieux A (2014) Adaptive resource allocation for wildlife protection against illegal poachers. In: International conference on autonomous agents and multiagent systems (AAMAS)
 Yin Z, Jain M, Tambe M, Ordonez F (2011) Risk-averse strategies for security games with execution and observational uncertainty. In: Proceedings of the 25th AAAI conference on artificial intelligence (AAAI), pp 758–763
 Yin Z, Jiang A, Johnson M, Tambe M, Kiekintveld C, Leyton-Brown K, Sandholm T, Sullivan J (2012) TRUSTS: scheduling randomized patrols for Fare Inspection in Transit Systems. In: Proceedings of the 24th conference on innovative applications of artificial intelligence (IAAI)