Abstract
This chapter introduces a novel framework for modeling interacting humans in a multi-stage game. This "iterated semi network-form game" framework has the following desirable characteristics: (1) boundedly rational players, (2) strategic players (i.e., players account for one another's reward functions when predicting one another's behavior), and (3) computational tractability even on real-world systems. We achieve these benefits by combining concepts from game theory and reinforcement learning. To be precise, we extend the boundedly rational "level-K reasoning" model to apply to games over multiple stages. Our extension allows the decomposition of the overall modeling problem into a series of smaller ones, each of which can be solved by standard reinforcement learning algorithms. We call this hybrid approach "level-K reinforcement learning". We investigate these ideas in a cyber battle scenario over a smart power grid and discuss the relationship between the behavior predicted by our model and what one might expect of real human defenders and attackers.
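The decomposition the abstract describes can be illustrated with a minimal sketch: a level-0 player acts non-strategically, and each level-k player is trained by standard reinforcement learning as a best response to a fixed level-(k-1) opponent. The stage game below (an attacker/defender payoff matrix), the uniform-random level-0 policy, and the use of simple Q-learning are all illustrative assumptions, not the chapter's actual model or scenario.

```python
import random

# Hypothetical attacker/defender stage game (payoffs are illustrative only).
# Keys: (defender action, attacker action); values: (defender payoff, attacker payoff).
PAYOFFS = {
    (0, 0): (3.0, -3.0), (0, 1): (-1.0, 1.0),
    (1, 0): (0.0, 0.0),  (1, 1): (1.0, -1.0),
}
ACTIONS = [0, 1]

def level0_policy():
    # Assumed level-0 anchor: non-strategic, uniformly random play.
    return random.choice(ACTIONS)

def train_level_k(opponent_policy, player, episodes=5000, alpha=0.1, eps=0.1):
    """One RL sub-problem: Q-learn a best response to a FIXED opponent policy.

    player=0 trains the defender, player=1 trains the attacker.
    """
    q = {a: 0.0 for a in ACTIONS}
    for _ in range(episodes):
        # Epsilon-greedy action selection.
        a = random.choice(ACTIONS) if random.random() < eps else max(q, key=q.get)
        b = opponent_policy()
        joint = (a, b) if player == 0 else (b, a)
        r = PAYOFFS[joint][player]
        q[a] += alpha * (r - q[a])  # incremental value update
    best = max(q, key=q.get)
    return lambda: best  # greedy policy of the trained level-k player

# Level-1 defender best-responds to the level-0 attacker; the level-2
# attacker then best-responds to that level-1 defender, and so on up
# the hierarchy -- one tractable RL problem per level.
random.seed(0)
defender_l1 = train_level_k(level0_policy, player=0)
attacker_l2 = train_level_k(defender_l1, player=1)
```

Because each level conditions only on the already-trained level below it, the levels can be solved sequentially rather than as one joint equilibrium computation, which is the source of the tractability the abstract claims.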
© 2013 Springer-Verlag Berlin Heidelberg
Lee, R., Wolpert, D.H., Bono, J., Backhaus, S., Bent, R., Tracey, B. (2013). Counter-Factual Reinforcement Learning: How to Model Decision-Makers That Anticipate the Future. In: Guy, T., Karny, M., Wolpert, D. (eds) Decision Making and Imperfection. Studies in Computational Intelligence, vol 474. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36406-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36405-1
Online ISBN: 978-3-642-36406-8