Abstract
This chapter introduces a novel framework for modeling interacting humans in a multi-stage game. This "iterated semi network-form game" framework has the following desirable characteristics: (1) boundedly rational players, (2) strategic players (i.e., players account for one another's reward functions when predicting one another's behavior), and (3) computational tractability even on real-world systems. We achieve these benefits by combining concepts from game theory and reinforcement learning. To be precise, we extend the boundedly rational "level-K reasoning" model to apply to games over multiple stages. Our extension allows the decomposition of the overall modeling problem into a series of smaller ones, each of which can be solved by standard reinforcement learning algorithms. We call this hybrid approach "level-K reinforcement learning". We investigate these ideas in a cyber battle scenario over a smart power grid and discuss the relationship between the behavior predicted by our model and what one might expect of real human defenders and attackers.
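The decomposition the abstract describes can be illustrated with a minimal sketch: a level-0 player acts non-strategically, and each level-k player is trained by standard reinforcement learning as a best response to a fixed level-(k-1) opponent. The stage game below (an attacker/defender payoff matrix), the uniform-random level-0 policy, and the use of simple Q-learning are all illustrative assumptions, not the chapter's actual model or scenario.

```python
import random

# Hypothetical attacker/defender stage game (payoffs are illustrative only).
# Keys: (defender action, attacker action); values: (defender payoff, attacker payoff).
PAYOFFS = {
    (0, 0): (3.0, -3.0), (0, 1): (-1.0, 1.0),
    (1, 0): (0.0, 0.0),  (1, 1): (1.0, -1.0),
}
ACTIONS = [0, 1]

def level0_policy():
    # Assumed level-0 anchor: non-strategic, uniformly random play.
    return random.choice(ACTIONS)

def train_level_k(opponent_policy, player, episodes=5000, alpha=0.1, eps=0.1):
    """One RL sub-problem: Q-learn a best response to a FIXED opponent policy.

    player=0 trains the defender, player=1 trains the attacker.
    """
    q = {a: 0.0 for a in ACTIONS}
    for _ in range(episodes):
        # Epsilon-greedy action selection.
        a = random.choice(ACTIONS) if random.random() < eps else max(q, key=q.get)
        b = opponent_policy()
        joint = (a, b) if player == 0 else (b, a)
        r = PAYOFFS[joint][player]
        q[a] += alpha * (r - q[a])  # incremental value update
    best = max(q, key=q.get)
    return lambda: best  # greedy policy of the trained level-k player

# Level-1 defender best-responds to the level-0 attacker; the level-2
# attacker then best-responds to that level-1 defender, and so on up
# the hierarchy -- one tractable RL problem per level.
random.seed(0)
defender_l1 = train_level_k(level0_policy, player=0)
attacker_l2 = train_level_k(defender_l1, player=1)
```

Because each level conditions only on the already-trained level below it, the levels can be solved sequentially rather than as one joint equilibrium computation, which is the source of the tractability the abstract claims.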
© 2013 Springer-Verlag Berlin Heidelberg
Lee, R., Wolpert, D.H., Bono, J., Backhaus, S., Bent, R., Tracey, B. (2013). Counter-Factual Reinforcement Learning: How to Model Decision-Makers That Anticipate the Future. In: Guy, T., Karny, M., Wolpert, D. (eds) Decision Making and Imperfection. Studies in Computational Intelligence, vol 474. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36406-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36405-1
Online ISBN: 978-3-642-36406-8