Counter-Factual Reinforcement Learning: How to Model Decision-Makers That Anticipate the Future

Part of the book series: Studies in Computational Intelligence ((SCI,volume 474))

Abstract

This chapter introduces a novel framework for modeling interacting humans in a multi-stage game. This "iterated semi network-form game" framework has three desirable characteristics: (1) bounded-rational players; (2) strategic players, i.e., players who account for one another's reward functions when predicting one another's behavior; and (3) computational tractability, even on real-world systems. We achieve these benefits by combining concepts from game theory and reinforcement learning. Specifically, we extend the bounded-rational "level-K reasoning" model to games played over multiple stages. This extension decomposes the overall modeling problem into a series of smaller problems, each of which can be solved by standard reinforcement learning algorithms. We call this hybrid approach "level-K reinforcement learning". We investigate these ideas in a cyber-battle scenario over a smart power grid and discuss the relationship between the behavior predicted by our model and what one might expect of real human defenders and attackers.
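The level-K decomposition described in the abstract can be sketched in a few lines. The following is a minimal illustrative example, not the chapter's actual implementation: the stage game, payoff matrix, and hyperparameters are invented for demonstration. A level-0 attacker plays non-strategically (uniformly at random), and a level-1 defender is fit by standard Q-learning against that fixed opponent; each sub-problem is thus an ordinary single-agent reinforcement learning problem.

```python
import random

random.seed(42)  # make the level-0 opponent's play reproducible

# Hypothetical single-stage "cyber battle": defender (row) picks which grid
# node to guard, attacker (column) picks which node to strike. Payoffs are
# invented for illustration; node A (action 0) is the more valuable target.
DEFENDER_PAYOFF = [[ 2.0, -1.0],   # guard A: big win if A attacked, small loss otherwise
                   [-2.0,  1.0]]   # guard B: big loss if A attacked, small win otherwise

def level0_policy():
    """Level-0: a non-strategic prior over actions -- here, uniform random."""
    return random.randrange(2)

def train_level_k(opponent_policy, episodes=5000, eps=0.1, seed=0):
    """Fit a level-k policy by Q-learning against a *fixed* level-(k-1)
    opponent -- the decomposition that turns the multi-agent modeling
    problem into a series of standard reinforcement learning problems."""
    rng = random.Random(seed)
    q, n = [0.0, 0.0], [0, 0]          # one value/count per action (stateless game)
    for _ in range(episodes):
        # epsilon-greedy exploration over the two actions
        a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=q.__getitem__)
        r = DEFENDER_PAYOFF[a][opponent_policy()]
        n[a] += 1
        q[a] += (r - q[a]) / n[a]      # incremental sample-average update
    return lambda: max((0, 1), key=q.__getitem__)  # greedy learned policy

# Level-1 defender: a learned best response to the level-0 attacker.
level1_defender = train_level_k(level0_policy)
```

Against the uniform attacker the defender learns to guard the high-value node. Iterating the same routine with the freshly trained policy as the new fixed opponent (and the attacker's own payoff matrix) would yield a level-2 attacker, and so on up the hierarchy.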



Author information

Corresponding author

Correspondence to Ritchie Lee.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Lee, R., Wolpert, D.H., Bono, J., Backhaus, S., Bent, R., Tracey, B. (2013). Counter-Factual Reinforcement Learning: How to Model Decision-Makers That Anticipate the Future. In: Guy, T., Karny, M., Wolpert, D. (eds) Decision Making and Imperfection. Studies in Computational Intelligence, vol 474. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36406-8_4

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-36406-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36405-1

  • Online ISBN: 978-3-642-36406-8

  • eBook Packages: Engineering (R0)
