On-Line Learning with Imperfect Monitoring

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2777)

Abstract

We study on-line play of repeated matrix games in which the observations of the other player's past actions and of the obtained reward are partial and stochastic. We define the Partial Observation Bayes Envelope (POBE) as the best reward against the worst-case stationary strategy of the opponent that agrees with past observations. Our goal is to keep the (unobserved) average reward above the POBE. For the case where the observations (but not necessarily the rewards) depend on the opponent's play alone, an algorithm for attaining the POBE is derived. This algorithm is based on an application of approachability theory combined with a worst-case view over the unobserved rewards. We also suggest a simplified solution concept for general signaling structures. This concept may fall short of the POBE.
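The POBE can be made concrete on a toy game. The sketch below is a hypothetical illustration of the definition only, not the approachability-based algorithm from the paper. It assumes deterministic signals that reveal only which cell of a partition of the opponent's actions was played, writes S(sigma) (notation introduced here) for the set of opponent mixed actions whose signal frequencies equal the observed frequencies sigma, and restricts the agent to two actions so the max-min can be computed exactly without an LP solver. Because S(sigma) is a compact convex polytope and the expected reward is bilinear, the minimax theorem makes the max-min and min-max values coincide in this setting. The function names are made up for this example.

```python
from itertools import product


def consistent_vertices(partition, sigma, n_opp):
    """Vertices of S(sigma): opponent mixed actions whose induced signal
    frequencies equal sigma. Assumption (for illustration): signal k is
    emitted exactly when the opponent plays an action in partition[k]."""
    vertices = []
    for choice in product(*partition):
        q = [0.0] * n_opp
        # Put all of cell k's observed mass on the single action chosen from it.
        for k, b in enumerate(choice):
            q[b] += sigma[k]
        vertices.append(q)
    return vertices


def pobe_two_actions(R, vertices):
    """max over p of min over q in S(sigma) of the expected reward, for an
    agent with two actions; p is the probability of playing action 0.

    For a fixed p the minimum of a linear function over the polytope
    S(sigma) is attained at a vertex, so only the vertices are needed."""
    lines = []
    for q in vertices:
        u0 = sum(r * x for r, x in zip(R[0], q))
        u1 = sum(r * x for r, x in zip(R[1], q))
        lines.append((u1, u0 - u1))  # reward(p) = intercept + p * slope

    def lower_envelope(p):
        return min(c + p * s for c, s in lines)

    # The lower envelope is concave and piecewise linear on [0, 1], so its
    # maximum lies at an endpoint or where two of the lines cross.
    candidates = [0.0, 1.0]
    for (c1, s1), (c2, s2) in product(lines, repeat=2):
        if s1 != s2:
            p = (c2 - c1) / (s1 - s2)
            if 0.0 <= p <= 1.0:
                candidates.append(p)
    return max(lower_envelope(p) for p in candidates)


# Toy game: opponent actions 1 and 2 emit the same signal.
R = [[1, 0, 1],
     [0, 1, 0]]
verts = consistent_vertices([[0], [1, 2]], [0.2, 0.8], n_opp=3)
print(pobe_two_actions(R, verts))  # ≈ 0.5 (up to floating-point rounding)
```

Here the opponent's 0.8 mass on the second signal could sit on either action 1 or action 2. Mixing evenly guarantees about 0.5 whichever it is, whereas a fully informed agent would earn 0.8 or 1.0 respectively. A stochastic signaling structure would instead require handling the polytope {q : Mq = sigma} for a signal matrix M, typically with an LP solver.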

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mannor, S., Shimkin, N. (2003). On-Line Learning with Imperfect Monitoring. In: Schölkopf, B., Warmuth, M.K. (eds) Learning Theory and Kernel Machines. Lecture Notes in Computer Science, vol 2777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45167-9_40

  • DOI: https://doi.org/10.1007/978-3-540-45167-9_40

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40720-1

  • Online ISBN: 978-3-540-45167-9

  • eBook Packages: Springer Book Archive