Abstract
We study on-line play of repeated matrix games in which the observations of the opponent's past actions and of the obtained rewards are partial and stochastic. We define the Partial Observation Bayes Envelope (POBE) as the best reward against the worst-case stationary strategy of the opponent that is consistent with past observations. Our goal is to keep the (unobserved) average reward above the POBE. For the case where the observations (but not necessarily the rewards) depend on the opponent's play alone, we derive an algorithm that attains the POBE. The algorithm combines an application of approachability theory with a worst-case view of the unobserved rewards. We also suggest a simplified solution concept for general signaling structures, which may fall short of the POBE.
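To give a concrete feel for the no-regret machinery the paper builds on, the following is a minimal sketch of regret matching in a repeated 2×2 matrix game under *full* monitoring. This is an illustrative simplification only: the paper's POBE algorithm additionally handles partial, stochastic observations of actions and rewards, which this sketch does not attempt. The payoff matrix, opponent model, and function names are ours, not the paper's.

```python
import random

# Row player's payoff matrix for "matching pennies": REWARD[i][j] is the
# row player's reward when it plays i and the opponent plays j.
REWARD = [[1.0, -1.0],
          [-1.0, 1.0]]

def regret_matching(opponent_actions):
    """Play against a fixed opponent action sequence; return average reward."""
    n = len(REWARD)           # number of row-player actions
    regret = [0.0] * n        # cumulative regret for each fixed action
    total = 0.0
    for j in opponent_actions:
        # Mix proportionally to positive regrets (uniform if none positive).
        pos = [max(r, 0.0) for r in regret]
        s = sum(pos)
        probs = [p / s for p in pos] if s > 0 else [1.0 / n] * n
        i = random.choices(range(n), weights=probs)[0]
        r = REWARD[i][j]
        total += r
        # Update regrets: how much better each fixed action would have done.
        for a in range(n):
            regret[a] += REWARD[a][j] - r
    return total / len(opponent_actions)

random.seed(0)
# A stationary opponent that plays action 0 with probability 0.8; the best
# response (always action 0) then earns 0.8 * 1 + 0.2 * (-1) = 0.6 on average.
opp = [0 if random.random() < 0.8 else 1 for _ in range(20000)]
avg = regret_matching(opp)
print(round(avg, 2))  # close to the best-response value of 0.6
```

Against a stationary opponent the average reward converges to the best-response value, which is the full-observation analogue of the Bayes envelope that the POBE generalizes to partial observations.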
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Mannor, S., Shimkin, N. (2003). On-Line Learning with Imperfect Monitoring. In: Schölkopf, B., Warmuth, M.K. (eds) Learning Theory and Kernel Machines. Lecture Notes in Computer Science(), vol 2777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45167-9_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40720-1
Online ISBN: 978-3-540-45167-9