Abstract
In the Full Information Game the player sequentially selects one out of K actions. After the player has made his choice, the K payoffs of the actions become known and the player receives the payoff of the action he selected. The Gain-Loss game is the variant of this game in which both gains from [0,1] and losses from [0,1] are possible payoffs. This game has two well-studied special cases: the Full Loss game, where only losses are allowed, and the Full Gain game, where only gains are allowed. For each of these cases the appropriate variant of Freund and Schapire’s algorithm Hedge [7,3] can be used to obtain nearly optimal regrets. Both of these variants have immediate adaptations to the Full Gain-Loss game. However, these adaptations are not always optimal.
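To make the game concrete, here is a minimal sketch of a standard exponential-weights (Hedge-style) player in the Full Information Game with payoffs in [-1, 1], where positive entries are gains and negative entries are losses. This is an assumption-laden illustration and not the paper's new variant: the fixed learning rate `eta` and the function name `hedge_full_information` are hypothetical, and no tuning of `eta` is attempted. Because every payoff is revealed after each round, all K weights are updated, and regret is measured against the best single action in hindsight.

```python
import math


def hedge_full_information(payoffs, eta):
    """Exponential-weights (Hedge-style) play in the Full Information Game.

    Hypothetical illustration, not the paper's algorithm.
    payoffs: list of T rounds, each a list of K payoffs in [-1, 1];
             positive entries are gains, negative entries are losses.
    eta:     fixed learning rate (an assumption; no tuning is attempted).
    Returns (expected_total, best_total, regret), where regret is measured
    against the single best action in hindsight.
    """
    k = len(payoffs[0])
    log_weights = [0.0] * k  # kept in log space to avoid overflow
    expected_total = 0.0
    for round_payoffs in payoffs:
        # Normalise the weights into a probability distribution over actions.
        m = max(log_weights)
        weights = [math.exp(w - m) for w in log_weights]
        z = sum(weights)
        probs = [w / z for w in weights]
        # Expected payoff of drawing one action from probs this round.
        expected_total += sum(p * x for p, x in zip(probs, round_payoffs))
        # Full information: all K payoffs are revealed, so every weight updates.
        for i, x in enumerate(round_payoffs):
            log_weights[i] += eta * x
    best_total = max(sum(r[i] for r in payoffs) for i in range(k))
    return expected_total, best_total, best_total - expected_total


print(hedge_full_information([[1.0, -1.0]] * 10, eta=0.5))
```

Tracking the expected payoff instead of sampling an action keeps the sketch deterministic; a real player would draw one action per round from `probs`.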
The first result of this paper is a new variant of algorithm Hedge that achieves a regret of \(O(\sqrt{\ln K}\,\sqrt{G_j + L_j})\) for the Full Gain-Loss game, where j is the index of one of the actions in the game, \(G_j\), the total gain of j, is the sum of all the positive payoffs that the jth action had in the game, and \(L_j\) is the absolute value of the sum of all its negative payoffs. In addition, the new algorithm matches the performance of the known Hedge algorithms in the special cases of gains only and losses only.
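The quantities \(G_j\) and \(L_j\) can be computed directly from a payoff sequence. The sketch below (function names are hypothetical) evaluates them for one action and returns \(\sqrt{\ln K}\,\sqrt{G_j + L_j}\), i.e. the stated bound with its unspecified constant dropped. It does not implement the new algorithm itself. Note that with gains only \(L_j = 0\) and the scale reduces to \(\sqrt{G_j \ln K}\), while with losses only it reduces to \(\sqrt{L_j \ln K}\).

```python
import math


def gains_and_losses(payoffs, j):
    """Return (G_j, L_j) for action j.

    G_j: sum of the positive payoffs of action j.
    L_j: absolute value of the sum of its negative payoffs.
    """
    g = sum(r[j] for r in payoffs if r[j] > 0)
    l = abs(sum(r[j] for r in payoffs if r[j] < 0))
    return g, l


def gain_loss_bound_scale(payoffs, j):
    """sqrt(ln K) * sqrt(G_j + L_j): the regret bound without its constant."""
    k = len(payoffs[0])
    g, l = gains_and_losses(payoffs, j)
    return math.sqrt(math.log(k)) * math.sqrt(g + l)


payoffs = [[0.5, -0.25], [-0.5, 1.0], [1.0, 0.0]]  # T = 3 rounds, K = 2
print(gains_and_losses(payoffs, 0))
print(gain_loss_bound_scale(payoffs, 0))
```

In this example action 0 nets only 1.0 over the game, yet \(G_0 + L_0 = 2.0\): gains and losses are counted separately rather than allowed to cancel.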
The second result is an application of the new algorithm that yields new upper bounds on the regret of the original Full Gain and Full Loss games. These upper bounds are expressed in terms of a new parameter.
The third result is a method for combining online learning algorithms online. This method yields an \(O\big(\min\big(\sqrt{L_{opt}\ln K},\ \sqrt{(T-L_{opt})\ln K}\big)\big)\) upper bound on the regret of the Full Loss game, and an \(O\big(\min\big(\sqrt{G_{opt}\ln K},\ \sqrt{(T-G_{opt})\ln K}\big)\big)\) upper bound on the regret of the Full Gain game.
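The benefit of the minimum is easiest to see numerically. The sketch below does not implement the combining method; it only evaluates the two terms inside the min() of the Full Loss bound (constants dropped) for a given loss sequence, assuming \(T\) is the number of rounds and \(L_{opt}\) is the total loss of the best action in hindsight. When the best action loses in almost every round, \(L_{opt}\) is close to \(T\) and the second term is the smaller one. One way to read \(T - L_{opt}\) (our interpretation, not a claim from the paper) is as the total gain of the best action after rewriting each loss x as the gain 1 - x. Both function names are hypothetical.

```python
import math


def full_loss_bound_terms(losses):
    """Both terms inside min() of the Full Loss regret bound, constants dropped.

    losses: list of T rounds, each a list of K losses in [0, 1].
    Returns (sqrt(L_opt ln K), sqrt((T - L_opt) ln K)).
    """
    t = len(losses)
    k = len(losses[0])
    # L_opt: total loss of the best (lowest-loss) action in hindsight.
    l_opt = min(sum(r[i] for r in losses) for i in range(k))
    return math.sqrt(l_opt * math.log(k)), math.sqrt((t - l_opt) * math.log(k))


def losses_to_gains(losses):
    """Rewrite each loss x in [0, 1] as the gain 1 - x."""
    return [[1.0 - x for x in r] for r in losses]


losses = [[0.75, 1.0, 1.0, 1.0]] * 8  # T = 8, K = 4, L_opt = 6
print(full_loss_bound_terms(losses))
print(sum(r[0] for r in losses_to_gains(losses)))  # T - L_opt for action 0
```

Here the second term, \(\sqrt{2\ln 4}\), is smaller than the first, \(\sqrt{6\ln 4}\), so a bound of the min() form is tighter than \(\sqrt{L_{opt}\ln K}\) alone.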
References
Allenberg, C.: Individual sequence prediction - upper bounds and applications to complexity. In: Proceedings of the 12th Annual Conference on Computational Learning Theory (1999)
Allenberg, C., Auer, P., Cesa-Bianchi, N.: On the loss version of the adversarial multi-armed bandit problem (to appear)
Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: Gambling in a rigged casino: The adversarial multi-armed bandit problem. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science (1995)
Auer, P., Cesa-Bianchi, N., Gentile, C.: Adaptive and self-confident on-line learning algorithms. JCSS 64(1), 48–75 (2002)
Cesa-Bianchi, N., Freund, Y., Helmbold, D.P., Haussler, D., Schapire, R.E., Warmuth, M.K.: How to use expert advice. In: Proceedings of the Twenty-Fifth Annual ACM Symposium on the Theory of Computing, pp. 382–391 (1993)
Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, Chichester (1991)
Fiat, A., Foster, D.P., Karloff, H., Rabani, Y., Ravid, Y.: Competitive algorithms for layered graph traversal. In: Proceedings of the 32nd Annual Symposium on Foundations of Computer Science, p. 288 (1991)
Freund, Y., Schapire, R.E.: A decision-theoretic generalization of online learning and an application to boosting. In: Vitányi, P.M.B. (ed.) EuroCOLT 1995. LNCS, vol. 904, pp. 23–37. Springer, Heidelberg (1995)
Gittins, J.C.: Multi-armed Bandit Allocation Indices. John Wiley and Sons, Chichester (1989)
Lai, T.L., Robbins, H.: Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6, 4–22 (1985)
Littlestone, N., Warmuth, M.K.: The weighted majority algorithm. Information and Computation 108, 212–261 (1994)
Vovk, V.G.: Aggregating strategies. In: Proceedings of the Third Annual Workshop on Computational Learning Theory, pp. 371–383 (1990)
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Allenberg-Neeman, C., Neeman, B. (2004). Full Information Game with Gains and Losses. In: Ben-David, S., Case, J., Maruoka, A. (eds) Algorithmic Learning Theory. ALT 2004. Lecture Notes in Computer Science, vol 3244. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30215-5_21
DOI: https://doi.org/10.1007/978-3-540-30215-5_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23356-5
Online ISBN: 978-3-540-30215-5
eBook Packages: Springer Book Archive