Full Information Game with Gains and Losses

  • Conference paper
Algorithmic Learning Theory (ALT 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3244)

Abstract

In the Full Information Game the player sequentially selects one out of K actions. After the player has made his choice, the K payoffs of the actions become known and the player receives the payoff of the action he selected. The Gain-Loss game is the variant of this game in which both gains from [0,1] and losses from [0,1] are possible payoffs. This game has two well-studied special cases: the Full Loss game, where only losses are allowed, and the Full Gain game, where only gains are allowed. For each of these cases the appropriate variant of Freund and Schapire’s algorithm Hedge [7,3] can be used to obtain nearly optimal regrets. Both of these variants have immediate adaptations to the Full Gain-Loss game. However, these solutions are not always optimal.
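The game protocol described above can be illustrated with a minimal sketch. It uses the textbook exponential-weights form of Hedge with a fixed learning rate, not the new variant introduced in this paper. The function names, the parameter `eta`, and the encoding of gains as positive and losses as negative entries in [-1, 1] are assumptions made for illustration only.

```python
import math
import random


def play_exponential_weights(payoffs, eta):
    """Play a T x K payoff table (entries in [-1, 1]) with exponential weights.

    Returns the player's expected total payoff and each action's total payoff.
    Illustrative sketch only: a fixed, hypothetical learning rate `eta` is used.
    """
    num_actions = len(payoffs[0])
    totals = [0.0] * num_actions
    expected_payoff = 0.0
    for round_payoffs in payoffs:
        # Weights are proportional to exp(eta * cumulative payoff so far);
        # subtracting the maximum keeps the exponentials numerically stable.
        top = max(totals)
        weights = [math.exp(eta * (s - top)) for s in totals]
        norm = sum(weights)
        probs = [w / norm for w in weights]
        # The player picks action i with probability probs[i]; all K payoffs
        # are then revealed (full information), so every total is updated.
        expected_payoff += sum(p * x for p, x in zip(probs, round_payoffs))
        for i, x in enumerate(round_payoffs):
            totals[i] += x
    return expected_payoff, totals


def regret(payoffs, eta):
    """Expected regret against the best single action in hindsight."""
    expected_payoff, totals = play_exponential_weights(payoffs, eta)
    return max(totals) - expected_payoff


if __name__ == "__main__":
    rng = random.Random(0)
    T, K = 1000, 5
    table = [[rng.uniform(-1.0, 1.0) for _ in range(K)] for _ in range(T)]
    eta = math.sqrt(2.0 * math.log(K) / T)
    print(regret(table, eta))
```

With payoffs confined to an interval of width 2, the standard analysis of this update bounds the expected regret by ln K / eta + eta T / 2, which depends on the number of rounds T rather than on the gains and losses of any particular action.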

The first result of this paper is a new variant of algorithm Hedge that achieves a regret of \(O(\sqrt{\ln K} \sqrt{G_j + L_j})\) for the Full Gain-Loss game, where j is the index of one of the actions in the game, \(G_j\), the total gain of j, is the sum of all the positive payoffs that the jth action had in the game, and \(L_j\) is the absolute value of the sum of all its negative payoffs. In addition, the new algorithm matches the performance of the known Hedge algorithms in the special cases of gains only and losses only.
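As a small worked example of these definitions, the sketch below computes the total gain and total loss of a single action from its payoff sequence and evaluates the rate term inside the O(.) bound. The helper names are hypothetical, and the constant hidden by the O-notation is not given in the abstract, so only the rate term is computed.

```python
import math


def gain_and_loss(action_payoffs):
    """Split one action's payoff sequence into total gain G_j and total loss L_j."""
    gain = sum(x for x in action_payoffs if x > 0)
    # L_j is the absolute value of the sum of the negative payoffs.
    loss = -sum(x for x in action_payoffs if x < 0)
    return gain, loss


def regret_rate(num_actions, gain, loss):
    """Rate term sqrt(ln K) * sqrt(G_j + L_j); the O(.) constant is omitted."""
    return math.sqrt(math.log(num_actions)) * math.sqrt(gain + loss)


if __name__ == "__main__":
    g, l = gain_and_loss([0.5, -0.25, 1.0, -0.75, 0.0])
    print(g, l, regret_rate(4, g, l))
```

Rounds in which the action's payoff is zero contribute to neither quantity, so the rate can be much smaller than one that grows with the number of rounds.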

The second result is an application of the new algorithm that achieves new upper bounds on the regrets of the original Full Gain game and Full Loss game. The new upper bounds are a function of a new parameter.

The third result is a method for combining online learning algorithms online. This method yields an \(O\big(\min\big(\sqrt{L_{opt} \ln K},\ \sqrt{(T-L_{opt}) \ln K}\big)\big)\) upper bound on the regret of the Full Loss game, and an \(O\big(\min\big(\sqrt{G_{opt} \ln K},\ \sqrt{(T-G_{opt}) \ln K}\big)\big)\) upper bound on the regret of the Full Gain game.
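The shape of the combined bound for the Full Loss game can be seen in a short sketch. It assumes that \(L_{opt}\) denotes the total loss of the best action and T the number of rounds, which the abstract does not state explicitly; the function name is hypothetical and the O(.) constant is omitted.

```python
import math


def combined_loss_rate(num_rounds, best_loss, num_actions):
    """Rate term min(sqrt(L_opt ln K), sqrt((T - L_opt) ln K)), constants omitted."""
    log_k = math.log(num_actions)
    return min(math.sqrt(best_loss * log_k),
               math.sqrt((num_rounds - best_loss) * log_k))


if __name__ == "__main__":
    # Small best loss: the first term is the minimum.
    print(combined_loss_rate(100, 10, 8))
    # Best loss close to T: the second term is the minimum.
    print(combined_loss_rate(100, 90, 8))
```

The rate is symmetric around \(L_{opt} = T/2\): it is small both when the best action loses very little and when it loses in almost every round.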

References

  1. Allenberg, C.: Individual sequence prediction - upper bounds and applications to complexity. In: Proceedings of the 12th Annual Conference on Computational Learning Theory (1999)

  2. Allenberg, C., Auer, P., Cesa-Bianchi, N.: On the loss version of the adversarial multi-armed bandit problem (to appear)

  3. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: Gambling in a rigged casino: The adversarial multi-armed bandit problem. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science (1995)

  4. Auer, P., Cesa-Bianchi, N., Gentile, C.: Adaptive and Self-Confident On-line Learning Algorithms. JCSS 64(1), 48–75 (2002)

  5. Cesa-Bianchi, N., Freund, Y., Helmbold, D.P., Haussler, D., Schapire, R.E., Warmuth, M.K.: How to use expert advice. In: Proceedings of the Twenty-Fifth Annual ACM Symposium on the Theory of Computing, pp. 382–391 (1993)

  6. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, Chichester (1991)

  7. Fiat, A., Foster, D.P., Karloff, H., Rabani, Y., Ravid, Y.: Competitive algorithms for Layered Graph Traversal. In: Proceedings of the 32nd Annual Symposium on Foundations of Computer Science, p. 288 (1991)

  8. Freund, Y., Schapire, R.E.: A decision-theoretic generalization of online learning and an application to boosting. In: Vitányi, P.M.B. (ed.) EuroCOLT 1995. LNCS, vol. 904, pp. 23–37. Springer, Heidelberg (1995)

  9. Gittins, J.C.: Multi-armed Bandit Allocation Indices. John Wiley and Sons, Chichester (1989)

  10. Lai, T.L., Robbins, H.: Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6, 4–22 (1985)

  11. Littlestone, N., Warmuth, M.K.: The weighted majority algorithm. Information and Computation 108, 212–261 (1994)

  12. Vovk, V.G.: Aggregating strategies. In: Proceedings of the Third Annual Workshop on Computational Learning Theory, pp. 371–383 (1990)

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Allenberg-Neeman, C., Neeman, B. (2004). Full Information Game with Gains and Losses. In: Ben-David, S., Case, J., Maruoka, A. (eds) Algorithmic Learning Theory. ALT 2004. Lecture Notes in Computer Science, vol 3244. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30215-5_21

  • DOI: https://doi.org/10.1007/978-3-540-30215-5_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23356-5

  • Online ISBN: 978-3-540-30215-5

  • eBook Packages: Springer Book Archive