Abstract
In this communication we outline, without full proofs, a computation of the value function and optimal policies in a discounted symmetric Poisson-type two-armed bandit (TAB) problem with both continuous and impulse actions. Our purpose is to present one more physically meaningful example in which an explicit solution of the related quasivariational inequalities (QVI) can be found and, in particular, in which optimal policies involve series of impulse decisions following instantly one after another. A simpler, undiscounted version of the same problem is considered by D. S. Donchev [2,3].
References
[1] A. Bensoussan and J.-L. Lions, Contrôle impulsionnel et inéquations quasivariationnelles, Dunod, Paris, 1982.
[2] D. S. Donchev, The two-armed bandit problem with continuous time in presence of gradual and impulsive controls, Russian Math. Surveys 45 (1990), #1(271), 200–202.
[3] D. S. Donchev, On the two-armed bandit problem with both continuous and impulsive actions, submitted to Steklov Seminar 3 (editors A. N. Shyryaev et al.).
[4] D. Feldman, Contributions to the “two-armed bandit” problem, Ann. Math. Stat. 33 (1962), 847–856.
[5] I. Karatzas, Gittins indices in the dynamic allocation problem for diffusion processes, Ann. Prob. 12 (1984), 173–192.
[6] A. Mandelbaum, Continuous multi-armed bandits and multiparameter processes, Ann. Prob. 15 (1987), 1527–1556.
[7] E. L. Presman, Poissonian version of the two-armed bandit problem with discounting, Theory Prob. 35 (1990), 307–317.
[8] E. L. Presman and I. M. Sonin, Sequential control with incomplete data: Bayesian approach, Academic Press, New York, 1990 (Russian edition 1982).
[9] A. A. Yushkevich, On the two-armed bandit problem with continuous time parameter and discounted rewards, Stochastics 23 (1988), 299–310.
[10] A. A. Yushkevich, Verification theorems for Markov decision processes with controlled deterministic drift and gradual and impulse controls, Theory Prob. 34 (1989), 474–496.
Copyright information
© 1993 Springer Science+Business Media New York
About this chapter
Cite this chapter
Yushkevich, A.A. (1993). On a Two-armed Bandit Problem with both Continuous and Impulse Actions and Discounted Rewards. In: Çinlar, E., Chung, K.L., Sharpe, M.J., Bass, R.F., Burdzy, K. (eds) Seminar on Stochastic Processes, 1992. Progress in Probability, vol 33. Birkhäuser, Boston, MA. https://doi.org/10.1007/978-1-4612-0339-1_13
DOI: https://doi.org/10.1007/978-1-4612-0339-1_13
Publisher Name: Birkhäuser, Boston, MA
Print ISBN: 978-1-4612-6714-0
Online ISBN: 978-1-4612-0339-1
eBook Packages: Springer Book Archive