Wireless Channel Selection with Restless Bandits
Wireless devices can often communicate on several alternative channels; for example, cellular phones may use several frequency bands and, in addition to communicating with base stations, are equipped with WiFi and Bluetooth. Automatic decision-support systems in such devices must decide which channels to use at any given time so as to maximize long-run average throughput. A good decision policy must take into account that, because of cost, energy, technical, or performance constraints, the state of a channel is sensed only when that channel is selected for transmission. Therefore, the greedy strategy of always exploiting the channels currently believed to yield the highest transmission rate is not necessarily optimal with respect to long-run average throughput. Rather, it may be favourable to give some priority to exploring channels of uncertain quality.
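To make the observation model concrete, here is a minimal sketch. It assumes, for illustration only, that each channel is an independent two-state (good/bad) Markov chain with hypothetical parameters `p01` = P(bad → good) and `p11` = P(good → good), and that a selected channel in the good state delivers one unit of throughput. All function names are hypothetical. The sketch shows how beliefs about unsensed channels drift while only the selected channel is observed, and how the greedy (myopic) rule simply picks the channel with the highest current belief.

```python
import numpy as np


def propagate_belief(omega, p01, p11):
    """Predict P(good) one slot ahead for a channel that was not sensed.

    omega may be a scalar belief or an array of beliefs.
    """
    return omega * p11 + (1.0 - omega) * p01


def update_beliefs(beliefs, chosen, observed_good, p01, p11):
    """Belief update when only the chosen channel is sensed this slot."""
    new = propagate_belief(np.asarray(beliefs, dtype=float), p01, p11)
    # The sensed channel's current state is known, so its next-slot
    # belief is exactly one row of the transition matrix.
    new[chosen] = p11 if observed_good else p01
    return new


def greedy_channel(beliefs):
    """Myopic rule: maximise expected immediate throughput P(good)."""
    return int(np.argmax(beliefs))


def simulate_greedy(n_channels, p01, p11, horizon, seed=0):
    """Average throughput of the greedy rule on identical, independent channels."""
    rng = np.random.default_rng(seed)
    pi_good = p01 / (p01 + 1.0 - p11)          # stationary P(good)
    states = rng.random(n_channels) < pi_good  # true (hidden) channel states
    beliefs = np.full(n_channels, pi_good)
    total = 0
    for _ in range(horizon):
        k = greedy_channel(beliefs)
        good = bool(states[k])                 # only channel k is sensed
        total += good                          # unit reward if channel k is good
        beliefs = update_beliefs(beliefs, k, good, p01, p11)
        u = rng.random(n_channels)             # each channel evolves independently
        states = np.where(states, u < p11, u < p01)
    return total / horizon
```

For example, `simulate_greedy(3, 0.2, 0.8, 10_000)` estimates the greedy rule's average throughput; comparing it against a rule that occasionally probes other channels is one way to study numerically when exploration pays off.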
In this chapter we model such online control problems as a special type of Restless Multi-Armed Bandit (RMAB) problem within a partially observable Markov decision process framework. We refer to such models as Reward-Observing Restless Multi-Armed Bandit (RORMAB) problems. Optimal control problems of this type have previously been considered in the literature in the context of (i) Gilbert-Elliott (GE) channels, where each channel is modelled as a two-state Markov chain, and (ii) Gaussian autoregressive (AR) channels of order 1. A virtue of this chapter is that it unifies the presentation of both types of models under the umbrella of the newly defined RORMAB. Further, since the RORMAB is a special type of RMAB, we also present an account of RMAB problems, together with a pedagogical development of the Whittle index, which provides an approximately optimal control policy. Numerical examples are provided.
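The Whittle index of an arm at belief ω is the smallest subsidy m, paid for leaving the arm passive, at which the passive action becomes optimal. A minimal numerical sketch for a single GE arm follows. It is not the chapter's derivation and rests on stated assumptions: a discounted criterion with hypothetical discount factor `beta` (rather than long-run average throughput), unit reward when the arm is active and good, a uniform belief grid with linear interpolation, and an indexable arm, so that the advantage of the active action decreases in m and bisection applies. The function name and parameters are hypothetical.

```python
import numpy as np


def whittle_index_ge(omega, p01, p11, beta=0.9, grid_size=201, tol=1e-6):
    """Approximate the Whittle index of one two-state (GE) arm at belief omega."""
    grid = np.linspace(0.0, 1.0, grid_size)

    def advantage_of_active(m):
        # Value iteration for the single-arm problem with passive subsidy m.
        V = np.zeros(grid_size)
        for _ in range(5000):
            v_good = np.interp(p11, grid, V)  # value after observing 'good'
            v_bad = np.interp(p01, grid, V)   # value after observing 'bad'
            active = grid + beta * (grid * v_good + (1.0 - grid) * v_bad)
            passive = m + beta * np.interp(grid * p11 + (1.0 - grid) * p01, grid, V)
            V_new = np.maximum(active, passive)
            converged = np.max(np.abs(V_new - V)) < 1e-10
            V = V_new
            if converged:
                break
        # Compare the two actions at the belief of interest.
        v_good = np.interp(p11, grid, V)
        v_bad = np.interp(p01, grid, V)
        q_active = omega + beta * (omega * v_good + (1.0 - omega) * v_bad)
        q_passive = m + beta * np.interp(omega * p11 + (1.0 - omega) * p01, grid, V)
        return q_active - q_passive

    # Per-slot rewards lie in [0, 1], so search for the subsidy in [0, 1].
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if advantage_of_active(mid) > 0:
            lo = mid  # active still preferred: the index is larger
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With `beta=0` the index reduces to the myopic value ω. A Whittle index policy would compute this index for every channel in each slot and transmit on the channel(s) with the largest values.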
YN is supported by Australian Research Council (ARC) grants DP130100156 and DE130100291. JK is supported by DP130100156. The authors are indebted to Aapeli Vuorinen for his contribution to the numerical computations. We also thank the anonymous referee, Michel Mandjes and Thomas Taimre for their comments.