Wireless Channel Selection with Restless Bandits

  • Julia Kuhn
  • Yoni Nazarathy
Part of the International Series in Operations Research & Management Science book series (ISOR, volume 248)


Wireless devices are often able to communicate on several alternative channels; for example, cellular phones may use several frequency bands and are equipped with base-station communication capability together with WiFi and Bluetooth communication. Automatic decision support systems in such devices need to decide which channels to use at any given time so as to maximize the long-run average throughput. A good decision policy needs to take into account that, due to cost, energy, technical, or performance constraints, the state of a channel is only sensed when it is selected for transmission. Therefore, the greedy strategy of always exploiting those channels assumed to yield the currently highest transmission rate is not necessarily optimal with respect to long-run average throughput. Rather, it may be favourable to give some priority to the exploration of channels of uncertain quality.

In this chapter we model such on-line control problems as a special type of Restless Multi-Armed Bandit (RMAB) problem in a partially observable Markov decision process framework. We refer to such models as Reward-Observing Restless Multi-Armed Bandit (RORMAB) problems. Optimal control problems of this type were previously considered in the literature in two contexts: (i) Gilbert-Elliott (GE) channels, where each channel is modelled as a two-state Markov chain, and (ii) Gaussian autoregressive (AR) channels of order 1. A virtue of this chapter is that we unify the presentation of both types of models under the umbrella of our newly defined RORMAB. Further, since RORMAB is a special type of RMAB, we also present an account of RMAB problems together with a pedagogical development of the Whittle index, which provides an approximately optimal control method. Numerical examples are provided.
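To make the reward-observing setting concrete, the following is a minimal sketch of the GE case under stated assumptions: every channel is an independent two-state (bad/good) Markov chain, one channel is selected per time slot, the reward is 1 if the selected channel is good, and only the selected channel's state is observed. The names `propagate_belief`, `simulate_myopic`, `p01` (probability of moving from bad to good) and `p11` (probability of staying good) are illustrative and do not come from the chapter. The policy shown is the greedy (myopic) one discussed above, not the Whittle index policy.

```python
import random


def propagate_belief(omega, p01, p11):
    """One-step prediction of P(channel is good) for an unobserved channel."""
    return omega * p11 + (1.0 - omega) * p01


def simulate_myopic(p01, p11, n_channels=3, horizon=1000, seed=0):
    """Average reward per slot of the greedy policy on identical GE channels.

    Assumption: reward is 1 when the selected channel is good, 0 otherwise.
    """
    rng = random.Random(seed)
    # Stationary probability of the good state (requires p01 + 1 - p11 > 0).
    stationary = p01 / (p01 + 1.0 - p11)
    states = [rng.random() < stationary for _ in range(n_channels)]
    beliefs = [stationary] * n_channels
    total_reward = 0

    for _ in range(horizon):
        # Greedy choice: the channel currently believed most likely to be good.
        chosen = max(range(n_channels), key=lambda i: beliefs[i])
        total_reward += int(states[chosen])

        # Belief update for the next slot.
        for i in range(n_channels):
            if i == chosen:
                # The state was observed, so the belief resets to the
                # matching one-step transition probability.
                beliefs[i] = p11 if states[i] else p01
            else:
                # Unobserved channels: the belief drifts toward stationarity.
                beliefs[i] = propagate_belief(beliefs[i], p01, p11)

        # Restless dynamics: every channel evolves, selected or not.
        states = [rng.random() < (p11 if s else p01) for s in states]

    return total_reward / horizon


if __name__ == "__main__":
    print(simulate_myopic(p01=0.2, p11=0.9))
```

The belief of an unobserved channel converges to the stationary probability, which is why a channel that was last seen in the bad state can become worth exploring again after enough slots.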



YN is supported by Australian Research Council (ARC) grants DP130100156 and DE130100291. JK is supported by DP130100156. The authors are indebted to Aapeli Vuorinen for his contribution to the numerical computations. We also thank the anonymous referee, Michel Mandjes and Thomas Taimre for their comments.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. The University of Queensland, Brisbane, Australia
  2. University of Amsterdam, Amsterdam, The Netherlands
  3. The University of Queensland, Brisbane, Australia
