Learning the Optimal Network with Handoff Constraint: MAB RL Based Network Selection

  • Zhiyong Du
  • Bin Jiang
  • Qihui Wu
  • Yuhua Xu
  • Kun Xu


The core issue of network selection is choosing the optimal network among the available network access points (NAPs) of a heterogeneous wireless network (HWN). Many previous works evaluate networks in an idealized environment: they generally assume that the network state information (NSI) is known and static. However, because traffic load and radio channel conditions vary, the NSI can be dynamic or even unavailable to the user in a realistic HWN environment, so most existing network selection algorithms cannot work effectively. Learning-based algorithms can cope with uncertain and dynamic NSI, but they commonly need sufficient samples of each option, which results in an unbearable handoff cost. Therefore, this chapter formulates network selection as a multi-armed bandit (MAB) problem and designs two reinforcement learning (RL) based network selection algorithms with special consideration for reducing the network handoff cost. We prove that the proposed algorithms achieve order-optimal performance, i.e., logarithmic-order regret, with limited network handoff cost. Simulation results indicate that, compared with existing algorithms, the two algorithms significantly reduce the network handoff cost while also improving transmission performance.
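To make the trade-off between learning and handoff cost concrete, the following is a minimal sketch of one generic way to limit switching in a bandit setting: a UCB1 index policy that stays on the selected network for a block whose length doubles each time that network is chosen. This is not the chapter's algorithms, which are not described in the abstract. The function name `block_ucb1`, the Bernoulli reward model, and the example reward probabilities are all assumptions made for illustration.

```python
import math
import random


def block_ucb1(mean_rewards, horizon, seed=0):
    """Run UCB1 in blocks of doubling length; return (pulls, handoffs).

    mean_rewards: hypothetical Bernoulli success probability per network
    (an assumption for this sketch, not a model taken from the chapter).
    """
    rng = random.Random(seed)
    n_arms = len(mean_rewards)
    pulls = [0] * n_arms      # number of slots spent on each network
    totals = [0.0] * n_arms   # summed reward observed on each network
    blocks = [0] * n_arms     # blocks already granted to each network
    handoffs = 0
    current = None
    t = 0
    while t < horizon:
        # Try every network once, then pick the largest UCB1 index.
        untried = [a for a in range(n_arms) if pulls[a] == 0]
        if untried:
            arm = untried[0]
        else:
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / pulls[a]
                + math.sqrt(2.0 * math.log(t) / pulls[a]),
            )
        # A handoff is counted only when the selected network changes.
        if current is not None and arm != current:
            handoffs += 1
        current = arm
        # Stay on the chosen network for a whole block; the block length
        # doubles each time this network is selected again.
        length = min(2 ** blocks[arm], horizon - t)
        blocks[arm] += 1
        for _ in range(length):
            reward = 1.0 if rng.random() < mean_rewards[arm] else 0.0
            pulls[arm] += 1
            totals[arm] += reward
        t += length
    return pulls, handoffs
```

For example, `block_ucb1([0.2, 0.5, 0.8], 5000)` returns the per-network usage counts and the number of handoffs. Because each network's blocks double in length, a network can be selected at most about log2(T) + 1 times over a horizon of T slots, so the handoff count in this sketch grows at most logarithmically in T.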



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  • Zhiyong Du (1), corresponding author
  • Bin Jiang (1)
  • Qihui Wu (2)
  • Yuhua Xu (3)
  • Kun Xu (1)

  1. National University of Defense Technology, Changsha, China
  2. Nanjing University of Aeronautics and Astronautics, Nanjing, China
  3. Army Engineering University of PLA, Nanjing, China
