Intelligent Transmission Scheduling Based on Deep Reinforcement Learning

Part of the SpringerBriefs in Computer Science book series (BRIEFSCOMPUTER)


With the increasing diversification of ship users' communication services, the quality of service (QoS) of data transmission has become a bottleneck for the development of maritime communication. Software-defined maritime communication networks have been proposed to overcome the incompatibility of communication modes across heterogeneous networks. Based on this framework, we propose a transmission scheduling scheme built on an improved deep Q-learning algorithm that combines a deep Q-network with a softmax classifier (the S-DQN algorithm) to improve throughput while balancing delay and energy consumption. First, the scheduling problem is formulated as a Markov decision process (MDP) whose solution yields the optimal scheduling strategy. A deep Q-network is then used to learn the mapping from the observed system information to the optimal policy. After self-learning on large amounts of data, the system selects the optimal scheduling action quickly and accurately as new input data arrives. Simulation results show that the scheme outperforms traditional schemes under different QoS requirements, verifying its effectiveness.
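The abstract does not spell out the S-DQN update rule, so the following is only a rough illustrative sketch under our own assumptions: a tabular Q-function stands in for the deep Q-network, and the softmax component is modeled as a Boltzmann (softmax-over-Q-values) action-selection layer over the MDP. The state/action/reward structure here is a toy stand-in, not the paper's scheduling model.

```python
import numpy as np

def softmax(q, tau=1.0):
    # Numerically stable softmax over Q-values (Boltzmann policy).
    z = (q - np.max(q)) / tau
    e = np.exp(z)
    return e / e.sum()

class TinyQScheduler:
    """Tabular Q-learning stand-in for the deep Q-network component."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, seed=0):
        self.q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma = alpha, gamma
        self.rng = np.random.default_rng(seed)

    def act(self, s, tau=0.5):
        # Sample an action from the softmax distribution over Q-values,
        # mimicking a softmax output layer on top of the Q-network.
        p = softmax(self.q[s], tau)
        return self.rng.choice(len(p), p=p)

    def update(self, s, a, r, s_next):
        # Standard one-step Q-learning target: r + gamma * max_a' Q(s', a').
        target = r + self.gamma * self.q[s_next].max()
        self.q[s, a] += self.alpha * (target - self.q[s, a])
```

In a deep variant, `self.q` would be replaced by a neural network trained with experience replay and a target network, as in standard DQN; the softmax layer above only governs exploration and action selection, which is one plausible reading of the abstract's "softmax classifier".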



Copyright information

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. School of Electrical Engineering and Intelligentization, Dongguan University of Technology, Dongguan, China
  2. Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Canada