Applying Online Expert Supervision in Deep Actor-Critic Reinforcement Learning

  • Jin Zhang
  • Jiansheng Chen
  • Yiqing Huang
  • Weitao Wan
  • Tianpeng Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11257)


Deep reinforcement learning (DRL) has shown strong performance on a variety of decision-making and control problems, e.g. Atari games and the game of Go, where DRL agents have even outperformed human masters. However, DRL algorithms require extensive computation and exploration, which makes DRL agents hard to train, especially in problems with large state and action spaces. Moreover, most DRL algorithms are very sensitive to hyperparameters. To address these problems, we propose A3COE, a new algorithm that combines the A3C algorithm with online expert supervision. We applied it to mini-games of the well-known real-time strategy game StarCraft II. Results show that the algorithm substantially improves the agent's performance with fewer training steps, while yielding more stable training over a wider range of hyperparameters. We also show that the algorithm works even better when combined with curriculum learning.
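The abstract does not state how A3COE combines A3C with expert supervision. The sketch below offers one hedged reading. It assumes that an expert is queried online on the same states the agent visits during its own rollouts, rather than only on a pre-recorded demonstration set. It also assumes that a cross-entropy term pulling the policy toward the expert's action is added to the usual A3C objective. That objective consists of n-step returns, an advantage-weighted policy gradient, value regression and an entropy bonus. The function names, `expert_coef` and all coefficient defaults are hypothetical illustrations, not taken from the paper.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of action logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def n_step_returns(rewards: Sequence[float], bootstrap_value: float,
                   gamma: float) -> List[float]:
    """A3C-style n-step returns: R_t = r_t + gamma * R_{t+1}, seeded with V(s_T)."""
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def actor_critic_expert_loss(
    logits: Sequence[float],
    action: int,
    expert_action: int,
    value: float,
    ret: float,
    value_coef: float = 0.5,
    entropy_coef: float = 0.01,
    expert_coef: float = 1.0,
) -> float:
    """Per-step loss: A3C actor-critic terms plus a hypothetical expert term.

    `expert_action` is assumed to come from querying an expert on the same
    state the agent visited (the "online" part of this sketch).
    """
    log_probs = log_softmax(logits)
    probs = [math.exp(lp) for lp in log_probs]
    advantage = ret - value  # treated as a constant in the policy-gradient term
    policy_loss = -log_probs[action] * advantage
    value_loss = 0.5 * (ret - value) ** 2
    entropy = -sum(p * lp for p, lp in zip(probs, log_probs))
    # Supervised cross-entropy: push the policy toward the expert's choice.
    expert_loss = -log_probs[expert_action]
    return (policy_loss + value_coef * value_loss
            - entropy_coef * entropy + expert_coef * expert_loss)
```

For example, `n_step_returns([0.0, 0.0, 1.0], bootstrap_value=0.5, gamma=0.9)` gives approximately `[1.1745, 1.305, 1.45]`. The loss for a given step is lower when the policy already favors the expert's action. A natural variant, also an assumption here, is to decay `expert_coef` during training so that the agent is not permanently bound to the expert's behavior.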


Keywords: Deep reinforcement learning · Expert supervision · A3C · Curriculum learning



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

All authors: Department of Electronic Engineering, Tsinghua University, Beijing, China
