Task Planning in “Block World” with Deep Reinforcement Learning

  • Edward Ayunts
  • Aleksandr I. Panov
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 636)


Reinforcement learning has recently advanced significantly with the discovery of new techniques and tools for training. This paper is devoted to the application of convolutional and recurrent neural networks to task planning formulated as a reinforcement learning problem. The aim of the work is to check whether these neural networks are suitable for this problem. In the experiments, conducted in a block environment, the task was to move blocks so as to obtain a final target arrangement. A significant part of the problem lies in choosing the reward function and in how the results depend on the way the reward is calculated. The current results show that, unless the initial problem is reduced to more straightforward ones, the neural networks do not demonstrate a stable learning process. In the paper, a modified reward function with sub-targets and a Euclidean reward calculation was used to determine the reward more precisely. The results show that none of the tested architectures was able to achieve the goal.
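The abstract does not define the reward function precisely. The following is a minimal, hypothetical sketch of how a Euclidean reward with sub-targets could be computed in a block world. The column-based state encoding, the helper names `positions`, `move` and `euclidean_reward`, and the `bonus` added for each sub-target block already at its target position are all assumptions made for illustration, not the authors' formulation.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical state encoding (assumption): a list of columns, each a list
# of block labels ordered bottom to top. Start and target use the same blocks.
State = List[List[str]]
Position = Tuple[int, int]  # (column, height)


def positions(state: State) -> Dict[str, Position]:
    """Map every block to its (column, height) coordinate."""
    return {
        block: (col, height)
        for col, stack in enumerate(state)
        for height, block in enumerate(stack)
    }


def move(state: State, src: int, dst: int) -> State:
    """Move the top block of column `src` onto column `dst`.

    Returns a new state; an illegal move (empty source or src == dst)
    leaves the arrangement unchanged.
    """
    new_state = [list(stack) for stack in state]
    if src != dst and new_state[src]:
        new_state[dst].append(new_state[src].pop())
    return new_state


def euclidean_reward(state: State, target: State, subgoals: List[str],
                     bonus: float = 1.0) -> float:
    """Hypothetical reward: negative summed Euclidean distance between each
    block's current and target positions, plus `bonus` for every sub-target
    block that already sits at its target position."""
    cur, goal = positions(state), positions(target)
    distance = sum(math.dist(cur[b], goal[b]) for b in goal)
    reached = sum(1 for b in subgoals if cur[b] == goal[b])
    return -distance + bonus * reached
```

For example, with the start arrangement `[["A", "B"], ["C"], []]` and the target `[["C", "B", "A"], [], []]`, block A is 2 cells from its target, C is 1 cell away and B is already in place, so the sketch returns -3 + 1 = -2.0. The distance term gives a dense signal on every step, while the sub-target bonus rewards partial progress toward the final arrangement.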



The reported study was supported by RFBR, research projects No. 16-37-60055 and No. 17-07-00281.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. National Research University Higher School of Economics, Moscow, Russia
  2. Federal Research Center “Computer Science and Control” of Russian Academy of Sciences, Moscow, Russia
