Abstract
It is challenging for reinforcement learning (RL) to solve dynamic-goal robot tasks in sparse-reward settings. Dynamic Hindsight Experience Replay (DHER) is a method for such problems. However, the policy learned by DHER tends to degrade, and its success rate is low, especially in complex environments. To help agents learn purposefully in dynamic-goal tasks, avoid blind exploration, and improve the stability and robustness of the policy, we propose a guided evaluation method named GEDHER, which builds on DHER and assists the agent in learning under the guidance of evaluated expert demonstrations. In addition, we add Gaussian noise to action sampling to balance exploration and exploitation, preventing the policy from falling into a local optimum. Experimental results show that our method outperforms the original DHER in terms of both stability and success rate.
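The exploration mechanism mentioned in the abstract, adding Gaussian noise to sampled actions, can be sketched as below. This is a minimal illustration, not the paper's implementation: the function name, noise scale, and action bounds are assumptions chosen for the example.

```python
import numpy as np

def sample_action(policy_action, noise_std=0.1, action_low=-1.0, action_high=1.0):
    """Perturb a deterministic policy output with zero-mean Gaussian noise,
    then clip to the valid action range.

    policy_action: array-like action proposed by the (deterministic) policy.
    noise_std: standard deviation of the exploration noise (assumed value).
    action_low/action_high: bounds of the action space (assumed symmetric).
    """
    policy_action = np.asarray(policy_action, dtype=float)
    noise = np.random.normal(0.0, noise_std, size=policy_action.shape)
    return np.clip(policy_action + noise, action_low, action_high)
```

A larger `noise_std` encourages exploration early in training; annealing it toward zero shifts the agent toward exploitation of the learned policy.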
Funding
This work was supported in part by the Trico-Robot plan of NSFC under grant No. 91748208, National Major Project under grant No. 2018ZX01028-101, Shaanxi Project under grant No. 2018ZDCXLGY0607, NSFC No. 61973246, and the program of the Ministry of Education.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Feng, C., Lan, X., Wan, L., Liang, Z., Wang, H. (2020). A Guided Evaluation Method for Robot Dynamic Manipulation. In: Chan, C.S., et al. Intelligent Robotics and Applications. ICIRA 2020. Lecture Notes in Computer Science, vol 12595. Springer, Cham. https://doi.org/10.1007/978-3-030-66645-3_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66644-6
Online ISBN: 978-3-030-66645-3