Abstract
Recent advances in Reinforcement Learning (RL) have surpassed human-level performance in many simulated environments. However, existing RL techniques cannot explicitly incorporate already available domain-specific knowledge into the learning process. Agents must therefore discover this knowledge independently through trial and error, which costs both time and resources before they can produce valid responses. Hence, we adapt the Deep Deterministic Policy Gradient (DDPG) algorithm to incorporate an adviser, which allows domain knowledge to be integrated in the form of pre-learned policies or pre-defined relationships that enhance the agent's learning process. Our experiments on OpenAI Gym benchmark tasks show that integrating domain knowledge through advisers expedites learning and improves the resulting policy towards better optima.
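The abstract describes an adviser that injects domain knowledge into a DDPG agent's learning process. The paper's exact integration mechanism is not given here; the sketch below is a hypothetical illustration, in the spirit of probabilistic policy reuse, where action selection defers to an adviser policy with a probability that is annealed away as the agent's own actor improves. The class name, parameters, and annealing schedule are all assumptions, not the authors' method.

```python
import random

class AdviserActionSelector:
    """Hypothetical sketch: choose between the DDPG actor's action and a
    domain-knowledge adviser's action, annealing the adviser's influence
    toward zero as training progresses."""

    def __init__(self, p_adviser=0.5, decay=0.999, p_min=0.0):
        self.p_adviser = p_adviser  # probability of deferring to the adviser
        self.decay = decay          # multiplicative annealing factor
        self.p_min = p_min          # floor on the adviser's influence

    def select(self, actor_action, adviser_action):
        # With probability p_adviser, take the adviser's suggested action.
        use_adviser = random.random() < self.p_adviser
        # Decay the adviser's influence after every decision.
        self.p_adviser = max(self.p_min, self.p_adviser * self.decay)
        return adviser_action if use_adviser else actor_action
```

In such a scheme, early exploration is guided by the adviser's pre-learned policy or pre-defined relationships, while the annealing ensures the learned actor eventually takes over.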
Acknowledgement
We thank Prof. Sanath Jayasena and Dr. Ranga Rodrigo for arranging insightful discussions that supported this work.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Wijesinghe, R., Vithanage, K., Tissera, D., Xavier, A., Fernando, S., Samarawickrama, J. (2021). Transferring Domain Knowledge with an Adviser in Continuous Tasks. In: Karlapalem, K., et al. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol. 12714. Springer, Cham. https://doi.org/10.1007/978-3-030-75768-7_16
Print ISBN: 978-3-030-75767-0
Online ISBN: 978-3-030-75768-7
eBook Packages: Computer Science; Computer Science (R0)