Abstract
Recent advances in Reinforcement Learning (RL) have surpassed human-level performance in many simulated environments. However, existing RL techniques cannot explicitly incorporate already available domain-specific knowledge into the learning process. Agents must therefore discover this knowledge independently through trial and error, which costs both time and resources before they can produce valid responses. Hence, we adapt the Deep Deterministic Policy Gradient (DDPG) algorithm to incorporate an adviser, which allows domain knowledge to be integrated in the form of pre-learned policies or pre-defined relationships that enhance the agent's learning process. Our experiments on OpenAI Gym benchmark tasks show that integrating domain knowledge through advisers expedites learning and improves the resulting policy towards better optima.
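The abstract describes an adviser that injects domain knowledge into a DDPG agent's learning process. The paper's exact integration mechanism is not given here; the sketch below is a hypothetical illustration, in the spirit of probabilistic policy reuse, where action selection defers to an adviser policy with a probability that is annealed away as the agent's own actor improves. The class name, parameters, and annealing schedule are all assumptions, not the authors' method.

```python
import random

class AdviserActionSelector:
    """Hypothetical sketch: choose between the DDPG actor's action and a
    domain-knowledge adviser's action, annealing the adviser's influence
    toward zero as training progresses."""

    def __init__(self, p_adviser=0.5, decay=0.999, p_min=0.0):
        self.p_adviser = p_adviser  # probability of deferring to the adviser
        self.decay = decay          # multiplicative annealing factor
        self.p_min = p_min          # floor on the adviser's influence

    def select(self, actor_action, adviser_action):
        # With probability p_adviser, take the adviser's suggested action.
        use_adviser = random.random() < self.p_adviser
        # Decay the adviser's influence after every decision.
        self.p_adviser = max(self.p_min, self.p_adviser * self.decay)
        return adviser_action if use_adviser else actor_action
```

In such a scheme, early exploration is guided by the adviser's pre-learned policy or pre-defined relationships, while the annealing ensures the learned actor eventually takes over.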
Acknowledgement
We thank Prof. Sanath Jayasena and Dr. Ranga Rodrigo for arranging insightful discussions that supported this work.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Wijesinghe, R., Vithanage, K., Tissera, D., Xavier, A., Fernando, S., Samarawickrama, J. (2021). Transferring Domain Knowledge with an Adviser in Continuous Tasks. In: Karlapalem, K., et al. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol. 12714. Springer, Cham. https://doi.org/10.1007/978-3-030-75768-7_16
Print ISBN: 978-3-030-75767-0
Online ISBN: 978-3-030-75768-7
eBook Packages: Computer Science; Computer Science (R0)