
Transferring Domain Knowledge with an Adviser in Continuous Tasks

Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12714)

Abstract

Recent advances in Reinforcement Learning (RL) have surpassed human-level performance in many simulated environments. However, existing RL techniques cannot explicitly incorporate already known domain-specific knowledge into the learning process. Agents must therefore discover this knowledge independently through trial and error, which consumes both time and resources before they can produce valid responses. Hence, we adapt the Deep Deterministic Policy Gradient (DDPG) algorithm to incorporate an adviser, which allows domain knowledge, in the form of pre-learned policies or pre-defined relationships, to be integrated into the agent's learning process. Our experiments on OpenAI Gym benchmark tasks show that integrating domain knowledge through advisers expedites learning and improves the policy towards better optima.
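
The abstract describes adapting DDPG so that an adviser can inject domain knowledge (a pre-learned policy or a pre-defined relationship) into the agent's learning process. The sketch below is a minimal, hypothetical illustration of one way such an adviser could be integrated at action-selection time: the adviser's suggested action is blended with the actor's output, and the adviser's influence is annealed as training progresses. The class name, blending scheme, and decay schedule are illustrative assumptions, not the authors' implementation.

# Minimal sketch of adviser-guided action selection for a DDPG-style agent.
# Hypothetical names (AdviserDDPGAgent, adviser_fn, adviser_weight) are
# illustrative assumptions, not the method described in the paper.
import numpy as np

class AdviserDDPGAgent:
    def __init__(self, actor_fn, adviser_fn, action_low, action_high,
                 adviser_weight=0.5, decay=0.999):
        self.actor_fn = actor_fn          # learned policy: state -> action
        self.adviser_fn = adviser_fn      # domain knowledge: state -> action
        self.low, self.high = action_low, action_high
        self.w = adviser_weight           # how much to trust the adviser
        self.decay = decay                # anneal adviser influence over time

    def act(self, state, noise_scale=0.1):
        a_actor = self.actor_fn(state)
        a_adviser = self.adviser_fn(state)
        # Blend the adviser's suggestion with the actor's output, then add
        # exploration noise (DDPG normally uses Ornstein-Uhlenbeck noise;
        # Gaussian noise is used here for brevity).
        action = (1.0 - self.w) * a_actor + self.w * a_adviser
        action += noise_scale * np.random.randn(*np.shape(action))
        self.w *= self.decay              # rely less on the adviser over time
        return np.clip(action, self.low, self.high)

# Usage with toy stand-in policies on a 1-D action space:
agent = AdviserDDPGAgent(
    actor_fn=lambda s: np.tanh(s[:1]),      # untrained actor
    adviser_fn=lambda s: np.array([0.5]),   # pre-defined domain rule
    action_low=-1.0, action_high=1.0)
print(agent.act(np.array([0.2, -0.4])))

Annealing the adviser weight lets the agent lean on domain knowledge early, when the actor is still untrained, and rely increasingly on its own learned policy later.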



Acknowledgement

We thank Prof. Sanath Jayasena and Dr. Ranga Rodrigo for arranging insightful discussions that supported this work.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Wijesinghe, R., Vithanage, K., Tissera, D., Xavier, A., Fernando, S., Samarawickrama, J. (2021). Transferring Domain Knowledge with an Adviser in Continuous Tasks. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol 12714. Springer, Cham. https://doi.org/10.1007/978-3-030-75768-7_16


  • DOI: https://doi.org/10.1007/978-3-030-75768-7_16


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-75767-0

  • Online ISBN: 978-3-030-75768-7

  • eBook Packages: Computer Science (R0)
