Adaptive Skill Acquisition in Hierarchical Reinforcement Learning

  • Conference paper
  • First Online:
Artificial Neural Networks and Machine Learning – ICANN 2020 (ICANN 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12397)

Included in the following conference series: International Conference on Artificial Neural Networks (ICANN)

Abstract

Reinforcement learning (RL) has become an established class of powerful machine learning methods that operate online on sequential tasks through direct interaction with an environment, rather than by processing precollected training datasets. At the same time, the inherently hierarchical structure of many tasks has evoked interest in hierarchical RL approaches, which introduce a two-level decomposition directly into the computational model. These methods are usually composed of lower-level controllers – skills – providing simple behaviors, and a high-level controller that uses the skills to solve the overall task. Skill discovery and acquisition remain principal challenges in hierarchical RL, and most relevant works have addressed this issue by using pre-trained skills that are kept fixed during the main learning process, which may lead to suboptimal solutions. We propose a universal, pluggable framework of Adaptive Skill Acquisition (ASA), aimed at augmenting existing solutions to bring them closer to optimality. ASA observes the high-level controller during its training and identifies the skills it lacks to learn the task successfully. These missing skills are subsequently trained and integrated into the hierarchy, enabling better performance of the overall architecture. As we show in pilot maze-type experiments, the identification of missing skills performs reasonably well, and embedding such skills into the hierarchy may significantly improve the performance of the overall model.

Supported by grant 1/0796/18 from the Slovak Grant Agency for Science (VEGA).
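To make the adaptation cycle described in the abstract concrete, the following is a minimal, self-contained sketch of one possible reading of the idea: a high-level controller selects among low-level skills, its persistent failures are taken as evidence of a missing skill, a new skill is produced, and the hierarchy continues training with it. Every name and detail here (Skill, HighLevelPolicy, asa_loop, the toy 4x4 grid task, the hand-coded replacement skill) is a hypothetical placeholder for illustration only; it is not the authors' implementation, architecture, or evaluation setup.

    # Minimal conceptual sketch of an Adaptive-Skill-Acquisition-style loop.
    # Assumption: all names and the toy grid task are illustrative placeholders.
    import random
    from dataclasses import dataclass, field
    from typing import Callable, Dict, List

    @dataclass
    class Skill:
        """Low-level controller providing a simple behavior (here a fixed move)."""
        name: str
        act: Callable[[tuple], str]  # maps state (x, y) -> primitive action

    @dataclass
    class HighLevelPolicy:
        """High-level controller that learns which skill to invoke."""
        skills: List[Skill]
        prefs: Dict[str, float] = field(default_factory=dict)

        def select(self) -> Skill:
            # Epsilon-greedy over skill preferences stands in for a trained policy.
            if random.random() < 0.2 or not self.prefs:
                return random.choice(self.skills)
            best = max(self.prefs, key=self.prefs.get)
            return next(s for s in self.skills if s.name == best)

        def update(self, skill: Skill, ret: float) -> None:
            old = self.prefs.get(skill.name, 0.0)
            self.prefs[skill.name] = old + 0.1 * (ret - old)

        def integrate(self, skill: Skill) -> None:
            # ASA step: extend the high-level action set with the newly acquired skill.
            self.skills.append(skill)

    MOVES = {"right": (1, 0), "up": (0, 1), "left": (-1, 0), "down": (0, -1)}

    def rollout(policy: HighLevelPolicy, goal=(3, 3), horizon=12) -> float:
        """One episode in a toy 4x4 grid maze; returns 1.0 if the goal is reached."""
        x, y = 0, 0
        for _ in range(horizon):
            skill = policy.select()
            dx, dy = MOVES[skill.act((x, y))]
            x, y = max(0, min(3, x + dx)), max(0, min(3, y + dy))
            ret = 1.0 if (x, y) == goal else 0.0
            policy.update(skill, ret)
            if ret:
                return 1.0
        return 0.0

    def asa_loop(iterations: int = 200) -> HighLevelPolicy:
        # Start with a deliberately incomplete skill set: the agent can only move right.
        policy = HighLevelPolicy(skills=[Skill("go_right", lambda s: "right")])
        failures = 0
        for _ in range(iterations):
            failures += 1 if rollout(policy) == 0.0 else 0
            # Crude missing-skill detector: persistent failure signals a gap in the skill set.
            if failures > 20 and len(policy.skills) == 1:
                # "Train" the missing skill (a hand-coded stand-in) and plug it into the hierarchy.
                policy.integrate(Skill("go_up", lambda s: "up"))
        return policy

    if __name__ == "__main__":
        trained = asa_loop()
        print("Skills after ASA:", [s.name for s in trained.skills])

Per the abstract, the actual framework identifies missing skills by observing the high-level controller during its training and then trains those skills, rather than relying on the hand-coded failure counter and replacement skill used in this toy stand-in.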

Notes

  1. ASA can be deployed on multiple levels of a multi-level hierarchy.

Author information

Corresponding author

Correspondence to Juraj Holas.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Holas, J., Farkaš, I. (2020). Adaptive Skill Acquisition in Hierarchical Reinforcement Learning. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. ICANN 2020. Lecture Notes in Computer Science, vol 12397. Springer, Cham. https://doi.org/10.1007/978-3-030-61616-8_31

  • DOI: https://doi.org/10.1007/978-3-030-61616-8_31

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61615-1

  • Online ISBN: 978-3-030-61616-8

  • eBook Packages: Computer Science, Computer Science (R0)
