Abstract
Humans can set suitable subgoals to achieve a given task, and can recursively set sub-subgoals when required; the depth of this recursion appears to be unlimited. Inspired by this behavior, we propose RGoal, a new hierarchical reinforcement learning architecture. RGoal solves a Markov Decision Process (MDP) in an augmented state-action space. In multitask settings, sharing subroutines between tasks speeds up learning. A novel mechanism called thought-mode, a form of model-based reinforcement learning, combines previously learned simple tasks to solve unknown, complicated tasks rapidly, sometimes in zero-shot fashion.
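The abstract does not spell out RGoal's update rules, but the core idea it names, solving an MDP in a goal-augmented state-action space with recursive subroutine calls, can be sketched. Below is a minimal, hypothetical Python illustration, not the paper's implementation: the agent's state is paired with its current goal, the action set is extended with "call subgoal" actions that push the suspended goal onto a stack, and ordinary Q-learning runs over this augmented space. The corridor environment, the `("call", g)` actions, and the stack discipline are all illustrative assumptions.

```python
import random
from collections import defaultdict

# Sketch of Q-learning in a goal-augmented state-action space with
# recursive subroutine calls. NOT the RGoal algorithm itself (the abstract
# does not give its update rules); environment and action set are assumed.

N = 10                        # corridor states 0..N-1
subgoals = [3, 6, 9]          # hypothetical subgoal states that can be "called"
actions = [-1, +1] + [("call", g) for g in subgoals]

Q = defaultdict(float)        # Q[((state, goal), action)]
alpha, gamma, eps = 0.5, 0.95, 0.2

def episode(start, task_goal, max_steps=200):
    s, goal, stack = start, task_goal, []   # stack holds suspended goals
    for _ in range(max_steps):
        if s == goal:                       # current (sub)goal reached
            if not stack:
                return                      # top-level task solved
            goal = stack.pop()              # resume the calling goal
            continue
        # epsilon-greedy over the augmented (state, goal) space
        if random.random() < eps:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda b: Q[(s, goal), b])
        if isinstance(a, tuple):            # subroutine call: recurse on a subgoal
            s2, new_goal = s, goal
            if a[1] != goal and len(stack) < 20:   # cap recursion depth
                stack.append(goal)          # suspend the current goal
                new_goal = a[1]
        else:                               # primitive move, clipped to corridor
            s2, new_goal = max(0, min(N - 1, s + a)), goal
        r = -1.0                            # uniform step cost
        best_next = max(Q[(s2, new_goal), b] for b in actions)
        Q[(s, goal), a] += alpha * (r + gamma * best_next - Q[(s, goal), a])
        s, goal = s2, new_goal

for _ in range(3000):                       # multitask training: random goals
    episode(random.randrange(N), random.choice(subgoals))
```

Because all goals share one Q-table indexed by (state, goal), experience gathered while pursuing one goal transfers to the others, which is one plausible reading of the multitask speed-up the abstract mentions.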
Acknowledgments
We gratefully acknowledge Yu Kohno and Tatsuji Takahashi for helpful discussions.
This work was supported by JSPS KAKENHI Grant Number JP18K11488.
Cite this paper
Ichisugi, Y., Takahashi, N., Nakada, H., Sano, T. (2019). Hierarchical Reinforcement Learning with Unlimited Recursive Subroutine Calls. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning. ICANN 2019. Lecture Notes in Computer Science, vol. 11728. Springer, Cham. https://doi.org/10.1007/978-3-030-30484-3_9
Print ISBN: 978-3-030-30483-6
Online ISBN: 978-3-030-30484-3