Realtime Data Mining, pp. 91–118
Up the Down Staircase: Hierarchical Reinforcement Learning
Chapter
Abstract
We address the question of how hierarchical, or multigrid, methods may figure in dynamic programming and reinforcement learning for recommendation engines.
After a general introduction, we approach the framework of hierarchical methods from both the historically older analytical viewpoint and the algebraic one; we then devise and justify approaches for applying hierarchical methods to both the model-based and the model-free case. For the latter, we start from the multigrid reinforcement learning algorithms introduced by Ziv [Ziv04] and extend these methods to finite-horizon problems.
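To make concrete what a hierarchical method for a Bellman equation can look like in the model-based case, the following is a minimal sketch under stated assumptions. It is not the chapter's algorithm, nor Ziv's, but a generic two-grid cycle in the spirit of aggregation and algebraic multigrid [BC89, TOS01], applied to the policy-evaluation equation (I - γP)v = r, where P is the transition matrix of a fixed policy, r the expected one-step reward and γ < 1 the discount factor. Value-iteration sweeps v ← r + γPv serve as the smoother; states are grouped into aggregates, a piecewise-constant prolongation Pr carries coarse corrections back to the fine states, and the Galerkin coarse operator Pr^T A Pr is solved exactly. The helper names (`path_random_walk`, `aggregation_prolongation`, `two_grid_cycle`), the reflecting random walk, the reward vector and the pairwise aggregation are hypothetical choices made only for illustration.

```python
import numpy as np


def path_random_walk(n):
    """Hypothetical test MDP: symmetric random walk on a path of n >= 2 states
    with reflecting ends (the matrix is row-stochastic and symmetric)."""
    P = np.zeros((n, n))
    for i in range(n):
        P[i, max(i - 1, 0)] += 0.5
        P[i, min(i + 1, n - 1)] += 0.5
    return P


def aggregation_prolongation(n, agg):
    """Piecewise-constant prolongation: entry (i, J) is 1 iff fine state i
    belongs to aggregate J = agg[i]. Aggregate indices must be 0..m-1 and
    every aggregate must be non-empty, otherwise the coarse operator is singular."""
    agg = np.asarray(agg)
    Pr = np.zeros((n, int(agg.max()) + 1))
    Pr[np.arange(n), agg] = 1.0
    return Pr


def two_grid_cycle(P, r, gamma, v, Pr, nu1=1, nu2=1):
    """One two-grid cycle for the policy-evaluation equation (I - gamma*P) v = r."""
    A = np.eye(len(r)) - gamma * P
    for _ in range(nu1):              # pre-smoothing: value-iteration sweeps
        v = r + gamma * P @ v
    res = r - A @ v                   # fine-level residual
    A_c = Pr.T @ A @ Pr               # Galerkin coarse operator
    e_c = np.linalg.solve(A_c, Pr.T @ res)  # restrict residual, solve coarse problem exactly
    v = v + Pr @ e_c                  # prolongate and apply the coarse correction
    for _ in range(nu2):              # post-smoothing
        v = r + gamma * P @ v
    return v


if __name__ == "__main__":
    n, gamma = 16, 0.9
    P = path_random_walk(n)
    r = np.linspace(0.0, 1.0, n)                          # hypothetical rewards
    Pr = aggregation_prolongation(n, np.arange(n) // 2)   # pairwise aggregates
    v = np.zeros(n)
    for _ in range(100):
        v = two_grid_cycle(P, r, gamma, v, Pr)
    v_exact = np.linalg.solve(np.eye(n) - gamma * P, r)
    print("max error:", np.max(np.abs(v - v_exact)))
```

The symmetric random walk is chosen deliberately: for symmetric P the matrix A = I - γP is symmetric positive definite, the Galerkin correction is an A-orthogonal projection, and each cycle therefore contracts the error in the A-norm by at least γ^(ν1+ν2). For a general nonsymmetric P this simple argument does not apply, and convergence of such a cycle has to be established separately.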
Keywords
Coarse Grid, Multigrid Method, Bellman Equation, Sparse Grid, Nodal Basis
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
- [AR02] Andre, D., Russell, S.J.: State abstraction for programmable reinforcement learning agents. In: Proceedings of the National Conference on Artificial Intelligence, pp. 119–125 (2002)
- [Bakh66] Bakhvalov, N.S.: On the convergence of a relaxation method with natural constraints on the elliptic operator. USSR Comput. Math. Math. Phys. 6, 101–113 (1966)
- [BC89] Bertsekas, D.P., Castanon, D.A.: Adaptive aggregation methods for infinite horizon dynamic programming. IEEE Trans. Automat. Contr. 34(6), 589–598 (1989)
- [BMR82] Brandt, A., McCormick, S.F., Ruge, J.W.: Algebraic multigrid (AMG) for automatic multigrid solution with application in geodetic computations. Technical Report, Institute for Computational Studies, Colorado State University, POB 1852 (1982)
- [BPX90] Bramble, J., Pasciak, J., Xu, J.: Parallel multilevel preconditioners. Math. Comput. 55, 1–22 (1990)
- [Bra77] Brandt, A.: Multi-level adaptive solutions to boundary-value problems. Math. Comput. 31, 333–390 (1977)
- [Dau92] Daubechies, I.: Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics, Philadelphia (1992)
- [Diet00] Dietterich, T.G.: Hierarchical reinforcement learning with the MAXQ value function decomposition. J. Artif. Intell. Res. 13, 227–303 (2000)
- [Fed64] Fedorenko, R.P.: The speed of convergence of one iterative process. USSR Comput. Math. Math. Phys. 4, 227–235 (1964)
- [GVL96] Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. The Johns Hopkins University Press, Baltimore/London (1996)
- [Ha85] Hackbusch, W.: Multigrid Methods and Applications. Springer, New York (1985)
- [Ma99] Mallat, S.: A Wavelet Tour of Signal Processing, 2nd edn. Academic Press, San Diego (1999)
- [MRLG05] Marthi, B., Russell, S.J., Latham, D., Guestrin, C.: Concurrent hierarchical reinforcement learning. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), pp. 779–785 (2005)
- [Os94] Oswald, P.: Multilevel Finite Element Approximation. B.G. Teubner, Stuttgart (1994)
- [Pap10] Paprotny, A.: Hierarchical methods for the solution of dynamic programming equations arising from optimal control problems related to recommendation. Diploma Thesis, TU Hamburg-Harburg (2010)
- [Pap11] Paprotny, A.: Multilevel Methods for Dynamic Programming: Deterministic and Stochastic Iterative Methods with Application to Recommendation Engines. AVM – Akademische Verlagsgemeinschaft, München (2011)
- [PR98] Parr, R., Russell, S.J.: Reinforcement learning with hierarchies of machines. Adv. Neural Inf. Process. Syst., pp. 1088–1095 (1998)
- [SPS99] Sutton, R., Precup, D., Singh, S.: Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif. Intell. 112(1), 181–211 (1999)
- [The12] Thess, M.: Multilevel preconditioners for temporal-difference learning methods related to recommendation engines. In: Apel, T., Steinbach, O. (eds.) Advanced Finite Element Methods and Applications. Springer, Berlin (2012)
- [TOS01] Stüben, K.: An introduction to algebraic multigrid. In: Trottenberg, U., Oosterlee, C., Schüller, A. (eds.) Multigrid, pp. 413–532. Academic Press, San Diego (2001)
- [Ziv04] Ziv, O.: Algebraic multigrid for reinforcement learning. Master’s Thesis, Technion (2004)
- [Zu00] Zumbusch, G.: A sparse grid PDE solver. In: Langtangen, H.P., Bruaset, A.M., Quak, E. (eds.) Advances in Software Tools for Scientific Computing, Proceedings SciTools '98. Lecture Notes in Computational Science and Engineering, vol. 10, chapter 4, pp. 133–178. Springer (2000)
Copyright information
© Springer International Publishing Switzerland 2013