Up the Down Staircase: Hierarchical Reinforcement Learning

  • Alexander Paprotny
  • Michael Thess
Part of the Applied and Numerical Harmonic Analysis book series (ANHA)


We address the question of how hierarchical, or multigrid, methods may figure in dynamic programming and reinforcement learning for recommendation engines.

After a general introduction, we approach the framework of hierarchical methods from both the historical analytical viewpoint and the algebraic viewpoint. We then devise and justify approaches for applying hierarchical methods in both the model-based and the model-free case. For the latter, we start from the multigrid reinforcement learning algorithms introduced by Ziv in [Ziv04] and extend these methods to finite-horizon problems.


Coarse Grid, Multigrid Method, Bellman Equation, Sparse Grid, Nodal Basis
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


  1. [AR02]
    Andre, D., Russell, S.J.: State abstraction for programmable reinforcement learning agents. In: Proceedings of the National Conference on Artificial Intelligence, pp. 119–125 (2002)
  2. [Bakh66]
    Bakhvalov, N.S.: On the convergence of a relaxation method with natural constraints on the elliptic operator. USSR Comput. Math. Math. Phys. 6, 101–113 (1966)
  3. [BC89]
    Bertsekas, D.P., Castanon, D.A.: Adaptive aggregation methods for infinite horizon dynamic programming. IEEE Trans. Automat. Contr. 34(6), 589–598 (1989)
  4. [BMR82]
    Brandt, A., McCormick, S.F., Ruge, J.W.: Algebraic multigrid (AMG) for automatic multigrid solution with application in geodetic computations. Technical Report CO POB 1852, Institute Computational Studies State University (1982)
  5. [BPX90]
    Bramble, J., Pasciak, J., Xu, J.: Parallel multilevel preconditioners. Math. Comput. 55, 1–22 (1990)
  6. [Bra77]
    Brandt, A.: Multi-level adaptive solutions to boundary-value problems. Math. Comput. 31, 333–390 (1977)
  7. [Dau92]
    Daubechies, I.: Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics, Philadelphia (1992)
  8. [Diet00]
    Dietterich, T.G.: Hierarchical reinforcement learning with the MAXQ value function decomposition. J. Artif. Intell. Res. 13, 227–303 (2000)
  9. [Fed64]
    Fedorenko, R.P.: The speed of convergence of one iterative process. USSR Comput. Math. Math. Phys. 4, 227–235 (1964)
  10. [GVL96]
    Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. The Johns Hopkins University Press, Baltimore/London (1996)
  11. [Ha85]
    Hackbusch, W.: Multigrid Methods and Applications. Springer, New York (1985)
  12. [Ma99]
    Mallat, S.: A Wavelet Tour of Signal Processing, 2nd edn. Academic Press, San Diego (1999)
  13. [MRLG05]
    Marthi, B., Russell, S.J., Latham, D., Guestrin, C.: Concurrent hierarchical reinforcement learning. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), pp. 779–785 (2005)
  14. [Os94]
    Oswald, P.: Multilevel Finite Element Approximation. B.G. Teubner, Stuttgart (1994)
  15. [Pap10]
    Paprotny, A.: Hierarchical methods for the solution of dynamic programming equations arising from optimal control problems related to recommendation. Diploma Thesis, TU Hamburg-Harburg (2010)
  16. [Pap11]
    Paprotny, A.: Multilevel Methods for Dynamic Programming: Deterministic and Stochastic Iterative Methods with Application to Recommendation Engines. AVM – Akademische Verlagsgemeinschaft, München (2011)
  17. [PR98]
    Parr, R., Russell, S.J.: Reinforcement learning with hierarchies of machines. Adv. Neural Inf. Process. Syst., 1088–1095 (1998)
  18. [SPS99]
    Sutton, R., Precup, D., Singh, S.: Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif. Intell. 112(1), 181–211 (1999)
  19. [The12]
    Thess, M.: Multilevel preconditioners for temporal-difference learning methods related to recommendation engines. In: Apel, T., Steinbach, O. (eds.) Advanced Finite Element Methods and Applications. Springer, Berlin (2012)
  20. [TOS01]
    Stüben, K.: An introduction to algebraic multigrid. In: Trottenberg, U., Oosterlee, C., Schüller, A. (eds.) Multigrid, pp. 413–532. Academic Press, San Diego (2001)
  21. [Ziv04]
    Ziv, O.: Algebraic multigrid for reinforcement learning. Master’s Thesis, Technion (2004)
  22. [Zu00]
    Zumbusch, G.: A sparse grid PDE solver. In: Langtangen, H.P., Bruaset, A.M., Quak, E. (eds.) Advances in Software Tools for Scientific Computing, Proceedings SciTools '98. Lecture Notes in Computational Science and Engineering, vol. 10, chapter 4, pp. 133–178. Springer (2000)

Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Alexander Paprotny (1)
  • Michael Thess (2)
  1. Research and Development, prudsys AG, Berlin, Germany
  2. Research and Development, prudsys AG, Chemnitz, Germany
