Hierarchical Approaches

Chapter in: Reinforcement Learning

Part of the book series: Adaptation, Learning, and Optimization (ALO, volume 12)

Abstract

Hierarchical decomposition tackles complex problems by breaking them into a set of smaller, interrelated subproblems. The subproblems are solved separately and the results recombined to yield a solution to the original problem. It is well known that naïve application of reinforcement learning (RL) techniques fails to scale to more complex domains. This chapter introduces hierarchical approaches to reinforcement learning that hold out the promise of reducing a reinforcement learning problem to a manageable size. Hierarchical reinforcement learning (HRL) rests on finding good, reusable, temporally extended actions that may also provide opportunities for state abstraction. Reinforcement learning methods can be extended to work with abstract states and actions over a hierarchy of subtasks that decompose the original problem, potentially reducing its computational complexity. We use a four-room task as a running example to illustrate the various concepts and approaches, including algorithms that can automatically learn the hierarchical structure from interactions with the domain.
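
As a concrete illustration of temporally extended actions, the sketch below implements "navigate to a doorway" options in a small four-room gridworld and learns a policy over them with SMDP Q-learning, whose update discounts by gamma^k when an option lasts k steps. This is a minimal sketch, not code from the chapter: the grid layout, the goal reward, the hand-coded shortest-path option policies, and the extra "navigate to the goal" option are all illustrative assumptions.

```python
import random
from collections import defaultdict, deque

# A small four-room gridworld in the spirit of the chapter's running example.
# The layout, reward scheme, and hand-coded option policies are illustrative
# assumptions, not taken from the chapter itself.
LAYOUT = [
    "#############",
    "#.....#.....#",
    "#.....#.....#",
    "#...........#",
    "#.....#.....#",
    "#.....#.....#",
    "###.#####.###",
    "#.....#.....#",
    "#.....#.....#",
    "#...........#",
    "#.....#.....#",
    "#.....#....G#",
    "#############",
]
FREE = {(r, c) for r, row in enumerate(LAYOUT) for c, ch in enumerate(row) if ch != "#"}
GOAL = next((r, c) for r, row in enumerate(LAYOUT) for c, ch in enumerate(row) if ch == "G")
DOORWAYS = [(3, 6), (9, 6), (6, 3), (6, 9)]      # hallway cells linking the four rooms
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]       # primitive actions: up, down, left, right


def step(state, move):
    """Apply one primitive action; bumping into a wall leaves the state unchanged."""
    nxt = (state[0] + move[0], state[1] + move[1])
    return nxt if nxt in FREE else state


def policy_to(target):
    """Option policy: for every free cell, the primitive move along a shortest path to `target`."""
    toward, frontier = {target: None}, deque([target])
    while frontier:
        cell = frontier.popleft()
        for m in MOVES:
            prev = (cell[0] - m[0], cell[1] - m[1])
            if prev in FREE and prev not in toward:
                toward[prev] = m                 # from `prev`, taking `m` heads toward the target
                frontier.append(prev)
    return toward


# Temporally extended actions: "navigate to subgoal" options, one per doorway plus the goal.
SUBGOALS = DOORWAYS + [GOAL]
OPTIONS = [(g, policy_to(g)) for g in SUBGOALS]


def run_option(state, option, gamma):
    """Execute an option to termination; return (next state, discounted return R, gamma**k)."""
    target, pol = option
    ret, disc = 0.0, 1.0
    while state != target and state != GOAL:
        state = step(state, pol[state])
        ret += disc * (1.0 if state == GOAL else 0.0)
        disc *= gamma
    return state, ret, disc


# SMDP Q-learning over options:
#   Q(s,o) <- Q(s,o) + alpha * (R + gamma^k * max_o' Q(s',o') - Q(s,o))
Q = defaultdict(float)
alpha, epsilon, gamma = 0.1, 0.1, 0.95
for episode in range(500):
    s = (1, 1)                                   # start in the top-left room
    while s != GOAL:
        avail = [i for i in range(len(OPTIONS)) if OPTIONS[i][0] != s]   # no zero-step options
        o = random.choice(avail) if random.random() < epsilon else max(avail, key=lambda i: Q[(s, i)])
        s2, R, disc = run_option(s, OPTIONS[o], gamma)
        best_next = max(Q[(s2, i)] for i in range(len(OPTIONS)))
        Q[(s, o)] += alpha * (R + disc * best_next - Q[(s, o)])
        s = s2

start = (1, 1)
best = max((i for i in range(len(OPTIONS)) if OPTIONS[i][0] != start), key=lambda i: Q[(start, i)])
print("Greedy first option from the start state heads for subgoal", OPTIONS[best][0])
```

In a full HRL setting the option policies would themselves be learned as subtasks rather than hand-coded, and the subgoal-discovery methods discussed in the chapter aim to find doorway-like bottleneck states automatically.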

Author information

Correspondence to Bernhard Hengst.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hengst, B. (2012). Hierarchical Approaches. In: Wiering, M., van Otterlo, M. (eds) Reinforcement Learning. Adaptation, Learning, and Optimization, vol 12. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27645-3_9

  • DOI: https://doi.org/10.1007/978-3-642-27645-3_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27644-6

  • Online ISBN: 978-3-642-27645-3

  • eBook Packages: Engineering
