Structural Abstraction Experiments in Reinforcement Learning

  • Robert Fitch
  • Bernhard Hengst
  • Dorian Šuc
  • Greg Calbert
  • Jason Scholz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3809)


A challenge in applying reinforcement learning to large problems is how to manage the explosive increase in storage and time complexity. This is especially problematic in multi-agent systems, where the state space grows exponentially in the number of agents. Function approximation based on simple supervised learning is unlikely to scale to complex domains on its own, but structural abstraction that exploits system properties and problem representations shows more promise. In this paper, we investigate several classes of known abstractions: 1) symmetry, 2) decomposition into multiple agents, 3) hierarchical decomposition, and 4) sequential execution. We compare memory requirements, learning time, and solution quality empirically in two problem variations. Our results indicate that the most effective solutions come from combinations of structural abstractions, and encourage development of methods for automatic discovery in novel problem formulations.
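To make the scaling argument concrete, the following is a minimal sketch that counts tabular Q-value entries under three representations. It is an illustration only and not the experimental setup of the paper: the function names, the assumption that every agent has the same number of local states and actions, and the assumption that a decomposed agent observes only its own local state are all hypothetical.

```python
from math import comb


def joint_table_size(local_states: int, local_actions: int, agents: int) -> int:
    """Entries for a single learner over the joint state and joint action.

    Assumes (hypothetically) that the joint state is the tuple of local states
    and the joint action is the tuple of local actions.
    """
    return (local_states ** agents) * (local_actions ** agents)


def decomposed_table_size(local_states: int, local_actions: int, agents: int) -> int:
    """Entries when each agent keeps its own table over its local state only.

    This is an assumed, simplified form of multi-agent decomposition.
    """
    return agents * local_states * local_actions


def symmetric_joint_states(local_states: int, agents: int) -> int:
    """Joint states when agents are interchangeable.

    With symmetric agents, a joint state is a multiset of local states,
    so the count is a combination with repetition.
    """
    return comb(local_states + agents - 1, agents)


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(
            n,
            joint_table_size(10, 4, n),
            decomposed_table_size(10, 4, n),
            symmetric_joint_states(10, n),
        )
```

Under these assumptions the joint table grows exponentially in the number of agents, the decomposed tables grow linearly, and symmetry reduces the number of distinct joint states without changing the exponential trend. Whether the smaller representations still support a good policy is the empirical question the paper addresses.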


Reinforcement Learning; Joint Action; Travel Salesman Problem; Sequential Execution; Hierarchical Decomposition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Robert Fitch (1)
  • Bernhard Hengst (1)
  • Dorian Šuc (1)
  • Greg Calbert (2)
  • Jason Scholz (2)
  1. National ICT Australia, University of NSW, Australia
  2. Defence Science and Technology Organization, Salisbury, Australia
