Machine Learning, Volume 81, Issue 3, pp 283–331

Improving reinforcement learning by using sequence trees

  • Sertan Girgin
  • Faruk Polat
  • Reda Alhajj

Abstract

This paper proposes a novel approach to discovering options in the form of stochastic conditionally terminating sequences, and shows how such sequences can be integrated into the reinforcement learning framework to improve learning performance. The method uses stored histories of possible optimal policies and constructs a specialized tree structure during learning. The constructed tree makes it easy to identify frequently used action sequences, together with the states visited during their execution. The tree is continually updated and used to implicitly run the corresponding options. The effectiveness of the method is demonstrated empirically through extensive experiments on domains with different properties.
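To make the sequence-tree idea concrete, the following is a minimal, hypothetical Python sketch (not the authors' implementation): it stores action sequences harvested from episode histories in a trie-like tree, counts how often each prefix occurs and which states were visited along it, and, given the current state and the actions executed so far, suggests continuing a frequently used sequence, loosely mirroring how the constructed tree lets the agent implicitly run the corresponding option. The class and method names (SequenceTree, add_history, suggest_action) are illustrative assumptions.

```python
# Minimal sketch of a sequence tree: a trie over actions, assuming a discrete
# state/action setting. All names here are illustrative, not from the paper.

class SequenceTreeNode:
    def __init__(self):
        self.count = 0        # how often this action prefix occurred
        self.states = set()   # states observed while executing this prefix
        self.children = {}    # action -> child SequenceTreeNode


class SequenceTree:
    """Trie over action sequences harvested from stored episode histories."""

    def __init__(self, max_depth=5):
        self.root = SequenceTreeNode()
        self.max_depth = max_depth

    def add_history(self, history):
        """history: list of (state, action) pairs from a (near-)optimal episode.
        Every suffix of the episode is inserted, so sub-sequences that are
        reused often accumulate high counts."""
        for start in range(len(history)):
            node = self.root
            for state, action in history[start:start + self.max_depth]:
                node = node.children.setdefault(action, SequenceTreeNode())
                node.count += 1
                node.states.add(state)

    def suggest_action(self, state, prefix, min_count=3):
        """Follow the actions already executed (prefix); if the current state
        was seen before under a frequent continuation, return the most
        frequent next action, i.e. implicitly continue that sequence."""
        node = self.root
        for action in prefix:
            node = node.children.get(action)
            if node is None:
                return None
        best = None
        for action, child in node.children.items():
            if child.count >= min_count and state in child.states:
                if best is None or child.count > node.children[best].count:
                    best = action
        return best
```

In use, such a tree would be updated with the history of each successful episode and consulted during action selection, with the agent preferring the suggested continuation (for example, ε-greedily) over choosing a fresh primitive action; the paper integrates this idea more tightly with option discovery and termination conditions than this sketch does.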

Keywords

Reinforcement learning · Options · Conditionally terminating sequences · Temporal abstractions · Semi-Markov decision processes

Copyright information

© The Author(s) 2010

Authors and Affiliations

  1. Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
  2. Department of Computer Science, University of Calgary, Calgary, Canada
  3. Department of Computer Science, Global University, Beirut, Lebanon
