Abstract
This paper presents a comparative analysis of three reinforcement learning algorithms (Q-learning, Q(\(\lambda \))-learning, and QS-learning) and their heuristically-accelerated variants (HAQL, HAQ(\(\lambda \)), and HAQS), in which heuristics bias action selection and thereby speed up learning. The experiments were performed in a simulated robot soccer environment that reproduces the conditions of a real competition league. The results clearly demonstrate that the use of heuristics substantially improves the performance of the learning algorithms.
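The heuristic acceleration mentioned above works by adding a heuristic bonus to the value estimate during action selection, leaving the learned Q-values themselves untouched. A minimal sketch of this idea, assuming a tabular Q-table and heuristic table H keyed by (state, action) pairs (the function name, weight `xi`, and \(\epsilon\)-greedy exploration scheme are illustrative, not the paper's exact formulation):

```python
import random

def heuristic_action(Q, H, state, actions, xi=1.0, epsilon=0.1, rng=random):
    """Epsilon-greedy action selection biased by a heuristic, HAQL-style.

    With probability epsilon a random action is explored; otherwise the
    action maximizing Q(s, a) + xi * H(s, a) is chosen. The heuristic H
    only steers exploration -- the Q-values are updated as usual, so the
    convergence properties of the underlying algorithm are preserved.
    """
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore uniformly at random
    # exploit: greedy over the heuristically-biased value
    return max(actions, key=lambda a: Q[(state, a)] + xi * H[(state, a)])
```

With `xi = 0` this reduces to ordinary \(\epsilon\)-greedy Q-learning action selection; a well-chosen H simply makes the promising actions win ties early in learning, before the Q-values are informative.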
The author acknowledges the current support of FAPESP, project no. 2012/12640-1.
© 2014 Springer-Verlag Berlin Heidelberg
Cite this paper
Martins, M.F., Bianchi, R.A.C. (2014). Heuristically-Accelerated Reinforcement Learning: A Comparative Analysis of Performance. In: Natraj, A., Cameron, S., Melhuish, C., Witkowski, M. (eds) Towards Autonomous Robotic Systems. TAROS 2013. Lecture Notes in Computer Science(), vol 8069. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43645-5_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43644-8
Online ISBN: 978-3-662-43645-5