The Open-Source TEXPLORE Code Release for Reinforcement Learning on Robots

  • Todd Hester
  • Peter Stone
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8371)


The use of robots in society could be expanded by using reinforcement learning (RL) to allow robots to learn and adapt to new situations on-line. RL is a paradigm for learning sequential decision-making tasks, usually formulated as a Markov Decision Process (MDP). For an RL algorithm to be practical for robotic control tasks, it must learn from very few samples while continually taking actions in real time. In addition, the algorithm must learn efficiently in the face of noise, sensor/actuator delays, and continuous state features. In this paper, we present the TEXPLORE ROS code release, which contains TEXPLORE, the first algorithm to address all of these challenges together. We demonstrate TEXPLORE learning to control the velocity of an autonomous vehicle in real time. TEXPLORE has been released as an open-source ROS repository, enabling learning on a variety of robot tasks.
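To make the MDP formulation mentioned in the abstract concrete, the following is a minimal sketch of an agent learning a velocity-control task by trial and error. It is not the TEXPLORE algorithm and does not use any part of the released code: it swaps in plain tabular Q-learning on a hypothetical toy task. The discretized velocities, the target value, the reward, and every function and parameter name here are assumptions chosen only for illustration.

```python
import random

# Hypothetical toy MDP (not from the paper): discrete velocities 0..10,
# and the agent is rewarded for staying close to a target velocity.
N_VELOCITIES = 11
TARGET = 7
ACTIONS = (-1, 0, 1)  # brake, hold, accelerate


def step(velocity, action):
    """Deterministic transition and reward of the toy MDP."""
    next_velocity = min(max(velocity + action, 0), N_VELOCITIES - 1)
    reward = -abs(next_velocity - TARGET)
    return next_velocity, reward


def greedy_action(q, velocity):
    """Index of the action with the highest estimated value."""
    return max(range(len(ACTIONS)), key=lambda i: q[velocity][i])


def train(episodes=2000, steps=50, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_VELOCITIES)]
    for _episode in range(episodes):
        v = rng.randrange(N_VELOCITIES)  # random start state
        for _t in range(steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = greedy_action(q, v)  # exploit
            next_v, r = step(v, ACTIONS[a])
            # One-step temporal-difference update toward r + gamma * max Q.
            q[v][a] += alpha * (r + gamma * max(q[next_v]) - q[v][a])
            v = next_v
    return q


def greedy_rollout(q, start, steps=20):
    """Follow the learned greedy policy and return the final velocity."""
    v = start
    for _t in range(steps):
        v, _reward = step(v, ACTIONS[greedy_action(q, v)])
    return v


q_table = train()
print(greedy_rollout(q_table, start=0))
```

This sketch needs on the order of a hundred thousand simulated steps even for an 11-state problem, and it assumes noise-free, delay-free, discrete observations. Those are exactly the simplifications the abstract says a practical robot learner cannot rely on.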


Keywords: Reinforcement Learning, Markov Decision Processes, Robots



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Todd Hester (1)
  • Peter Stone (1)
  1. Department of Computer Science, University of Texas at Austin, Austin, USA
