Control of a Free-Falling Cat by Policy-Based Reinforcement Learning

  • Daichi Nakano
  • Shin-ichi Maeda
  • Shin Ishii
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7553)


Autonomous control of nonholonomic systems is a major challenge because no unified control method can handle every nonholonomic system, even when the dynamics are known. To address this challenge, we propose a reinforcement learning (RL) approach that enables the controller to acquire an appropriate control policy without knowing the detailed dynamics. In particular, we focus on the control of a free-falling cat system, whose dynamics are highly nonlinear and nonholonomic. To accelerate learning, we adopt a policy gradient method that exploits basic knowledge of the system, and we present a policy representation suited to the task. We show that this RL method learns remarkably faster than the existing genetic algorithm-based method.


Keywords: Free-falling cat, Nonholonomic system, Policy gradient method
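
For readers unfamiliar with the term, the general idea behind a policy gradient method can be illustrated with a minimal sketch. The code below is not the authors' controller or their policy representation, and it does not model the free-falling cat. It is a generic likelihood-ratio (REINFORCE-style) update on a toy one-dimensional problem, and every name and parameter in it (`reinforce_gaussian`, `target`, `sigma`, `lr`, the quadratic reward, the running-average baseline) is an assumption made for illustration only.

```python
import random


def reinforce_gaussian(target=1.5, sigma=0.5, lr=0.01, episodes=3000, seed=0):
    """Toy likelihood-ratio policy gradient (illustrative only).

    Policy: action ~ Normal(theta, sigma^2), with theta the only learned parameter.
    Reward: -(action - target)^2, a hypothetical stand-in for a task cost.
    """
    rng = random.Random(seed)
    theta = 0.0     # policy parameter: mean of the action distribution
    baseline = 0.0  # running average of past rewards, used to reduce variance
    for _ in range(episodes):
        # Sample one action from the current stochastic policy.
        action = rng.gauss(theta, sigma)
        reward = -(action - target) ** 2
        # Score function: d/dtheta log Normal(action | theta, sigma^2).
        score = (action - theta) / sigma ** 2
        # Stochastic gradient ascent on the expected reward.
        theta += lr * score * (reward - baseline)
        # Update the baseline only after it has been used for this sample.
        baseline += 0.1 * (reward - baseline)
    return theta


if __name__ == "__main__":
    print("learned policy mean:", reinforce_gaussian())
```

The update needs only sampled actions and observed rewards, not a model of the dynamics, which is the property the abstract relies on when it speaks of learning without knowing the detailed dynamics. In this toy setting the learned mean should drift toward `target`.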





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Daichi Nakano (1)
  • Shin-ichi Maeda (1)
  • Shin Ishii (1)
  1. Graduate School of Informatics, Kyoto University, Uji, Japan
