
Learning Complex Behaviors via Sequential Composition and Passivity-Based Control

  • Gabriel A. D. Lopes
  • Esmaeil Najafi
  • Subramanya P. Nageshrao
  • Robert Babuška
Chapter in the book series Studies in Systems, Decision and Control (SSDC, volume 42)

Abstract

The model-free paradigm of reinforcement learning (RL) is a theoretical strength. In practice, however, the stringent assumptions required for optimal solutions (full exploration of the state space) and experimental issues such as slow learning rates turn this into a practical weakness. This chapter addresses practical implementations of RL by combining elements of systems and control with elements of robotics. In our approach, space is handled by sequential composition (a technique commonly used in robotics) and time is handled by passivity-based control methods (a standard nonlinear control approach), which speed up learning and provide a stopping criterion. Sequential composition in effect partitions the state space and allows controllers with different domains of attraction (DoA) and goal sets to be composed, so that learning takes place in subsets of the state space. Passivity-based control (PBC) is a model-based control approach in which the total energy is computable. This total energy can serve as a candidate Lyapunov function to evaluate the stability of a controller and to estimate its DoA. This enables learning in finite time: during learning, the candidate Lyapunov function is monitored online to approximate the DoA of the learned controller, and once this DoA covers the states relevant for sequential composition, the learning process is stopped. The result is a collection of learned controllers that together cover a desired region of the state space and can be composed in sequence to achieve various goals. Optimality is traded for practicality; further benefits include safety during learning and incremental learning.
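
To make the mechanism sketched above concrete, the following minimal Python sketch illustrates the three ingredients of the approach: a DoA estimate as a sublevel set of a candidate Lyapunov (energy) function, the "prepares" relation that sequential composition uses to chain controllers, and the stopping test evaluated while learning. All names (Controller, in_doa, prepares, supervisor, doa_covers_targets) and the toy quadratic energy are hypothetical illustrations introduced here, not the chapter's implementation.

    import numpy as np

    class Controller:
        """Hypothetical container for one learned behaviour.

        V      -- candidate Lyapunov (total energy) function from PBC
        c      -- level such that {x : V(x) <= c} estimates the DoA
        goal   -- goal state of this controller (a point, for simplicity)
        policy -- feedback law u = policy(x)
        """
        def __init__(self, V, c, goal, policy):
            self.V, self.c, self.goal, self.policy = V, c, goal, policy

        def in_doa(self, x):
            # Estimated domain of attraction: sublevel set of V.
            return self.V(x) <= self.c

    def prepares(phi_i, phi_j):
        # phi_i "prepares" phi_j when the goal of phi_i lies inside the DoA
        # of phi_j, so the two controllers can be executed in sequence.
        return phi_j.in_doa(phi_i.goal)

    def supervisor(x, controllers):
        # Sequential-composition supervisor: activate any controller whose
        # estimated DoA contains the current state.
        for phi in controllers:
            if phi.in_doa(x):
                return phi
        return None  # uncovered state: a new controller must be learned here

    def doa_covers_targets(V, c, target_states):
        # Stopping test used while learning: stop once the sublevel set
        # {x : V(x) <= c} contains all states relevant for composition.
        return all(V(x) <= c for x in target_states)

    # Toy usage with a quadratic energy V(x) = ||x - goal||^2.
    make_V = lambda g: (lambda x: float(np.sum((np.asarray(x) - g) ** 2)))
    g1, g2 = np.array([0.0, 0.0]), np.array([1.0, 0.0])
    phi1 = Controller(make_V(g1), c=1.5, goal=g1, policy=lambda x: -np.asarray(x))
    phi2 = Controller(make_V(g2), c=1.5, goal=g2, policy=lambda x: g2 - np.asarray(x))
    print(prepares(phi1, phi2))                       # True: goal of phi1 lies in the DoA of phi2
    print(supervisor([0.2, 0.1], [phi1, phi2]).goal)  # phi1 is active at this state

In this toy setting the sublevel-set DoA estimate plays the role that the monitored candidate Lyapunov function plays during learning: enlarging c (or learning a better controller) until doa_covers_targets holds corresponds to the finite-time stopping criterion described in the abstract.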

Keywords

Reinforcement learning · Markov decision process · Sequential composition · Candidate Lyapunov function · Reinforcement learning method


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Gabriel A. D. Lopes (1) (email author)
  • Esmaeil Najafi (1)
  • Subramanya P. Nageshrao (1)
  • Robert Babuška (1)

  1. Delft Center for Systems and Control, Delft University of Technology, Delft, The Netherlands
