Reinforcement learning in feedback control
Technical process control is an application area of high practical impact. Since classical controller design is, in general, a demanding job, this area is an attractive domain for learning approaches, in particular reinforcement learning (RL) methods. RL provides concepts for learning controllers that, by exploiting information from interactions with the process, can acquire high-quality control behaviour from scratch.
This article presents four typical benchmark problems that highlight important and challenging aspects of technical process control: nonlinear dynamics, varying set-points, long-term dynamic effects, the influence of external variables, and the primacy of precision. We propose performance measures for controller quality that apply both to classical control design and to learning controllers, measuring the precision, speed, and stability of the controller. A second set of key figures describes the performance from the perspective of a learning approach, providing information about the efficiency of the method with respect to the learning effort needed. For all four benchmark problems, extensive and detailed information is provided with which to carry out the evaluations outlined in this article.
A close evaluation of our own RL scheme, NFQCA (Neural Fitted Q Iteration with Continuous Actions), in accordance with the proposed scheme on all four benchmarks provides performance figures on both control quality and learning behavior.
Keywords: Reinforcement learning, Feedback control, Benchmarks, Nonlinear control