Machine Learning, Volume 84, Issue 1–2, pp 137–169

Reinforcement learning in feedback control

Challenges and benchmarks from technical process control


Technical process control is a highly interesting application area with high practical impact. Since classical controller design is, in general, a demanding task, this area constitutes a highly attractive domain for the application of learning approaches, in particular reinforcement learning (RL) methods. RL provides concepts for learning controllers that, by cleverly exploiting information from interactions with the process, can acquire high-quality control behaviour from scratch.

This article presents four typical benchmark problems that highlight important and challenging aspects of technical process control: nonlinear dynamics; varying set-points; long-term dynamic effects; influence of external variables; and the primacy of precision. We propose performance measures for controller quality that apply both to classical control design and to learning controllers, measuring precision, speed, and stability of the controller. A second set of key figures describes the performance from the perspective of a learning approach, providing information about the efficiency of the method with respect to the learning effort needed. For all four benchmark problems, extensive and detailed information is provided with which to carry out the evaluations outlined in this article.
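To make these controller-quality measures concrete, the sketch below computes precision (mean absolute tracking error), speed (time to settle into a tolerance band), and a simple overshoot figure from one recorded closed-loop run. The function name, the tolerance-band definition, and the exact formulas are illustrative assumptions, not the measures defined in this article.

```python
import numpy as np


def controller_quality(t, y, setpoint, band=0.02):
    """Illustrative controller-quality measures for one closed-loop run.

    t, y     : arrays of time stamps and measured process outputs
    setpoint : desired constant set-point for this run
    band     : relative tolerance band used for the settling-time measure

    The definitions below are assumptions for illustration, not the
    measures specified in the article.
    """
    err = y - setpoint

    # Precision: mean absolute tracking error over the whole run.
    precision = float(np.mean(np.abs(err)))

    # Speed: first time after which the output stays inside the band.
    tol = band * abs(setpoint) if setpoint != 0 else band
    inside = np.abs(err) <= tol
    settling_time = float(t[-1])  # worst case: never settles
    for i in range(len(t)):
        if inside[i:].all():
            settling_time = float(t[i])
            break

    # Stability proxy: maximum excursion beyond the set-point.
    overshoot = float(max(0.0, np.max(err)))

    return {"precision": precision, "speed": settling_time, "overshoot": overshoot}
```

Applied to logged step responses, such a routine yields directly comparable figures for a classically designed and a learned controller.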

A detailed evaluation of our own RL learning scheme, NFQCA (Neural Fitted Q Iteration with Continuous Actions), carried out in accordance with the proposed scheme on all four benchmarks, provides performance figures on both control quality and learning behaviour.
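For orientation only, the following is a minimal sketch of batch ("fitted") Q learning with a separate actor network for continuous actions, loosely in the spirit of NFQCA. The network sizes, the optimizers (plain Adam here), and the transition-batch layout are assumptions for illustration; they do not reproduce the NFQCA training scheme evaluated in the article.

```python
import torch
import torch.nn as nn

# Minimal sketch of batch ("fitted") Q learning with an actor network for
# continuous actions, loosely in the spirit of NFQCA.  Network sizes,
# optimizers, and the transition-batch layout are illustrative assumptions.

state_dim, action_dim, gamma = 4, 1, 0.98

critic = nn.Sequential(nn.Linear(state_dim + action_dim, 20), nn.Tanh(),
                       nn.Linear(20, 1))
actor = nn.Sequential(nn.Linear(state_dim, 20), nn.Tanh(),
                      nn.Linear(20, action_dim), nn.Tanh())  # action in [-1, 1]

critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)


def fitted_iteration(s, a, c, s_next, epochs=200):
    """One outer iteration on a fixed batch of transitions (s, a, cost, s').

    All arguments are 2-D float tensors with the batch dimension first.
    """
    for _ in range(epochs):
        # Critic: regress Q(s, a) onto the immediate cost plus the discounted
        # value of the actor-chosen action in the successor state.
        with torch.no_grad():
            target = c + gamma * critic(torch.cat([s_next, actor(s_next)], dim=1))
        q = critic(torch.cat([s, a], dim=1))
        critic_loss = ((q - target) ** 2).mean()
        critic_opt.zero_grad()
        critic_loss.backward()
        critic_opt.step()

        # Actor: adjust the policy so that the critic predicts low cost
        # for the actions it proposes.
        actor_loss = critic(torch.cat([s, actor(s)], dim=1)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()
```

The point carried over from fitted Q iteration is that the critic is trained on a fixed batch of recorded transitions; the actor is then adjusted to minimise the critic's predicted cost.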


Keywords: Reinforcement learning · Feedback control · Benchmarks · Nonlinear control



Copyright information

© The Author(s) 2011

Authors and Affiliations

Machine Learning Lab, Albert-Ludwigs University Freiburg, Freiburg im Breisgau, Germany
