Reinforcement Learning for Fuzzy Agents: Application to a Pighouse Environment Control

  • Lionel Jouffe
Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 84)


Fuzzy Actor-Critic Learning (FACL) and Fuzzy Q-Learning (FQL) are reinforcement learning methods based on Dynamic Programming (DP) principles. In this chapter, they are used to tune online the conclusion part of Fuzzy Inference Systems (FIS). The only information available for learning is the system feedback, which describes, in terms of reward and punishment, the task the fuzzy agent has to realize. At each time step, the agent receives a reinforcement signal according to the last action it performed in the previous state. The problem is to optimize not only the immediate reinforcement but also the total amount of reinforcement the agent can receive in the future. To illustrate these two learning methods, we first apply them to a problem in which a fuzzy controller must drive a boat from one bank to the other, across a river with a strong non-linear current. We then use the well-known Cart-Pole Balancing and Mountain-Car problems to compare our methods with other reinforcement learning methods and to highlight important characteristics of FACL and FQL. The experimental studies show the superiority of these methods over the related methods found in the literature. We also found that our generic methods can address every kind of reinforcement learning problem (continuous states, discrete or continuous actions, various types of reinforcement functions). Thanks to this flexibility, these learning methods have been applied successfully to an industrial problem: discovering a policy for pighouse environment control.
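The core mechanism the abstract describes, per-rule action values in a FIS whose conclusions are tuned by a temporal-difference signal, can be sketched as follows. This is a minimal illustration, not the chapter's exact FQL algorithm: the triangular membership functions, the candidate-action set, and all parameter values are illustrative assumptions.

```python
import random

def tri(x, a, b, c):
    """Triangular membership function peaking at b (illustrative choice)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

CENTERS = [0.0, 0.5, 1.0]      # rule antecedents: one fuzzy label per rule
ACTIONS = [-1.0, 0.0, 1.0]     # discrete candidate actions shared by all rules
q = [[0.0] * len(ACTIONS) for _ in CENTERS]  # one q-vector per rule

def strengths(x):
    """Normalised firing strength of each rule for state x."""
    w = [tri(x, c - 0.5, c, c + 0.5) for c in CENTERS]
    s = sum(w) or 1.0
    return [wi / s for wi in w]

def act(x, eps=0.1):
    """Each rule picks an action epsilon-greedily from its q-vector; the
    global action and global Q-value are firing-strength-weighted sums."""
    w = strengths(x)
    chosen = []
    for qi in q:
        if random.random() < eps:
            chosen.append(random.randrange(len(ACTIONS)))
        else:
            chosen.append(max(range(len(ACTIONS)), key=lambda a: qi[a]))
    u = sum(wi * ACTIONS[a] for wi, a in zip(w, chosen))
    Q = sum(wi * qi[a] for wi, qi, a in zip(w, q, chosen))
    return u, Q, w, chosen

def update(w, chosen, td_error, alpha=0.1):
    """Distribute the TD error over the rules in proportion to how
    strongly each one fired when the action was taken."""
    for i, (wi, a) in enumerate(zip(w, chosen)):
        q[i][a] += alpha * td_error * wi
```

In a full agent the `td_error` would be the usual `r + gamma * max_Q(x') - Q(x)` computed between successive time steps from the reinforcement signal `r`; only the conclusion part (the q-vectors) is learned, the antecedents stay fixed, which matches the tuning scheme the abstract describes.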


Keywords: Learning Rate, Reinforcement Learning, Fuzzy Controller, Fuzzy Inference System, Discrete Action




Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Lionel Jouffe
    1. Centre de Recherche du Groupe ESIEA, Parc Universitaire Laval-Changé, Laval, France
