Abstract
A class of learning tasks is described that combines aspects of learning automaton tasks and supervised learning pattern-classification tasks. We call these associative reinforcement learning tasks. An algorithm is presented, called the associative reward-penalty, or A_{R-P}, algorithm, for which a form of optimal performance has been proved. This algorithm simultaneously generalizes a class of stochastic learning automata and a class of supervised learning pattern-classification methods. Simulation results are presented that illustrate the associative reinforcement learning task and the performance of the A_{R-P} algorithm. Additional simulation results are presented showing how cooperative activity in networks of interconnected A_{R-P} automata can solve difficult nonlinear associative learning problems.
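The abstract names the associative reward-penalty (A_{R-P}) algorithm but does not state its update rule. The following is a minimal sketch of a single two-action unit, assuming the form in which the rule is usually stated: the unit emits +1 or -1 from a noisy weighted sum of the context vector, and its weights move toward the chosen action on reward and, scaled by a small factor, toward the opposite action on penalty. The logistic noise, the parameter names (rho, lam, temperature), and the two-context toy task with its reward probabilities are illustrative assumptions, not details taken from this chapter.

```python
import math
import random


def expected_output(theta, x, temperature=0.5):
    """E[y | theta, x] for y in {-1, +1}, assuming logistic noise on the weighted sum."""
    s = sum(t * xi for t, xi in zip(theta, x))
    return math.tanh(s / (2.0 * temperature))


def choose_action(theta, x, rng, temperature=0.5):
    """Sample y = +1 with probability (1 + E[y]) / 2, otherwise y = -1."""
    p_plus = 0.5 * (1.0 + expected_output(theta, x, temperature))
    return 1 if rng.random() < p_plus else -1


def arp_update(theta, x, y, rewarded, rho=0.1, lam=0.05, temperature=0.5):
    """One reward-penalty step.

    On reward the weights move toward the action taken (y); on penalty they
    move toward the opposite action (-y), scaled down by lam.
    """
    ey = expected_output(theta, x, temperature)
    if rewarded:
        step = rho * (y - ey)
    else:
        step = lam * rho * (-y - ey)
    return [t + step * xi for t, xi in zip(theta, x)]


def run(trials=5000, seed=0):
    """Train one unit on a hypothetical two-context task and return its weights."""
    rng = random.Random(seed)
    contexts = [(1.0, 0.0), (0.0, 1.0)]
    # Assumed reward probabilities: action +1 is better in context 0,
    # action -1 is better in context 1.
    reward_prob = {0: {1: 0.9, -1: 0.1}, 1: {1: 0.2, -1: 0.8}}
    theta = [0.0, 0.0]
    for _ in range(trials):
        k = rng.randrange(2)
        x = contexts[k]
        y = choose_action(theta, x, rng)
        rewarded = rng.random() < reward_prob[k][y]
        theta = arp_update(theta, x, y, rewarded)
    return theta


if __name__ == "__main__":
    print(run())
```

In this toy task the first weight should drift positive and the second negative, so the unit comes to prefer the action with the higher reward probability in each context. Setting lam to 0 gives a reward-inaction style variant of the same sketch.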
Copyright information
© 1986 Springer Science+Business Media New York
About this chapter
Cite this chapter
Barto, A.G., Anandan, P., Anderson, C.W. (1986). Cooperativity in Networks of Pattern Recognizing Stochastic Learning Automata. In: Narendra, K.S. (ed.) Adaptive and Learning Systems. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-1895-9_16
DOI: https://doi.org/10.1007/978-1-4757-1895-9_16
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4757-1897-3
Online ISBN: 978-1-4757-1895-9
eBook Packages: Springer Book Archive