Crossbar Adaptive Array: The first connectionist network that solved the delayed reinforcement learning problem

  • S. Bozinovski


The paper discusses an important issue for reinforcement learning agents: delayed reinforcement learning (DRL). It points out that an early agent, the Crossbar Adaptive Array (CAA) architecture, which is not widely known in the connectionist and reinforcement learning communities, was the first connectionist agent to solve the DRL problem. The work contributes toward understanding the initial neuron-like computational efforts to solve the DRL problem and compares CAA with the well-known Actor/Critic (AC) architecture. It also points out two relevant contemporary issues for autonomous agents: the genetic/behavioral environment and emotion-based learning architectures.


Reinforcement Learning, Learning Control, Learning Agent, Negative Argument, Genetic Environment





Copyright information

© Springer-Verlag Wien 1999

Authors and Affiliations

  • S. Bozinovski (1)
  1. Electrical Engineering Faculty, University of Skopje, Skopje, Macedonia
