Learning with Reinforcement: the Role of Immediate Feedback and the Internal Model of the Situation
When outcomes are partly uncertain, human behavior is characterized by a correspondence between the frequency of each action and the probability that it will be reinforced. We investigated the role of signals of reward and punishment probability in this phenomenon. A total of 29 adult subjects performed a two-alternative choice task in which one stimulus of the pair was rewarded in 70% of cases and the other in 30%. Before a preference for the high-payoff stimulus appeared, subjects showed a paradoxical susceptibility to rare, nonrepresentative reward and punishment signals. This points to the existence of an implicit assessment of the probability of reward for the selection of each stimulus, that is, an internal model of the situation. Divergence of an outcome from this model provoked the subjects to change strategy. This mechanism may underlie the phenomenon of probability matching.
Keywords: probabilistic learning, reward and punishment, prediction errors, search behavior, reaction time
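As an illustration of why probability matching is considered suboptimal in a task like the one described above, the following minimal sketch computes the expected proportion of rewarded trials for two fixed choice policies. It is not the authors' analysis. It assumes that each choice is rewarded independently with the stated probabilities (70% and 30%) and that a policy is fully described by a constant probability of choosing the high-payoff stimulus. The function name `expected_reward_rate` and its parameters are hypothetical.

```python
def expected_reward_rate(p_choose_high, p_reward_high=0.7, p_reward_low=0.3):
    """Expected fraction of rewarded trials for a fixed choice policy.

    p_choose_high: probability of choosing the high-payoff stimulus (assumed constant).
    p_reward_high, p_reward_low: reward probabilities taken from the task description.
    """
    return p_choose_high * p_reward_high + (1 - p_choose_high) * p_reward_low


# Probability matching: choose the 70% stimulus on 70% of trials.
matching = expected_reward_rate(0.7)

# Maximizing: always choose the 70% stimulus.
maximizing = expected_reward_rate(1.0)

print(f"matching:   {matching:.2f}")
print(f"maximizing: {maximizing:.2f}")
```

Under these assumptions, matching yields an expected reward rate of about 0.58 (0.7 × 0.7 + 0.3 × 0.3), compared with 0.70 for always choosing the high-payoff stimulus.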