Neuroscience and Behavioral Physiology, Volume 49, Issue 9, pp 1150–1158

Learning with Reinforcement: the Role of Immediate Feedback and the Internal Model of the Situation

  • G. L. Kozunova
  • N. A. Voronin
  • V. V. Venediktov
  • T. A. Stroganova

Human behavior under partial uncertainty about the outcome is characterized by a correspondence between the frequency of actions and the probability that they will be reinforced. We investigated the role of reward and punishment probability signals in this phenomenon. A total of 29 adult subjects performed a two-alternative choice task in which one stimulus of the pair was rewarded in 70% of cases and the other in 30%. Before a preference for the high-payoff stimulus appeared, subjects showed a paradoxical susceptibility to rare, nonrepresentative reward and punishment signals. This points to the existence of an implicit assessment of the probability of reward on selection of each stimulus, which serves as an internal model of the situation. Divergence of an outcome from this model provoked the subjects to change strategy. This mechanism may underlie the phenomenon of probability matching.
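As an illustration only (it is not part of the study's analysis), the arithmetic behind probability matching in a 70%/30% task can be sketched as follows. The sketch assumes that each choice is made independently of previous outcomes and that the reward probabilities are fixed at the values stated above; the function name `expected_reward_rate` is hypothetical.

```python
def expected_reward_rate(p_choose_high, p_reward_high=0.7, p_reward_low=0.3):
    """Expected fraction of rewarded trials when the high-payoff stimulus
    is chosen with probability p_choose_high (illustrative assumption:
    choices are independent of earlier outcomes)."""
    return p_choose_high * p_reward_high + (1 - p_choose_high) * p_reward_low


# Probability matching: choose each stimulus as often as it is rewarded.
matching = expected_reward_rate(0.7)

# Maximizing: always choose the high-payoff stimulus.
maximizing = expected_reward_rate(1.0)

print(f"matching: {matching:.2f}, maximizing: {maximizing:.2f}")
# prints "matching: 0.58, maximizing: 0.70"
```

Under these assumptions, matching yields a lower expected reward rate than always choosing the high-payoff stimulus, which is why the phenomenon is usually regarded as suboptimal.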


probabilistic learning, reward and punishment, prediction errors, search behavior, reaction time





Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • G. L. Kozunova (1)
  • N. A. Voronin (1)
  • V. V. Venediktov (1)
  • T. A. Stroganova (1)
  1. Center for Neurocognitive Research (MEG-Center), Moscow State University of Psychology and Education, Moscow, Russia
