Psychological and Neuroscientific Connections with Reinforcement Learning

Shah, Ashvin

doi:10.1007/978-3-642-27645-3_16

Chapter

Part of the book series: Adaptation, Learning, and Optimization (ALO, volume 12)

Abstract

The field of Reinforcement Learning (RL) was inspired in large part by research in animal behavior and psychology. Early research showed that animals can, through trial and error, learn to execute behavior that eventually leads to some (presumably satisfactory) outcome, and decades of subsequent research have been (and still are) aimed at discovering the mechanisms of this learning process. This chapter describes behavioral and theoretical research in animal learning that is directly related to fundamental concepts used in RL. It then describes neuroscientific research that suggests that animals and many RL algorithms use very similar learning mechanisms. Along the way, I highlight ways that research in computer science contributes to and can be inspired by research in psychology and neuroscience.

Author information

Authors and Affiliations

Department of Psychology, University of Sheffield, Sheffield, UK
Ashvin Shah

Corresponding author

Correspondence to Ashvin Shah.

Editor information

Editors and Affiliations

Fac. Mathematics & Natural Sciences, University of Groningen, Nijenborgh 9, Groningen, 9747 AG, Netherlands
Marco Wiering

Artificial Intelligence, Radboud University Nijmegen, B.02.30 Spinozagebouw, Montessorilaan 3, Nijmegen, 6500, Netherlands
Martijn van Otterlo

About this chapter

Cite this chapter

Shah, A. (2012). Psychological and Neuroscientific Connections with Reinforcement Learning. In: Wiering, M., van Otterlo, M. (eds) Reinforcement Learning. Adaptation, Learning, and Optimization, vol 12. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27645-3_16

DOI: https://doi.org/10.1007/978-3-642-27645-3_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27644-6
Online ISBN: 978-3-642-27645-3
eBook Packages: Engineering, Engineering (R0)
