Abstract
We propose two basal ganglia (BG) models for autonomous behavior learning: the BG system model and the BG spiking neural network model. These models were developed on the basis of reinforcement learning (RL) theories and neuroscience principals of behavioral learning. The BG system model focuses on problems with RL input selection and reward setting. This model assumes that parallel BG modules receive a variety of inputs. We also propose an automatic setting method of internal reward for this model. The BG spiking neural network model focuses on problems with biological neural network architecture, ambiguous inputs and the mechanism of timing. This model accounts for the neurophysiological characteristics of neurons and differential functions of the direct and indirect pathways. We demonstrate that the BG system model achieves goals in fewer trials by learning the internal state representation, whereas the BG spiking neural network model has the capacity for probabilistic selection of action. Our results suggest that these two models are a step toward developing an autonomous learning system.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Reiner, A., Medina, L., Veenman, C.L.: Structural and functional evolution of the basal ganglia in vertebrates. Brain Res. Brain Res. Rev. 28(3), 235–285 (1998)
Barto, A.G., Sutton, R.S., Anderson, C.: Neuron-like adaptive elements that can solve difficult learning control problems. IEEE Trans. on Systems, Man, and Cybernetics, SMC 13, 834–846 (1983)
Montague, P.R., Dayan, P., Sejnowski, T.J.: A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16, 1936–1947 (1996)
Schultz, W., Dayan, P., Montague, P.R.: A neural substrate of prediction and reward. Science 275(5306), 1593–1599 (1997)
Berns, G.S., McClure, S.M., Pagnoni, G., Montague, P.R.: Predictability modulates human brain response to reward. J. Neurosci. 21, 2793–2798 (2001)
Haruno, M., Kuroda, T., Doya, K., Toyama, K., Kimura, M., Samejima, K., Imamizu, H., Kawato, M.: A neural correlate of reward-based behavioral learning in caudate nucleus: a functional magnetic resonance imaging study of a stochastic decision task. J. Neurosci. 24, 1660–1665 (2004)
McHaffie, J.G., Jiang, H., May, P.J., Coizet, V., Overton, P.G., Stein, B.E., Redgrave, P.: A direct projection from superior colliculus to substantia nigra pars compacta in the cat. Neurosci. 138, 221–234 (2006)
Balleine, B.W., Delgado, M.R., Hikosaka, O.: The role of the dorsal striatum in reward and decision-making. J. Neurosci. 27, 8161–8165 (2007)
Niv, Y., Schoenbaum, G.: Dialogues on prediction errors. Trends Cogn. Sci. 12(7), 265–272 (2008)
Dayan, P., Niv, Y.: Reinforcement learning: The Good. The Bad and The Ugly, Curr. Opin. Neurobiol. 18(2), 185–196 (2008)
Daw, N.D., Niv, Y., Dayan, P.: Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat. Neurosci. 8, 1704–1711 (2005)
Sutton, R.S.: Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In: Proc. of the Seventh International Conference on Machine Learning, Austin, TX (1990)
Barto, A.G., Bradtke, S.J., Singh, S.P.: Learning to act using real-time dynamic programming. Artif. Intell. 72(1), 81–138 (1995)
Sutton, R.S.: Learning to predict by the method of temporal differences. Machine Learning 3(1), 9–44 (1988)
Watkins, C.J.C.H., Dayan, P.: Q-learning. Machine Learning 8(3), 279–292 (1992)
Coutureau, E., Killcross, S.: Inactivation of the infralimbic prefrontal cortex reinstates goal-directed responding in overtrained rats. Behav. Brain Res. 146, 167–174 (2003)
Balleine, B.W., Killcross, A.S., Dickinson, A.: The effect of lesions of the basolateral amygdale on instrumental conditioning. J. Neurosci. 23, 666–675 (2003)
Balleine, B.W.: Neural bases of food-seeking: affect, arousal and reward in corticostriatolimbic circuits. Physiol. Behav. 86, 717–730 (2005)
Valentin, V.V., Dickinson, A., O’Doherty, J.P.: Determining the neural substrates of goal-directed learning in the human brain. J. Neurosci. 27, 4019–4026 (2007)
Alexander, G.E., et al.: Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Annu. Rev. Neurosci. 9, 357–381 (1986)
Parent, A., Hazrati, L.N.: Functional anatomy of the basal ganglia.1. The cortico–basal ganglia–thalamo–cortical loop. Brain Res. Rev. 20, 91–127 (1995)
Middleton, F.A., Strick, P.L.: Basal ganglia and cerebellar loops: motor and cognitive circuits. Brain Res. Rev. 31, 236–250 (2000)
Montague, P.R., Dayan, P., Sejnowski, T.J.: A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16, 1936–1947 (1996)
Schultz, W., Dayan, P., Montague, P.R.: A neural substrate of prediction and reward. Science 275(5306), 1593–1599 (1997)
Matsumoto, M., Hikosaka, O.: Lateral habenula as a source of negative reward signals in dopamine neurons. Nature 447, 1111–1115 (2007)
Comoli, E., Coizet, V., Boyes, J., Bolam, J.P., Canteras, N.S., Quirk, R.H., Overton, P.G., Redgrave, P.: A direct projection from superior colliculus to substantia nigra for detecting salient visual events. Nat. Neurosci. 6(9), 974–980 (2003)
Zhou, F.M., Liang, Y., Dani, J.A.: Endogenous nicotinic cholinergic activity regulates dopamine release in the striatum. Nat. Neurosci. 4(12), 1224–1229 (2001)
Partridge, J.G., Apparsundaram, S., Gerhardt, G.A., Ronesi, J., Lovinger, D.M.: Nicotinic acetylcholine receptors interact with dopamine in induction of striatal long-term depression. J. Neurosci. 22(7), 2541–2549 (2002)
Tanaka, S.C., Doya, K., Okada, G., Ueda, K., Okamoto, Y., Yamawaki, S.: Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops. Nat. Neurosci. 7(8), 887–893 (2004)
Graybiel, A.M.: Habits, Rituals, and the Evaluative Brain. Annu. Rev. Neurosci. 31, 359–387 (2008)
Pasupathy, A., Miller, E.K.: Different time courses of learning-related activity in the prefrontal cortex and striatum. Nature 433, 873–876 (2005)
Redgrave, P., Prescott, T.J., Gurney, K.: The basal ganglia: a vertebrate solution to the selection problem? Neurosci. 89, 1009–1023 (1999)
Gurney, K., Prescott, T.J., Redgrave, P.: A computational model of action selection in the basal ganglia. II. Analysis and simulation of behaviour. Biol. Cybern. 84, 411–423 (2001)
Prescott, T.J., Gurney, K., Montes-Gonzalez, F., Humphries, M.D., Redgrave, P.: The robot basal ganglia: action selection by an embedded model of the basal ganglia. In: Nicholson, L., Faull, R. (eds.) Basal Ganglia VII, pp. 349–356. Plenum Press
Humphries, M.D., Stewart, R.D., Gurney, K.N.: A physiologically plausible model of action selection and oscillatory activity in the basal ganglia. J. Neurosci. 26(50), 12921–12942 (2006)
Bogacz, R., Gurney, K.: The Basal Ganglia and Cortex Implement Optimal Decision Making Between Alternative Actions. Neural. Compu. 19, 442–477 (2007)
Doya, K., Samejima, K., Katagiri, K., Kawato, M.: Multiple model-based reinforcement learning. Neural. Comput. 14(6), 1347–1369 (2002)
Hallett, M., Shahani, B., Young, R.: EMG analysis of patients with cerebellar lesions. Journal of Neurology, Neurosurgery, and Psychiatry 38, 1163–1169 (1975)
Hore, J., Wild, B., Diener, H.C.: Cerebellar dysmetria at the elbow, wrist, and fingers. J. Neurophysiol. 65, 563–571 (1991)
Jeuptner, M., Rijntjes, M., Weiller, C., Faiss, J.H., Timmann, D., Mueller, S., Diener, H.C.: Localization of cerebellar timing processes using PET. Neurology 45, 1540–1545 (1995)
O’Boyle, D.J., Freeman, J.S., Cody, F.W.J.: The accuracy and precision of timing of self-paced, repetitive movements in subjects with Parkinson’s disease. Brain 119, 51–70 (1996)
Lo, C.-C., Wang, X.-J.: Cortico–basal ganglia circuit mechanism for a decision threshold in reaction time tasks. Nat. Neurosci. 9, 956–963 (2006)
Maimon, G., Assad, J.: A cognitive signal for the proactivetiming of action in macaque LIP. Nat. Neuro. 9(7), 948–955 (2006)
Doya, K.: What are the computations of the cerebellum, the basal ganglia, and the cerebral cortex. Neural Netw. 12, 961–974 (1999)
Romanelli, P., Esposito, V., Schaal, D.W., Heit, G.: Somatotopy in the basal ganglia: experimental and clinical evidence for segregated sensorimotor channels. Brain Res. Brain Res. Rev. 48, 112–128 (2005)
Middleton, F.A., Strick, P.L.: Basal ganglia and cerebellar loops: motor and cognitive circuits. Brain Res. Brain Res. Rev. 31, 236–250 (2000)
Takeuchi, J., Shouno, O., Tsujino, H.: Modular neural networks for reinforcement learning with temporal intrinsic rewards. In: Proc. of 2007 International Joint Conference on Neural Networks (IJCNN) (2007)
Jaeger, H.: The ‘echo state’ approach to analysing and training recurrent neural networks. GMD report 148, German National Research Center for Information Technology (2001)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Nishida, S., Ishii, K., Furukawa, T.: An online adaptation control system using mnSOM. In: King, I., Wang, J., Chan, L.-W., Wang, D. (eds.) ICONIP 2006. LNCS, vol. 4232, pp. 935–942. Springer, Heidelberg (2006)
Schmidhuber, J.: Curious model-building control system. In: Proc. International Joint Conference on Neural Networks (IJCNN 1991), pp. 1458–1463 (1991)
Oudeyer, P.Y., Kaplan, F., Hafner, V.V.: Intrinsic motivation systems for autonomous mental development. IEEE Trans. Evol. Comput. 11(1), 265–286 (2007)
Plenz, D., Kitai, S.T.: A basal ganglia pacemaker formed by the subthalamic nucleus and external globus pallidus. Nature 400, 677–682 (1999)
Diesmann, M., Gewaltig, M.-O.: NEST: An Environment for Neural Systems Simulations. Forschung und wisschenschaftliches Rechnen, Beiträge zum Heinz-Billing-Preis 2001. Ges. für Wiss. Datenverarbeitung, 43–70 (2002)
Matsumoto, G., Tsujino, H.: Design of a brain computer using the novel principles of output-driven operation and memory-based architecture. In: Ono, T., Matsumoto, G., Llinas, R., Berthoz, A., Norgen, R., Nishijo, H., Tamura, R. (eds.) Cognition and Emotion in the Brain, pp. 529–546. Elsevier Science B.V, Amsterdam (2003)
Watanabe, T., Nanez, J.E., Sasaki, Y.: Perceptual learning without perception. Nature 413, 844–848 (2001)
Barto, A.G., Singh, S., Chentanez, N.: Intrinsically motivated learning of hierarchical collection of skills. In: Proc. of the 3rd International Conference on Developmental Learning (ICDL) (2004)
Singh, S., Barto, A.G., Chentanez, N.: Intrinsically motivated reinforcement learning. In: Advances in Neural Information Processing Systems, vol. 17, pp. 1281–1288. MIT Press, Cambridge (2005)
Tsujino, H.: Output-driven operation and memory-based architecture principles embedded in a real-world device. J. Integr. Neurosci. 3(2), 133–142 (2004)
Koerner, E., Tsujino, H., Masutani, T.: A Cortical-type Modular Neural Network for Hypothetical Reasoning. Neural Netw. 10, 791–814 (1997)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Tsujino, H., Takeuchi, J., Shouno, O. (2009). Basal Ganglia Models for Autonomous Behavior Learning. In: Sendhoff, B., Körner, E., Sporns, O., Ritter, H., Doya, K. (eds) Creating Brain-Like Intelligence. Lecture Notes in Computer Science(), vol 5436. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00616-6_16
Download citation
DOI: https://doi.org/10.1007/978-3-642-00616-6_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00615-9
Online ISBN: 978-3-642-00616-6
eBook Packages: Computer ScienceComputer Science (R0)