A Brain-Inspired Decision Making Model Based on Top-Down Biasing of Prefrontal Cortex to Basal Ganglia and Its Application in Autonomous UAV Explorations
Decision making is a fundamental ability for intelligent agents such as humanoid robots and unmanned aerial vehicles (UAVs). During the decision making process, agents can improve their strategy for interacting with a dynamic environment through reinforcement learning. Many state-of-the-art reinforcement learning models, such as Q-learning and Actor-Critic algorithms, handle relatively small numbers of state-action pairs and work best when the states are discrete. In practice, however, the states in many scenarios are continuous and hard to discretize properly, so better autonomous decision making methods are needed. Inspired by the mechanisms of decision making in the human brain, we propose a general computational model, the prefrontal cortex-basal ganglia (PFC-BG) algorithm. The model draws on the biological reinforcement learning pathway and its mechanisms in two respects: (1) dopamine signals continuously update reward-relevant information in both the basal ganglia and working memory in the prefrontal cortex; (2) contextual reward information maintained in working memory exerts a top-down biasing effect on reinforcement learning in the basal ganglia. The proposed model separates continuous states into smaller distinguishable states and introduces a continuous reward function for each state to capture reward information over time. To verify its performance, we apply the model to a range of UAV decision making experiments, such as avoiding obstacles and flying through a window or a door, and the results support its effectiveness. Compared with traditional Q-learning and Actor-Critic algorithms, the proposed model is more biologically plausible, and it makes decisions more accurately and faster.
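To make the two mechanisms above concrete, the following is a minimal Python sketch of one way such a loop could be realized. It is our own illustrative reading, not the authors' implementation: all names and constants (discretize, WM_DECAY, BIAS_GAIN, the two-table layout) are assumptions for the sketch.

```python
import numpy as np

# Hypothetical sketch of a PFC-BG-style loop: a tabular value store ("basal
# ganglia") plus a decaying contextual reward trace ("working memory") that
# biases action selection top-down. All constants are illustrative.

N_STATES, N_ACTIONS = 20, 4

q = np.zeros((N_STATES, N_ACTIONS))    # basal ganglia: learned state-action values
wm = np.zeros((N_STATES, N_ACTIONS))   # PFC working memory: contextual reward trace

ALPHA, GAMMA = 0.1, 0.9                # learning rate, discount factor
WM_DECAY, BIAS_GAIN = 0.9, 0.5         # working-memory persistence, top-down gain

def discretize(x, lo=0.0, hi=1.0):
    """Bin a continuous observation into one of N_STATES distinguishable states."""
    i = int((x - lo) / (hi - lo) * N_STATES)
    return min(max(i, 0), N_STATES - 1)

def select_action(s, eps=0.1):
    """Epsilon-greedy over BG values plus the top-down working-memory bias."""
    if np.random.rand() < eps:
        return np.random.randint(N_ACTIONS)
    return int(np.argmax(q[s] + BIAS_GAIN * wm[s]))

def step(s, a, r, s_next):
    """One dopamine-driven update: the same TD error trains both structures."""
    delta = r + GAMMA * q[s_next].max() - q[s, a]   # TD error ("dopamine signal")
    q[s, a] += ALPHA * delta                        # reinforcement learning in BG
    wm *= WM_DECAY                                  # working memory decays over time
    wm[s, a] += delta                               # reward context kept in PFC
    return delta
```

In this reading, the TD error plays the role of the dopamine signal that updates both structures, and the decaying working-memory trace shifts action selection toward recently rewarded context, which is one simple interpretation of the top-down biasing described above.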
Keywords: Prefrontal cortex · Working memory · Basal ganglia · Dopamine system · Brain-inspired decision making model
This study was funded by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB02060007) and the Beijing Municipal Commission of Science and Technology (Z161100000216124). We would like to thank the anonymous reviewers for their constructive comments, which have greatly improved this paper.
Compliance with Ethical Standards
Conflict of Interest
The authors declare that they have no conflict of interest.
This article does not contain any studies with human participants performed by any of the authors.