Abstract
Reinforcement learning techniques are increasingly being used to solve difficult problems in control and combinatorial optimization with promising results. Implicit imitation can accelerate reinforcement learning (RL) by augmenting the Bellman equations with information from the observation of expert agents (mentors). We propose two extensions that permit imitation of agents with heterogeneous actions: feasibility testing, which detects infeasible mentor actions, and k-step repair, which searches for plans that approximate infeasible actions. We demonstrate empirically that both of these extensions allow imitation agents to converge more quickly in the presence of heterogeneous actions.
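The idea of augmenting the Bellman equations with mentor observations can be illustrated with a minimal sketch. It assumes, as one plausible reading of the abstract, that the backup at a state takes the larger of two terms: the value of the observer's own best action, and the value implied by the mentor's observed transitions. The function name `augmented_backup`, the tabular data layout, and the boolean `mentor_feasible` flag are all hypothetical; the flag merely stands in for the outcome of a feasibility test, whose actual form is not given in the abstract, and k-step repair is not shown.

```python
def augmented_backup(V, R, P_own, P_mentor, s, gamma=0.9, mentor_feasible=True):
    """One augmented Bellman backup at state s (illustrative sketch only).

    V        : list of current value estimates, indexed by state
    R        : list of rewards, indexed by state
    P_own    : P_own[s][a][t] is the observer's estimated probability of
               reaching state t from state s with its own action a
    P_mentor : P_mentor[s][t] is the estimated probability that the mentor
               moves from state s to state t (its action is unobserved)
    mentor_feasible : hypothetical flag standing in for a feasibility test
    """
    # Expected value of the observer's own best action at s.
    own = max(sum(p * V[t] for t, p in enumerate(row)) for row in P_own[s])

    if mentor_feasible:
        # Expected value of following the mentor's observed transitions.
        mentor = sum(p * V[t] for t, p in enumerate(P_mentor[s]))
        best = max(own, mentor)
    else:
        # Ignore the mentor's model when its action is judged infeasible.
        best = own

    return R[s] + gamma * best


# Toy three-state example (all numbers are made up for illustration).
V = [0.0, 1.0, 10.0]
R = [0.0, 0.0, 0.0]
P_own = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # state 0: stay, or move to state 1
    [[0.0, 1.0, 0.0]],                   # state 1: stay
    [[0.0, 0.0, 1.0]],                   # state 2: stay
]
P_mentor = [
    [0.0, 0.0, 1.0],  # mentor jumps from state 0 straight to state 2
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

with_mentor = augmented_backup(V, R, P_own, P_mentor, 0, gamma=0.5)
without_mentor = augmented_backup(V, R, P_own, P_mentor, 0, gamma=0.5,
                                  mentor_feasible=False)
```

In this toy setting the mentor's transition out of state 0 is one the observer cannot reproduce with its own actions, so trusting it inflates the backed-up value; switching the flag off falls back to the ordinary Bellman backup over the observer's own actions.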
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Price, B., Boutilier, C. (2001). Imitation and Reinforcement Learning in Agents with Heterogeneous Actions. In: Stroulia, E., Matwin, S. (eds) Advances in Artificial Intelligence. Canadian AI 2001. Lecture Notes in Computer Science, vol 2056. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45153-6_11
DOI: https://doi.org/10.1007/3-540-45153-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42144-3
Online ISBN: 978-3-540-45153-2
eBook Packages: Springer Book Archive