Imitation and Reinforcement Learning in Agents with Heterogeneous Actions

Price, Bob; Boutilier, Craig

doi:10.1007/3-540-45153-6_11


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2056)

Included in the following conference series:

Conference of the Canadian Society for Computational Studies of Intelligence


Abstract

Reinforcement learning techniques are increasingly being used to solve difficult problems in control and combinatorial optimization with promising results. Implicit imitation can accelerate reinforcement learning (RL) by augmenting the Bellman equations with information from the observation of expert agents (mentors). We propose two extensions that permit imitation of agents with heterogeneous actions: feasibility testing, which detects infeasible mentor actions, and k-step repair, which searches for plans that approximate infeasible actions. We demonstrate empirically that both of these extensions allow imitation agents to converge more quickly in the presence of heterogeneous actions.
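As a rough illustration of what "augmenting the Bellman equations" with mentor observations could look like, here is a minimal tabular sketch. It assumes the observer has an estimated model of its own actions and a separate estimated model of the mentor's state transitions, and that each backup takes the better of the two. All names (`augmented_backup`, `P_own`, `P_mentor`, `feasible`) are hypothetical and not taken from the paper. The `feasible` flag merely stands in for the outcome of feasibility testing, whose actual procedure the abstract does not describe, and k-step repair is not shown.

```python
def augmented_backup(V, R, P_own, P_mentor, feasible, gamma=0.9):
    """One sweep of an augmented Bellman backup (illustrative sketch only).

    V        : list of current value estimates, one per state
    R        : list of state rewards
    P_own    : P_own[s][a][t], observer's estimated transition probabilities
    P_mentor : P_mentor[s][t], estimated mentor transition probabilities
               (the mentor's action itself is assumed unobserved)
    feasible : feasible[s] is True where the mentor's transition from s is
               believed reproducible by the observer (hypothetical input)
    """
    new_V = []
    for s in range(len(V)):
        # Best expected value over the observer's own actions.
        best = max(
            sum(p * V[t] for t, p in enumerate(row)) for row in P_own[s]
        )
        if feasible[s]:
            # Expected value of following the mentor's observed transition.
            mentor = sum(p * V[t] for t, p in enumerate(P_mentor[s]))
            best = max(best, mentor)
        new_V.append(R[s] + gamma * best)
    return new_V
```

In this sketch, a state flagged as infeasible falls back to the ordinary backup over the observer's own actions, so an unreachable mentor transition cannot inflate the value estimate.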




Author information

Authors and Affiliations

Department of Computer Science, University of British Columbia, Vancouver, B.C., Canada, V6T 1Z4
Bob Price
Department of Computer Science, University of Toronto, Toronto, ON, Canada, M5S 3H5
Craig Boutilier


Editor information

Editors and Affiliations

Department of Computer Science, University of Alberta, Edmonton, AB, Canada, T6G 2E8
Eleni Stroulia
School of Information Technology and Engineering, University of Ottawa, Ottawa, ON, Canada, K1N 6N5
Stan Matwin


About this paper

Cite this paper

Price, B., Boutilier, C. (2001). Imitation and Reinforcement Learning in Agents with Heterogeneous Actions. In: Stroulia, E., Matwin, S. (eds) Advances in Artificial Intelligence. Canadian AI 2001. Lecture Notes in Computer Science, vol 2056. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45153-6_11


DOI: https://doi.org/10.1007/3-540-45153-6_11
Published: 16 May 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42144-3
Online ISBN: 978-3-540-45153-2
eBook Packages: Springer Book Archive
