
Imitation and Reinforcement Learning in Agents with Heterogeneous Actions

  • Conference paper
  • First Online:
In: Advances in Artificial Intelligence (Canadian AI 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2056)

Abstract

Reinforcement learning (RL) techniques are increasingly being used to solve difficult problems in control and combinatorial optimization, with promising results. Implicit imitation can accelerate RL by augmenting the Bellman equations with information gained from observing expert agents (mentors). We propose two extensions that permit imitation of agents with heterogeneous actions: feasibility testing, which detects infeasible mentor actions, and k-step repair, which searches for plans that approximate infeasible actions. We demonstrate empirically that both of these extensions allow imitation agents to converge more quickly in the presence of heterogeneous actions.
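For intuition, below is a minimal sketch of the kind of augmented Bellman backup implicit imitation uses, with a per-state feasibility flag gating the mentor term. The function names, array shapes, NumPy formulation, and the toy example are assumptions for illustration, not the authors' implementation; the feasibility test itself (comparing observed mentor transitions against the learner's own action models) and the k-step repair search are omitted.

```python
import numpy as np

GAMMA = 0.9  # discount factor (illustrative choice)

def augmented_backup(V, R, P_own, P_mentor, feasible):
    """One value-iteration sweep with an implicit-imitation term.

    V         -- (S,) current value estimates
    R         -- (S,) reward for each state
    P_own     -- (A, S, S) learner's estimated action models, P_own[a, s, s']
    P_mentor  -- (S, S) estimated mentor transition model, P_mentor[s, s']
    feasible  -- (S,) bool; False where a feasibility test has judged the
                 mentor's action unavailable to the learner, in which case
                 the imitation term is suppressed in that state
    """
    # Standard Bellman term: max_a sum_{s'} P(s'|s,a) V(s')
    own_term = np.max(P_own @ V, axis=0)
    # Imitation term from observing the mentor: sum_{s'} P_m(s'|s) V(s')
    mentor_term = P_mentor @ V
    backup = np.where(feasible, np.maximum(own_term, mentor_term), own_term)
    return R + GAMMA * backup

# Tiny hypothetical example: 3 states, 2 stay-put learner actions, and a
# mentor that advances s -> s+1 (mod 3); the mentor's move is judged
# infeasible in state 2, so that state falls back on the learner's own model.
S, A = 3, 2
R = np.array([0.0, 0.0, 1.0])
P_own = np.tile(np.eye(S), (A, 1, 1))
P_mentor = np.roll(np.eye(S), 1, axis=1)
feasible = np.array([True, True, False])
V = np.zeros(S)
for _ in range(100):
    V = augmented_backup(V, R, P_own, P_mentor, feasible)
```

The gating is the key design point: without the feasibility flag, a mentor whose actions the learner cannot duplicate would inflate value estimates and mislead the learner toward unreachable outcomes.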




Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Price, B., Boutilier, C. (2001). Imitation and Reinforcement Learning in Agents with Heterogeneous Actions. In: Stroulia, E., Matwin, S. (eds) Advances in Artificial Intelligence. Canadian AI 2001. Lecture Notes in Computer Science (LNAI), vol 2056. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45153-6_11


  • DOI: https://doi.org/10.1007/3-540-45153-6_11

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42144-3

  • Online ISBN: 978-3-540-45153-2

