Preference Elicitation and Inverse Reinforcement Learning

  • Constantin A. Rothkopf
  • Christos Dimitrakakis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6913)


We state the problem of inverse reinforcement learning in terms of preference elicitation, resulting in a principled (Bayesian) statistical formulation. This generalises previous work on Bayesian inverse reinforcement learning and allows us to obtain a posterior distribution on the agent’s preferences, policy and optionally, the obtained reward sequence, from observations. We examine the relation of the resulting approach to other statistical methods for inverse reinforcement learning via analysis and experimental results. We show that preferences can be determined accurately, even if the observed agent’s policy is sub-optimal with respect to its own preferences. In that case, significantly improved policies with respect to the agent’s preferences are obtained, compared to both other methods and to the performance of the demonstrated policy.
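To make the statistical formulation concrete, the following is a minimal illustrative sketch (not the paper's exact algorithm) of Bayesian inverse reinforcement learning: a Metropolis-Hastings sampler over reward parameters, with a Boltzmann (softmax) likelihood for the demonstrated actions and a standard normal prior on rewards. The toy MDP, the function names, and all parameter values are assumptions for illustration only.

```python
import numpy as np

def q_values(P, r, gamma=0.9, iters=200):
    """Q-values for state-reward vector r; P has shape (A, S, S)."""
    A, S, _ = P.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)
        for a in range(A):
            Q[:, a] = r + gamma * P[a] @ V
    return Q

def log_softmax(x, axis=-1):
    m = x.max(axis=axis, keepdims=True)
    return x - m - np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def log_likelihood(demos, Q, beta=5.0):
    """Boltzmann-rational likelihood of observed (state, action) pairs."""
    logp = log_softmax(beta * Q, axis=1)
    return sum(logp[s, a] for s, a in demos)

def mh_posterior(demos, P, n_samples=500, step=0.2, rng=None):
    """Metropolis-Hastings over reward vectors with a N(0, I) prior."""
    rng = np.random.default_rng(0) if rng is None else rng
    S = P.shape[1]
    r = rng.normal(size=S)
    ll = log_likelihood(demos, q_values(P, r))
    samples = []
    for _ in range(n_samples):
        r_new = r + step * rng.normal(size=S)  # symmetric Gaussian proposal
        ll_new = log_likelihood(demos, q_values(P, r_new))
        # log prior ratio for the standard normal prior
        log_prior_ratio = -0.5 * (r_new ** 2).sum() + 0.5 * (r ** 2).sum()
        if np.log(rng.uniform()) < ll_new - ll + log_prior_ratio:
            r, ll = r_new, ll_new
        samples.append(r.copy())
    return np.array(samples)
```

On a two-state toy MDP where the demonstrator always moves to state 1, the posterior mean reward for state 1 exceeds that for state 0, illustrating how preferences can be recovered from (possibly noisy) demonstrations.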


Keywords: Inverse reinforcement learning · Preference elicitation · Decision theory · Bayesian inference



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Constantin A. Rothkopf (1)
  • Christos Dimitrakakis (2)
  1. Frankfurt Institute for Advanced Studies, Frankfurt, Germany
  2. EPFL, Lausanne, Switzerland
