Efficient Policies for Stationary Possibilistic Markov Decision Processes

  • Conference paper

Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10369)

Abstract

Possibilistic Markov Decision Processes offer a compact and tractable way to represent and solve problems of sequential decision making under qualitative uncertainty. Although appealing for its ability to handle qualitative problems, this model suffers from the drowning effect that is inherent to possibilistic decision theory. The present paper proposes to escape the drowning effect by extending to stationary possibilistic MDPs the lexicographic preference relations defined in [6] for non-sequential decision problems, and provides a value iteration algorithm to compute policies that are optimal for these new criteria.
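The drowning effect mentioned in the abstract can be illustrated on a small hypothetical example: under the pessimistic utility, a trajectory is valued by its worst degree (a minimum), so two trajectories sharing the same worst degree are tied even when one dominates the other everywhere else. A leximin comparison (sort the degrees in increasing order and compare lexicographically) agrees with the min-based ordering but breaks such ties. A minimal Python sketch, with made-up possibility/utility degrees:

```python
# Minimal sketch of the drowning effect (hypothetical degrees in [0, 1]).
# Pessimistic utility: a trajectory is valued by its worst degree (min),
# so trajectories sharing the same minimum are indistinguishable.
def pessimistic_value(degrees):
    return min(degrees)

# Leximin refinement: sort degrees in increasing order and compare
# lexicographically; it refines min by breaking its ties.
def leximin_key(degrees):
    return sorted(degrees)

good = [0.2, 1.0, 1.0]  # bad start, then excellent
poor = [0.2, 0.3, 0.3]  # bad start, then mediocre

# The min-based comparison "drowns" the difference: both are valued 0.2.
assert pessimistic_value(good) == pessimistic_value(poor)

# Leximin discriminates: [0.2, 1.0, 1.0] > [0.2, 0.3, 0.3].
assert leximin_key(good) > leximin_key(poor)
```

This is only an illustration of the non-sequential criterion of [6]; the paper's contribution is its extension to stationary possibilistic MDPs.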


Notes

  1.

    https://www.irit.fr/publis/ADRIA/PapersFargier/XKRU17MDP.pdf.

  2.

    If a trajectory is shorter than E, neutral elements (0 for the optimistic case and 1 for the pessimistic one) are added at the end. If the policies have different numbers of trajectories, neutral trajectories (vectors) are added to the smaller set.

  3.

    A criterion O satisfies the principle of strict monotonicity iff: \(\forall \delta , \delta ', \delta ''\), \(\delta \succeq _O \delta ' \iff \delta + \delta '' \succeq _O \delta ' + \delta ''\). \(\delta +\delta ''\) contains two disjoint sets of trajectories: the ones of \(\delta \) and the ones of \(\delta ''\) (and similarly for \(\delta '+\delta ''\)). Then, adding or removing identical trajectories to two sets of trajectories does not change their comparison by \(\succeq _{lmax(lmin)}\) (resp. \(\succeq _{lmin(lmax)}\)), while it may transform a strict preference into an indifference if \(u_{opt}\) (resp. \(u_{pes}\)) were used.
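One plausible encoding of the comparisons sketched in notes 2 and 3, assuming the optimistic case where the neutral element is 0: each trajectory is padded to the horizon, each policy to a common number of trajectories, and the padded matrices are compared lexicographically, with trajectories sorted increasingly inside (the lmin part) and decreasingly across trajectories (the lmax part). All names and degrees below are illustrative, not taken from the paper, and the paper's exact matrix comparison may differ in detail:

```python
# Illustrative sketch of an lmax(lmin) comparison of two policies, each
# given as a list of trajectories of degrees in [0, 1]. Optimistic case:
# the neutral element is 0 (note 2). Names and degrees are hypothetical.
NEUTRAL = 0.0

def pad_trajectory(traj, horizon):
    # Note 2: trajectories shorter than the horizon E are padded
    # with the neutral element.
    return traj + [NEUTRAL] * (horizon - len(traj))

def lmax_lmin_key(policy, horizon, n_trajs):
    # Inner ascending sort = leximin on each trajectory (refines min);
    # note 2: the smaller set is padded with neutral trajectories.
    rows = [sorted(pad_trajectory(t, horizon)) for t in policy]
    rows += [[NEUTRAL] * horizon] * (n_trajs - len(rows))
    # Outer descending sort = leximax across trajectories (refines max).
    # Python compares nested lists lexicographically, so a larger key
    # means a preferred policy.
    return sorted(rows, reverse=True)

delta = [[0.5, 0.8], [0.3]]
delta_prime = [[0.5, 0.8]]
H, N = 2, 2
assert lmax_lmin_key(delta, H, N) > lmax_lmin_key(delta_prime, H, N)

# Note 3 (strict monotonicity): adding the same trajectory to both
# policies preserves the strict preference.
common = [1.0, 1.0]
assert lmax_lmin_key(delta + [common], H, 3) > \
       lmax_lmin_key(delta_prime + [common], H, 3)
```

With \(u_{opt}\) (plain max of mins) the two policies above would be indifferent, since both contain the trajectory valued 0.5; the lexicographic key discriminates them while respecting the monotonicity property of note 3.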

References

  1. Bauters, K., Liu, W., Godo, L.: Anytime algorithms for solving possibilistic MDPs and hybrid MDPs. In: Gyssens, M., Simari, G. (eds.) FoIKS 2016. LNCS, vol. 9616, pp. 24–41. Springer, Cham (2016)

  2. Bellman, R.: A Markovian decision process. J. Math. Mech. 6, 679–684 (1957)

  3. Ben Amor, N., El Khalfi, Z., Fargier, H., Sabbadin, R.: Lexicographic refinements in possibilistic decision trees. In: Proceedings ECAI 2016, pp. 202–208 (2016)

  4. Drougard, N., Teichteil-Königsbuch, F., Farges, J.L., Dubois, D.: Qualitative possibilistic mixed-observable MDPs. In: Proceedings UAI 2013, pp. 192–201 (2013)

  5. Dubois, D., Prade, H.: Possibility theory as a basis for qualitative decision theory. In: Proceedings IJCAI 1995, pp. 1925–1930 (1995)

  6. Fargier, H., Sabbadin, R.: Qualitative decision under uncertainty: back to expected utility. Artif. Intell. 164, 245–280 (2005)

  7. Gilbert, H., Weng, P.: Quantile reinforcement learning. In: Proceedings JMLR 2016, pp. 1–16 (2016)

  8. Gilbert, H., Weng, P., Xu, Y.: Optimizing quantiles in preference-based Markov decision processes. In: Proceedings AAAI 2017, pp. 3569–3575 (2017)

  9. Montes, I., Miranda, E., Montes, S.: Decision making with imprecise probabilities and utilities by means of statistical preference and stochastic dominance. Eur. J. Oper. Res. 234(1), 209–220 (2014)

  10. Moulin, H.: Axioms of Cooperative Decision Making. Cambridge University Press, Cambridge (1988)

  11. Puterman, M.L.: Markov Decision Processes. Wiley, Hoboken (1994)

  12. Sabbadin, R.: Possibilistic Markov decision processes. Eng. Appl. Artif. Intell. 14, 287–300 (2001)

  13. Sabbadin, R., Fargier, H.: Towards qualitative approaches to multi-stage decision making. Int. J. Approximate Reasoning 19, 441–471 (1998)

  14. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)

  15. Szörényi, B., Busa-Fekete, R., Weng, P., Hüllermeier, E.: Qualitative multi-armed bandits: a quantile-based approach. In: Proceedings ICML 2015, pp. 1660–1668 (2015)

  16. Weng, P.: Qualitative decision making under possibilistic uncertainty: toward more discriminating criteria. In: Proceedings UAI 2005, pp. 615–622 (2005)

  17. Weng, P.: Markov decision processes with ordinal rewards: reference point-based preferences. In: Proceedings ICAPS 2011, pp. 282–289 (2011)

  18. Yue, Y., Broder, J., Kleinberg, R., Joachims, T.: The k-armed dueling bandits problem. J. Comput. Syst. Sci. 78(5), 1538–1556 (2012)

Author information

Correspondence to Nahla Ben Amor, Zeineb El Khalfi, Hélène Fargier or Régis Sabbadin.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Ben Amor, N., El Khalfi, Z., Fargier, H., Sabbadin, R. (2017). Efficient Policies for Stationary Possibilistic Markov Decision Processes. In: Antonucci, A., Cholvy, L., Papini, O. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2017. Lecture Notes in Computer Science (LNAI), vol. 10369. Springer, Cham. https://doi.org/10.1007/978-3-319-61581-3_28

  • DOI: https://doi.org/10.1007/978-3-319-61581-3_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-61580-6

  • Online ISBN: 978-3-319-61581-3

  • eBook Packages: Computer Science, Computer Science (R0)
