Abstract
Possibilistic Markov Decision Processes offer a compact and tractable way to represent and solve problems of sequential decision making under qualitative uncertainty. Although appealing for its ability to handle qualitative problems, this model suffers from the drowning effect inherent to possibilistic decision theory. The present paper proposes to escape the drowning effect by extending to stationary possibilistic MDPs the lexicographic preference relations defined in [6] for non-sequential decision problems, and provides a value iteration algorithm that computes policies optimal for these new criteria.
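The drowning effect mentioned above can be illustrated with a minimal sketch (not taken from the paper; utility values and policy names are hypothetical): under a min-based pessimistic criterion, a policy that dominates another everywhere except on a shared worst case is judged indifferent to it, while a leximin comparison breaks the tie.

```python
# Illustrative sketch of the "drowning effect" with a min-based
# pessimistic criterion, and how a leximin refinement escapes it.
# Utilities are qualitative degrees in [0, 1]; lower = worse.

def u_min(traj):
    """Pessimistic utility: a policy is judged by its worst utility degree."""
    return min(traj)

def leximin_key(traj):
    """Leximin representation: sort increasingly, compare lexicographically."""
    return sorted(traj)

# Two hypothetical utility profiles: delta2 dominates delta1 everywhere
# except for a shared worst case (0.3).
delta1 = [0.3, 0.5, 0.5]
delta2 = [0.3, 0.9, 0.9]

assert u_min(delta1) == u_min(delta2)             # drowning: indifferent
assert leximin_key(delta2) > leximin_key(delta1)  # leximin prefers delta2
```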
Notes
- 1.
- 2. If a trajectory is shorter than the horizon E, neutral elements (0 in the optimistic case and 1 in the pessimistic one) are added at its end. If the policies have different numbers of trajectories, neutral trajectories (vectors) are added to the policy with fewer of them.
- 3. A criterion O satisfies the principle of strict monotonicity iff \(\forall \delta , \delta ', \delta ''\): \(\delta \succeq _O \delta ' \iff \delta + \delta '' \succeq _O \delta ' + \delta ''\). \(\delta +\delta ''\) contains two disjoint sets of trajectories: those of \(\delta \) and those of \(\delta ''\) (and similarly for \(\delta '+\delta ''\)). Hence, adding identical trajectories to, or removing them from, two sets of trajectories does not change their comparison by \(\succeq _{lmax(lmin)}\) (resp. \(\succeq _{lmin(lmax)}\)), whereas it may turn a strict preference into an indifference if \(u_{opt}\) (resp. \(u_{pes}\)) were used.
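The padding and comparison scheme described in the notes can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the paper's implementation: each policy is represented as a list of trajectories (utility vectors), trajectories shorter than the horizon `E` are padded with the neutral element (0 for the optimistic lmax(lmin) case), and policies are compared by leximax over leximin-ordered trajectories. The horizon value and helper names are hypothetical.

```python
# Hedged sketch of the lmax(lmin) comparison: pad trajectories to the
# horizon with the neutral element, order each trajectory increasingly
# (leximin inside), then order trajectories decreasingly so that Python's
# lexicographic list comparison realizes leximax across trajectories.

E = 3  # horizon (illustrative value)

def pad_traj(traj, horizon, neutral):
    """Append neutral elements so the trajectory reaches the horizon."""
    return traj + [neutral] * (horizon - len(traj))

def lmax_lmin_key(trajs, horizon, neutral=0.0):
    """Comparison key for a policy: leximax over leximin-sorted trajectories.

    Policies with different numbers of trajectories should additionally be
    padded with neutral trajectories before comparison (not shown here).
    """
    padded = [sorted(pad_traj(t, horizon, neutral)) for t in trajs]
    return sorted(padded, reverse=True)

# Strict-monotonicity check from note 3: adding a common trajectory to
# both policies leaves their strict comparison unchanged.
A = [[0.3, 0.5, 0.9]]
B = [[0.3, 0.5, 0.5]]
common = [[0.2, 0.2, 0.2]]

assert lmax_lmin_key(A, E) > lmax_lmin_key(B, E)
assert lmax_lmin_key(A + common, E) > lmax_lmin_key(B + common, E)
```

A min-based comparison would declare A and B indifferent (both have worst degree 0.3); the lexicographic key distinguishes them while remaining stable under the addition of identical trajectories.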
References
Bauters, K., Liu, W., Godo, L.: Anytime algorithms for solving possibilistic MDPs and hybrid MDPs. In: Gyssens, M., Simari, G. (eds.) FoIKS 2016. LNCS, vol. 9616, pp. 24–41. Springer, Cham (2016)
Bellman, R.: A Markovian decision process. J. Math. Mech. 6, 679–684 (1957)
Ben Amor, N., El Khalfi, Z., Fargier, H., Sabbadin, R.: Lexicographic refinements in possibilistic decision trees. In: Proceedings ECAI 2016, pp. 202–208 (2016)
Drougard, N., Teichteil-Konigsbuch, F., Farges, J.L., Dubois, D.: Qualitative possibilistic mixed-observable MDPs. In: Proceedings UAI 2013, pp. 192–201 (2013)
Dubois, D., Prade, H.: Possibility theory as a basis for qualitative decision theory. In: Proceedings IJCAI 1995, pp. 1925–1930 (1995)
Fargier, H., Sabbadin, R.: Qualitative decision under uncertainty: back to expected utility. Artif. Intell. 164, 245–280 (2005)
Gilbert, H., Weng, P.: Quantile reinforcement learning. In: Proceedings JMLR 2016, pp. 1–16 (2016)
Gilbert, H., Weng, P., Xu, Y.: Optimizing quantiles in preference-based Markov decision processes. In: Proceedings AAAI 2017, pp. 3569–3575 (2017)
Montes, I., Miranda, E., Montes, S.: Decision making with imprecise probabilities and utilities by means of statistical preference and stochastic dominance. Eur. J. Oper. Res. 234(1), 209–220 (2014)
Moulin, H.: Axioms of Cooperative Decision Making. Cambridge University Press, Cambridge (1988)
Puterman, M.L.: Markov Decision Processes. Wiley, Hoboken (1994)
Sabbadin, R.: Possibilistic Markov decision processes. Eng. Appl. Artif. Intell. 14, 287–300 (2001)
Sabbadin, R., Fargier, H.: Towards qualitative approaches to multi-stage decision making. Int. J. Approximate Reasoning 19, 441–471 (1998)
Sutton, R.S., Barto, A.G.: Introduction to Reinforcement Learning. MIT Press, Cambridge (1998)
Szörényi, B., Busa-Fekete, R., Weng, P., Hüllermeier, E.: Qualitative multi-armed bandits: a quantile-based approach. In: Proceedings ICML 2015, pp. 1660–1668 (2015)
Weng, P.: Qualitative decision making under possibilistic uncertainty: toward more discriminating criteria. In: Proceedings UAI 2005, pp. 615–622 (2005)
Weng, P.: Markov decision processes with ordinal rewards: reference point-based preferences. In: Proceedings ICAPS 2011, pp. 282–289 (2011)
Yue, Y., Broder, J., Kleinberg, R., Joachims, T.: The k-armed dueling bandits problem. J. Comput. Syst. Sci. 78(5), 1538–1556 (2012)
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Ben Amor, N., El Khalfi, Z., Fargier, H., Sabbadin, R. (2017). Efficient Policies for Stationary Possibilistic Markov Decision Processes. In: Antonucci, A., Cholvy, L., Papini, O. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2017. Lecture Notes in Computer Science, vol 10369. Springer, Cham. https://doi.org/10.1007/978-3-319-61581-3_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61580-6
Online ISBN: 978-3-319-61581-3