Efficient Policies for Stationary Possibilistic Markov Decision Processes

  • Conference paper

Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10369)

Abstract

Possibilistic Markov Decision Processes offer a compact and tractable way to represent and solve problems of sequential decision making under qualitative uncertainty. Although appealing for its ability to handle qualitative problems, this model suffers from the drowning effect that is inherent to possibilistic decision theory. The present paper proposes to escape the drowning effect by extending to stationary possibilistic MDPs the lexicographic preference relations defined in [6] for non-sequential decision problems, and provides a value iteration algorithm to compute policies that are optimal for these new criteria.
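The drowning effect mentioned in the abstract can be illustrated on a small hypothetical example: under the pessimistic utility, a trajectory is valued by its worst degree (a minimum), so two trajectories sharing the same worst degree are tied even when one dominates the other everywhere else. A leximin comparison (sort the degrees in increasing order and compare lexicographically) agrees with the min-based ordering but breaks such ties. A minimal Python sketch, with made-up possibility/utility degrees:

```python
# Minimal sketch of the drowning effect (hypothetical degrees in [0, 1]).
# Pessimistic utility: a trajectory is valued by its worst degree (min),
# so trajectories sharing the same minimum are indistinguishable.
def pessimistic_value(degrees):
    return min(degrees)

# Leximin refinement: sort degrees in increasing order and compare
# lexicographically; it refines min by breaking its ties.
def leximin_key(degrees):
    return sorted(degrees)

good = [0.2, 1.0, 1.0]  # bad start, then excellent
poor = [0.2, 0.3, 0.3]  # bad start, then mediocre

# The min-based comparison "drowns" the difference: both are valued 0.2.
assert pessimistic_value(good) == pessimistic_value(poor)

# Leximin discriminates: [0.2, 1.0, 1.0] > [0.2, 0.3, 0.3].
assert leximin_key(good) > leximin_key(poor)
```

This is only an illustration of the non-sequential criterion of [6]; the paper's contribution is its extension to stationary possibilistic MDPs.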


Notes

  1.

    https://www.irit.fr/publis/ADRIA/PapersFargier/XKRU17MDP.pdf.

  2.

    If a trajectory is shorter than E, neutral elements (0 for the optimistic case and 1 for the pessimistic one) are added at the end. If the policies have different numbers of trajectories, neutral trajectories (vectors) are added to the smaller set.

  3.

    A criterion O satisfies the principle of strict monotonicity iff: \(\forall \delta , \delta ', \delta ''\), \(\delta \succeq _O \delta ' \iff \delta + \delta '' \succeq _O \delta ' + \delta ''\). \(\delta +\delta ''\) contains two disjoint sets of trajectories: the ones of \(\delta \) and the ones of \(\delta ''\) (and similarly for \(\delta '+\delta ''\)). Then, adding or removing identical trajectories to two sets of trajectories does not change their comparison by \(\succeq _{lmax(lmin)}\) (resp. \(\succeq _{lmin(lmax)}\)), while it may transform a strict preference into an indifference if \(u_{opt}\) (resp. \(u_{pes}\)) were used.
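One plausible encoding of the comparisons sketched in notes 2 and 3, assuming the optimistic case where the neutral element is 0: each trajectory is padded to the horizon, each policy to a common number of trajectories, and the padded matrices are compared lexicographically, with trajectories sorted increasingly inside (the lmin part) and decreasingly across trajectories (the lmax part). All names and degrees below are illustrative, not taken from the paper, and the paper's exact matrix comparison may differ in detail:

```python
# Illustrative sketch of an lmax(lmin) comparison of two policies, each
# given as a list of trajectories of degrees in [0, 1]. Optimistic case:
# the neutral element is 0 (note 2). Names and degrees are hypothetical.
NEUTRAL = 0.0

def pad_trajectory(traj, horizon):
    # Note 2: trajectories shorter than the horizon E are padded
    # with the neutral element.
    return traj + [NEUTRAL] * (horizon - len(traj))

def lmax_lmin_key(policy, horizon, n_trajs):
    # Inner ascending sort = leximin on each trajectory (refines min);
    # note 2: the smaller set is padded with neutral trajectories.
    rows = [sorted(pad_trajectory(t, horizon)) for t in policy]
    rows += [[NEUTRAL] * horizon] * (n_trajs - len(rows))
    # Outer descending sort = leximax across trajectories (refines max).
    # Python compares nested lists lexicographically, so a larger key
    # means a preferred policy.
    return sorted(rows, reverse=True)

delta = [[0.5, 0.8], [0.3]]
delta_prime = [[0.5, 0.8]]
H, N = 2, 2
assert lmax_lmin_key(delta, H, N) > lmax_lmin_key(delta_prime, H, N)

# Note 3 (strict monotonicity): adding the same trajectory to both
# policies preserves the strict preference.
common = [1.0, 1.0]
assert lmax_lmin_key(delta + [common], H, 3) > \
       lmax_lmin_key(delta_prime + [common], H, 3)
```

With \(u_{opt}\) (plain max of mins) the two policies above would be indifferent, since both contain the trajectory valued 0.5; the lexicographic key discriminates them while respecting the monotonicity property of note 3.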

References

  1. Bauters, K., Liu, W., Godo, L.: Anytime algorithms for solving possibilistic MDPs and hybrid MDPs. In: Gyssens, M., Simari, G. (eds.) FoIKS 2016. LNCS, vol. 9616, pp. 24–41. Springer, Cham (2016)

  2. Bellman, R.: A Markovian decision process. J. Math. Mech. 6, 679–684 (1957)

  3. Ben Amor, N., El Khalfi, Z., Fargier, H., Sabbadin, R.: Lexicographic refinements in possibilistic decision trees. In: Proceedings ECAI 2016, pp. 202–208 (2016)

  4. Drougard, N., Teichteil-Königsbuch, F., Farges, J.L., Dubois, D.: Qualitative possibilistic mixed-observable MDPs. In: Proceedings UAI 2013, pp. 192–201 (2013)

  5. Dubois, D., Prade, H.: Possibility theory as a basis for qualitative decision theory. In: Proceedings IJCAI 1995, pp. 1925–1930 (1995)

  6. Fargier, H., Sabbadin, R.: Qualitative decision under uncertainty: back to expected utility. Artif. Intell. 164, 245–280 (2005)

  7. Gilbert, H., Weng, P.: Quantile reinforcement learning. In: Proceedings JMLR 2016, pp. 1–16 (2016)

  8. Gilbert, H., Weng, P., Xu, Y.: Optimizing quantiles in preference-based Markov decision processes. In: Proceedings AAAI 2017, pp. 3569–3575 (2017)

  9. Montes, I., Miranda, E., Montes, S.: Decision making with imprecise probabilities and utilities by means of statistical preference and stochastic dominance. Eur. J. Oper. Res. 234(1), 209–220 (2014)

  10. Moulin, H.: Axioms of Cooperative Decision Making. Cambridge University Press, Cambridge (1988)

  11. Puterman, M.L.: Markov Decision Processes. Wiley, Hoboken (1994)

  12. Sabbadin, R.: Possibilistic Markov decision processes. Eng. Appl. Artif. Intell. 14, 287–300 (2001)

  13. Sabbadin, R., Fargier, H.: Towards qualitative approaches to multi-stage decision making. Int. J. Approximate Reasoning 19, 441–471 (1998)

  14. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)

  15. Szörényi, B., Busa-Fekete, R., Weng, P., Hüllermeier, E.: Qualitative multi-armed bandits: a quantile-based approach. In: Proceedings ICML 2015, pp. 1660–1668 (2015)

  16. Weng, P.: Qualitative decision making under possibilistic uncertainty: toward more discriminating criteria. In: Proceedings UAI 2005, pp. 615–622 (2005)

  17. Weng, P.: Markov decision processes with ordinal rewards: reference point-based preferences. In: Proceedings ICAPS 2011, pp. 282–289 (2011)

  18. Yue, Y., Broder, J., Kleinberg, R., Joachims, T.: The k-armed dueling bandits problem. J. Comput. Syst. Sci. 78(5), 1538–1556 (2012)

Author information

Correspondence to Nahla Ben Amor, Zeineb El Khalfi, Hélène Fargier or Régis Sabbadin.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Ben Amor, N., El Khalfi, Z., Fargier, H., Sabbadin, R. (2017). Efficient Policies for Stationary Possibilistic Markov Decision Processes. In: Antonucci, A., Cholvy, L., Papini, O. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2017. Lecture Notes in Computer Science (LNAI), vol. 10369. Springer, Cham. https://doi.org/10.1007/978-3-319-61581-3_28

  • DOI: https://doi.org/10.1007/978-3-319-61581-3_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-61580-6

  • Online ISBN: 978-3-319-61581-3

  • eBook Packages: Computer Science, Computer Science (R0)
