Parameter-Independent Strategies for pMDPs via POMDPs

Arming, Sebastian; Bartocci, Ezio; Chatterjee, Krishnendu; Katoen, Joost-Pieter; Sokolova, Ana

doi:10.1007/978-3-319-99154-2_4

Sebastian Arming¹⁵,
Ezio Bartocci¹⁶,
Krishnendu Chatterjee¹⁷,
Joost-Pieter Katoen¹⁸ &
…
Ana Sokolova¹⁵

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11024))

Included in the following conference series:

International Conference on Quantitative Evaluation of Systems

825 Accesses
7 Citations

Abstract

Markov Decision Processes (MDPs) are a popular class of models suitable for solving control decision problems in probabilistic reactive systems. We consider parametric MDPs (pMDPs) that include parameters in some of the transition probabilities to account for stochastic uncertainties of the environment such as noise or input disturbances.

We study pMDPs with reachability objectives where the parameter values are unknown and impossible to measure directly during execution, but there is a probability distribution known over the parameter values. We study for the first time computing parameter-independent strategies that are expectation optimal, i.e., optimize the expected reachability probability under the probability distribution over the parameters. We present an encoding of our problem to partially observable MDPs (POMDPs), i.e., a reduction of our problem to computing optimal strategies in POMDPs.

We evaluate our method experimentally on several benchmarks: a motivating (repeated) learner model; a series of benchmarks of varying configurations of a robot moving on a grid; and a consensus protocol.

This work was supported by the Austrian FWF (National Research Network RiSE/SHiNE S11405-N23, S11407-N23 and S11411-N23) and by the RTG 2236 UnRAVeL funded by the German Research Foundation.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
We can deal with objectives beyond reachability as long as they are induced by a reward structure (for applicability of the available tools), see Sects. 3 and 4.
2.
The discounted accumulated reward objective is defined in a similar way, by adding a factor \(\gamma ^i\) to the i-th summand in (1) with \(\gamma \in [0,1)\) being the discount factor. For solving reachability objectives, undiscounted rewards are sufficient.
3.
We use the abbreviation pMDP rather than PMDP as it is common in the recent literature, see (e.g. [17, 35] and as it reminds of the parameter p.).

References

Arming, S., Bartocci, E., Chatterjee, K., Katoen, J., Sokolova, A.: Parameter-independent strategies for pMDPs via POMDPs. arXiv 1806.05126 (2018). http://arxiv.org/abs/1806.05126
Arming, S., Bartocci, E., Sokolova, A.: SEA-PARAM: exploring schedulers in parametric MDPs. In: Proceedings of the QAPL 2017. EPTCS, vol. 250, pp. 25–38 (2017)
Google Scholar
Aspnes, J., Herlihy, M.: Fast randomized consensus using shared memory. J. Algorithms 11(3), 441–461 (1990)
Article MathSciNet Google Scholar
Baier, C., Größer, M., Bertrand, N.: Probabilistic \(\omega \)-automata. J. ACM 59(1), 1:1–1:52 (2012)
Article MathSciNet Google Scholar
Baier, C., Katoen, J.: Principles of Model Checking. MIT Press, Cambridge (2008)
MATH Google Scholar
Baldi, M., et al.: A probabilistic small model theorem to assess confidentiality of dispersed cloud storage. In: Bertrand, N., Bortolussi, L. (eds.) QEST 2017. LNCS, vol. 10503, pp. 123–139. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66335-7_8
Chapter Google Scholar
Bargiacchi, E.: AI-Toolbox. https://github.com/Svalorzen/AI-Toolbox/
Bartocci, E., Grosu, R., Katsaros, P., Ramakrishnan, C.R., Smolka, S.A.: Model repair for probabilistic systems. In: Abdulla, P.A., Leino, K.R.M. (eds.) TACAS 2011. LNCS, vol. 6605, pp. 326–340. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19835-9_30
Chapter MATH Google Scholar
Beyer, D., Löwe, S., Wendler, P.: Benchmarking and resource measurement. In: Fischer, B., Geldenhuys, J. (eds.) SPIN 2015. LNCS, vol. 9232, pp. 160–178. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23404-5_12
Chapter Google Scholar
Cassandra, A.R., Littman, M.L., Zhang, N.L.: Incremental pruning - a simple, fast, exact method for partially observable Markov decision processes. In: Proceedings of the UAI 1997, pp. 54–61 (1997)
Google Scholar
Chatterjee, K., Doyen, L., Henzinger, T.A.: Qualitative analysis of partially-observable Markov decision processes. In: Hliněný, P., Kučera, A. (eds.) MFCS 2010. LNCS, vol. 6281, pp. 258–269. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15155-2_24
Chapter MATH Google Scholar
Chatterjee, K., Chmelik, M.: POMDPs under probabilistic semantics. Artif. Intell. 221, 46–72 (2015)
Article MathSciNet Google Scholar
Chatterjee, K., Chmelik, M., Davies, J.: A symbolic SAT-based algorithm for almost-sure reachability with small strategies in POMDPs. In: Proceedings of the AAAI 2016, pp. 3225–3232 (2016)
Google Scholar
Chatterjee, K., Chmelik, M., Gupta, R., Kanodia, A.: Optimal cost almost-sure reachability in POMDPs. Artif. Intell. 234, 26–48 (2016)
Article MathSciNet Google Scholar
Chatterjee, K., Doyen, L., Gimbert, H., Henzinger, T.A.: Randomness for free. In: Hliněný, P., Kučera, A. (eds.) MFCS 2010. LNCS, vol. 6281, pp. 246–257. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15155-2_23
Chapter Google Scholar
Chen, T., Hahn, E.M., Han, T., Kwiatkowska, M.Z., Qu, H., Zhang, L.: Model repair for Markov decision processes. In: Proceedings of the TASE 2013, pp. 85–92 (2013)
Google Scholar
Cubuktepe, M.: Sequential convex programming for the efficient verification of parametric MDPs. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10206, pp. 133–150. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54580-5_8
Chapter Google Scholar
Daws, C.: Symbolic and parametric model checking of discrete-time Markov chains. In: Liu, Z., Araki, K. (eds.) ICTAC 2004. LNCS, vol. 3407, pp. 280–294. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-31862-0_21
Chapter MATH Google Scholar
Dehnert, C., et al.: PROPhESY: a probabilistic parameter synthesis tool. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 214–231. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21690-4_13
Chapter Google Scholar
Dehnert, C., Junges, S., Katoen, J.-P., Volk, M.: A Storm is coming: a modern probabilistic model checker. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10427, pp. 592–600. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63390-9_31
Chapter Google Scholar
Hahn, E.M., Han, T., Zhang, L.: Probabilistic reachability for parametric Markov models. STTT 13(1), 3–19 (2011)
Article Google Scholar
Hahn, E.M., Han, T., Zhang, L.: Synthesis for PCTL in parametric Markov decision processes. In: Bobaru, M., Havelund, K., Holzmann, G.J., Joshi, R. (eds.) NFM 2011. LNCS, vol. 6617, pp. 146–161. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20398-5_12
Chapter Google Scholar
Hahn, E.M., Hermanns, H., Zhang, L., Wachter, B.: PARAM case studies (2015). https://depend.cs.uni-saarland.de/tools/param/casestudies
Halmos, P.R.: Measure Theory. Springer, New York (1974). https://doi.org/10.1007/978-1-4684-9440-2
Book MATH Google Scholar
Jansen, N., et al.: Accelerating parametric probabilistic verification. In: Norman, G., Sanders, W. (eds.) QEST 2014. LNCS, vol. 8657, pp. 404–420. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10696-0_31
Chapter Google Scholar
Junges, S., Jansen, N., Wimmer, R., Quatmann, T., Winterer, L., Katoen, J., Becker, B.: Finite-state controllers of POMDPs via parameter synthesis. In: Proceedings of the UAI 2018 (2018)
Google Scholar
Kreinovich, V., Lakeyev, A., Rohn, J., Kahl, P.: Computational Complexity and Feasibility of Data Processing and Interval Computations, Applied Optimization, vol. 10. Springer, Boston (1998). https://doi.org/10.1007/978-1-4757-2793-7
Book MATH Google Scholar
Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 585–591. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_47
Chapter Google Scholar
Lanotte, R., Maggiolo-Schettini, A., Troina, A.: Parametric probabilistic transition systems for system design and analysis. Form. Asp. Comput. 19(1), 93–109 (2007)
Article Google Scholar
Lukina, A., et al.: ARES: adaptive receding-horizon synthesis of optimal plans. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10206, pp. 286–302. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54580-5_17
Chapter Google Scholar
Madani, O., Hanks, S., Condon, A.: On the undecidability of probabilistic planning and related stochastic optimization problems. Artif. Intell. 147(1–2), 5–34 (2003)
Article MathSciNet Google Scholar
Medina Ayala, A.I., Andersson, S.B., Belta, C.: Probabilistic control from time-bounded temporal logic specifications in dynamic environments. In: Proceedings of the ICRA 2012, pp. 4705–4710. IEEE (2012)
Google Scholar
Pathak, S., Ábrahám, E., Jansen, N., Tacchella, A., Katoen, J.-P.: A greedy approach for the efficient repair of stochastic models. In: Havelund, K., Holzmann, G., Joshi, R. (eds.) NFM 2015. LNCS, vol. 9058, pp. 295–309. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-17524-9_21
Chapter Google Scholar
Pineau, J., Gordon, G.J., Thrun, S.: Point-based value iteration - an anytime algorithm for POMDPs. In: Proceedings of the IJCAI 2003, pp. 1025–1032 (2003)
Google Scholar
Polgreen, E., Wijesuriya, V.B., Haesaert, S., Abate, A.: Automated experiment design for data-efficient verification of parametric Markov decision processes. In: Bertrand, N., Bortolussi, L. (eds.) QEST 2017. LNCS, vol. 10503, pp. 259–274. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66335-7_16
Chapter MATH Google Scholar
Quatmann, T., Dehnert, C., Jansen, N., Junges, S., Katoen, J.-P.: Parameter synthesis for Markov models: faster than ever. In: Artho, C., Legay, A., Peled, D. (eds.) ATVA 2016. LNCS, vol. 9938, pp. 50–67. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46520-3_4
Chapter Google Scholar
Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach. Prentice Hall, Upper Saddle River (2009)
MATH Google Scholar
Sennott, L.I.: Stochastic Dynamic Programming and the Control of Queueing Systems. Wiley, New York (1998)
Book Google Scholar
Spaan, M.T.J., Vlassis, N.: Perseus: randomized point-based value iteration for POMDPs. J. Artif. Intell. Res. 24, 195–220 (2011)
Article Google Scholar

Download references

Author information

Authors and Affiliations

University of Salzburg, Salzburg, Austria
Sebastian Arming & Ana Sokolova
TU Wien, Vienna, Austria
Ezio Bartocci
IST Austria, Klosterneuburg, Austria
Krishnendu Chatterjee
RWTH Aachen University, Aachen, Germany
Joost-Pieter Katoen

Authors

Sebastian Arming
View author publications
You can also search for this author in PubMed Google Scholar
Ezio Bartocci
View author publications
You can also search for this author in PubMed Google Scholar
Krishnendu Chatterjee
View author publications
You can also search for this author in PubMed Google Scholar
Joost-Pieter Katoen
View author publications
You can also search for this author in PubMed Google Scholar
Ana Sokolova
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Sebastian Arming .

Editor information

Editors and Affiliations

Macquarie University , Sydney, New South Wales, Australia
Annabelle McIver
University of Turin , Turin, Italy
Andras Horvath

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Arming, S., Bartocci, E., Chatterjee, K., Katoen, JP., Sokolova, A. (2018). Parameter-Independent Strategies for pMDPs via POMDPs. In: McIver, A., Horvath, A. (eds) Quantitative Evaluation of Systems. QEST 2018. Lecture Notes in Computer Science(), vol 11024. Springer, Cham. https://doi.org/10.1007/978-3-319-99154-2_4

Download citation

DOI: https://doi.org/10.1007/978-3-319-99154-2_4
Published: 15 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99153-5
Online ISBN: 978-3-319-99154-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics