Abstract
Reinforcement learning (RL) agents can learn optimal policies for sequential decision-making problems. Although increasingly successful, RL cannot be used in applications where unpredictable agent behaviour may have significant unintended negative consequences. We address this limitation by introducing an assured reinforcement learning (ARL) method that uses quantitative verification (QV) to restrict agent behaviour to regions that satisfy safety, reliability and performance constraints specified in probabilistic temporal logic. To this end, ARL builds an abstract Markov decision process (AMDP) that models the problem at a high level of abstraction, and uses QV to identify a set of Pareto-optimal AMDP policies that satisfy the constraints. These formally verified abstract policies define regions of the agent's behaviour space in which RL can occur without constraint violations. We demonstrate the effectiveness of ARL through two case studies: a benchmark flag-collection navigation task and an assisted-living planning system.
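The core idea of constraining RL to verified regions of the behaviour space can be illustrated with a minimal sketch. The example below is a hypothetical toy, not the chapter's implementation: the "allowed" action sets are assumed inputs that in ARL would be derived from quantitatively verified abstract policies (e.g. obtained with a probabilistic model checker such as PRISM), and the environment is a trivial corridor task.

```python
import random

# Toy illustration of ARL's restriction idea: Q-learning explores only
# state-action pairs permitted by a (here, hand-written) verified policy.
N_STATES = 5                                     # states 0..4 on a line; 4 is the goal
ALLOWED = {s: ({+1} if s == 0 else {+1, -1}) for s in range(N_STATES)}
ALLOWED[2] = {+1}                                # assume verification ruled out "left" in state 2

def step(state, action):
    """Deterministic toy environment: move left/right, reward 1 at the goal."""
    next_state = max(0, min(N_STATES - 1, state + action))
    return next_state, (1.0 if next_state == N_STATES - 1 else 0.0)

def q_learn(episodes=200, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in (+1, -1)}
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            acts = list(ALLOWED[s])              # explore only verified-safe actions
            if rng.random() < eps:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda b: q[(s, b)])
            s2, r = step(s, a)
            best_next = 0.0 if s2 == N_STATES - 1 else max(q[(s2, b)] for b in ALLOWED[s2])
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q

q = q_learn()
policy = {s: max(ALLOWED[s], key=lambda b: q[(s, b)]) for s in range(N_STATES - 1)}
```

By construction, the learned policy can never select an action outside the allowed sets, so any constraint encoded in those sets holds throughout both learning and execution; here the agent learns to move right in every non-terminal state.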
Acknowledgements
This paper presents research sponsored by the UK MOD. The information contained in it should not be interpreted as representing the views of the UK MOD, nor should it be assumed it reflects any current or future UK MOD policy.
Copyright information
© 2018 Springer International Publishing AG
Cite this chapter
Mason, G., Calinescu, R., Kudenko, D., Banks, A. (2018). Assurance in Reinforcement Learning Using Quantitative Verification. In: Hatzilygeroudis, I., Palade, V. (eds) Advances in Hybridization of Intelligent Methods. Smart Innovation, Systems and Technologies, vol 85. Springer, Cham. https://doi.org/10.1007/978-3-319-66790-4_5
Print ISBN: 978-3-319-66789-8
Online ISBN: 978-3-319-66790-4