Assurance in Reinforcement Learning Using Quantitative Verification

Advances in Hybridization of Intelligent Methods

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 85)

Abstract

Reinforcement learning (RL) agents converge to optimal solutions for sequential decision-making problems. Although increasingly successful, RL cannot be used in applications where unpredictable agent behaviour may have significant unintended negative consequences. We address this limitation by introducing an assured reinforcement learning (ARL) method that uses quantitative verification (QV) to restrict agent behaviour to regions that satisfy safety, reliability and performance constraints specified in probabilistic temporal logic. To this end, ARL builds an abstract Markov decision process (AMDP) that models the problem at a high level of abstraction, and uses QV to identify a set of Pareto-optimal AMDP policies that satisfy the constraints. These formally verified abstract policies define areas of the agent behaviour space in which RL can proceed without constraint violations. We show the effectiveness of our ARL method through two case studies: a benchmark flag-collection navigation task and an assisted-living planning system.
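To make the verification step concrete, the sketch below checks every policy of a toy AMDP against a single PCTL-style reachability constraint, P>=0.9 [ F "goal" ], and keeps the policies that satisfy it; low-level RL would then be confined to the behaviour regions these verified abstract policies define. Everything in the sketch (the two-state model, its probabilities, the 0.9 bound, and the helper names) is an illustrative assumption, not the chapter's implementation, which handles multiple constraints and selects Pareto-optimal policies.

```python
# Minimal sketch of the ARL verification idea, under assumed data: enumerate
# the abstract policies of a toy AMDP and keep those satisfying a PCTL-style
# constraint P>=0.9 [ F "goal" ]. Illustrative only, not the chapter's code.
import itertools

# Transition function: (abstract state, abstract action) -> {successor: probability}
AMDP = {
    ("s0", "safe"): {"s1": 0.98, "fail": 0.02},
    ("s0", "fast"): {"s1": 0.80, "fail": 0.20},
    ("s1", "safe"): {"goal": 0.95, "fail": 0.05},
    ("s1", "fast"): {"goal": 0.85, "fail": 0.15},
}
STATES = ("s0", "s1")        # non-terminal abstract states
ACTIONS = ("safe", "fast")   # abstract actions
THRESHOLD = 0.9              # assumed requirement: reach "goal" with p >= 0.9

def reach_prob(policy, start="s0", iters=100):
    """Probability of eventually reaching 'goal' under a fixed abstract
    policy, via fixed-point iteration on the induced Markov chain."""
    p = {s: 0.0 for s in STATES}
    p["goal"], p["fail"] = 1.0, 0.0  # absorbing outcomes
    for _ in range(iters):
        for s in STATES:
            p[s] = sum(pr * p[t] for t, pr in AMDP[(s, policy[s])].items())
    return p[start]

# Verification step: brute-force enumeration of abstract policies.
verified = []
for choice in itertools.product(ACTIONS, repeat=len(STATES)):
    policy = dict(zip(STATES, choice))
    if reach_prob(policy) >= THRESHOLD:
        verified.append(policy)

print("verified abstract policies:", verified)
# RL exploration would then be restricted to concrete actions that refine
# one of the verified abstract policies, so learned behaviour cannot
# violate the constraint.
```

In a realistic instance this enumeration would not scale; the constraints would instead be expressed in probabilistic temporal logic and checked by a probabilistic model checker such as PRISM, which can also compute the Pareto front over several such objectives.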


Acknowledgements

This paper presents research sponsored by the UK MOD. The information contained in it should not be interpreted as representing the views of the UK MOD, nor should it be assumed it reflects any current or future UK MOD policy.

Author information

Correspondence to George Mason.


Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Mason, G., Calinescu, R., Kudenko, D., Banks, A. (2018). Assurance in Reinforcement Learning Using Quantitative Verification. In: Hatzilygeroudis, I., Palade, V. (eds) Advances in Hybridization of Intelligent Methods. Smart Innovation, Systems and Technologies, vol 85. Springer, Cham. https://doi.org/10.1007/978-3-319-66790-4_5

  • DOI: https://doi.org/10.1007/978-3-319-66790-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66789-8

  • Online ISBN: 978-3-319-66790-4

  • eBook Packages: Engineering, Engineering (R0)
