Advertisement

\(L^*\)-Based Learning of Markov Decision Processes

  • Martin TapplerEmail author
  • Bernhard K. Aichernig
  • Giovanni Bacci
  • Maria Eichlseder
  • Kim G. Larsen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11800)

Abstract

Automata learning techniques automatically generate system models from test observations. These techniques usually fall into two categories: passive and active. Passive learning uses a predetermined data set, e.g., system logs. In contrast, active learning actively queries the system under learning, which is considered more efficient.

An influential active learning technique is Angluin’s \(L^*\) algorithm for regular languages which inspired several generalisations from DFAs to other automata-based modelling formalisms. In this work, we study \(L^*\)-based learning of deterministic Markov decision processes, first assuming an ideal setting with perfect information. Then, we relax this assumption and present a novel learning algorithm that collects information by sampling system traces via testing. Experiments with the implementation of our sampling-based algorithm suggest that it achieves better accuracy than state-of-the-art passive learning techniques with the same amount of test data. Unlike existing learning algorithms with predefined states, our algorithm learns the complete model structure including the states.

Keywords

Model inference Active automata learning Markov decision processes 

Notes

Acknowledgment

The work of B. Aichernig, M. Eichlseder and M. Tappler has been carried out as part of the TU Graz LEAD project “Dependable Internet of Things in Adverse Environments”. The work of K. Larsen and G. Bacci has been supported by the Advanced ERC Grant nr. 867096 (LASSO).

References

  1. 1.
    Aichernig, B.K., Mostowski, W., Mousavi, M.R., Tappler, M., Taromirad, M.: Model learning and model-based testing. In: Bennaceur et al. [9], pp. 74–100.  https://doi.org/10.1007/978-3-319-96562-8_3
  2. 2.
    Aichernig, B.K., Tappler, M.: Efficient active automata learning via mutation testing. J. Autom. Reasoning (2018).  https://doi.org/10.1007/s10817-018-9486-0
  3. 3.
    Aichernig, B.K., Tappler, M.: Probabilistic black-box reachability checking (extended version). Formal Methods Syst. Des. (2019).  https://doi.org/10.1007/s10703-019-00333-0
  4. 4.
    Angluin, D.: Learning regular sets from queries and counterexamples. Inf. Comput. 75(2), 87–106 (1987).  https://doi.org/10.1016/0890-5401(87)90052-6MathSciNetCrossRefzbMATHGoogle Scholar
  5. 5.
    Bacci, G., Bacci, G., Larsen, K.G., Mardare, R.: MDPDist library. http://people.cs.aau.dk/~giovbacci/tools/bisimdist.zip. Accessed 28 June 2019
  6. 6.
    Bacci, G., Bacci, G., Larsen, K.G., Mardare, R.: Computing behavioral distances, compositionally. In: Chatterjee, K., Sgall, J. (eds.) MFCS 2013. LNCS, vol. 8087, pp. 74–85. Springer, Heidelberg (2013).  https://doi.org/10.1007/978-3-642-40313-2_9CrossRefzbMATHGoogle Scholar
  7. 7.
    Bacci, G., Bacci, G., Larsen, K.G., Mardare, R.: The BisimDist library: efficient computation of bisimilarity distances for Markovian models. In: Joshi, K., Siegle, M., Stoelinga, M., D’Argenio, P.R. (eds.) QEST 2013. LNCS, vol. 8054, pp. 278–281. Springer, Heidelberg (2013).  https://doi.org/10.1007/978-3-642-40196-1_23CrossRefzbMATHGoogle Scholar
  8. 8.
    Baier, C., Katoen, J.: Principles of Model Checking. MIT Press, Cambridge (2008)zbMATHGoogle Scholar
  9. 9.
    Bennaceur, A., Hähnle, R., Meinke, K. (eds.): Machine Learning for Dynamic Software Analysis: Potentials and Limits. LNCS, vol. 11026. Springer, Cham (2018).  https://doi.org/10.1007/978-3-319-96562-8 CrossRefGoogle Scholar
  10. 10.
    Bergadano, F., Varricchio, S.: Learning behaviors of automata from multiplicity and equivalence queries. SIAM J. Comput. 25(6), 1268–1280 (1996).  https://doi.org/10.1137/S009753979326091XMathSciNetCrossRefzbMATHGoogle Scholar
  11. 11.
    Bernardo, M., Issarny, V. (eds.): Formal Methods for Eternal Networked Software Systems - 11th International School on Formal Methods for the Design of Computer, Communication and Software Systems, SFM 2011, Bertinoro, Italy, 13–18 June 2011, Advanced Lectures. LNCS, vol. 6659. Springer, Heidelberg (2011).  https://doi.org/10.1007/978-3-642-21455-4
  12. 12.
    Carrasco, R.C., Oncina, J.: Learning stochastic regular grammars by means of a state merging method. In: Carrasco, R.C., Oncina, J. (eds.) ICGI 1994. LNCS, vol. 862, pp. 139–152. Springer, Heidelberg (1994).  https://doi.org/10.1007/3-540-58473-0_144CrossRefzbMATHGoogle Scholar
  13. 13.
    Carrasco, R.C., Oncina, J.: Learning deterministic regular grammars from stochastic samples in polynomial time. ITA 33(1), 1–20 (1999).  https://doi.org/10.1051/ita:1999102MathSciNetCrossRefzbMATHGoogle Scholar
  14. 14.
    Cassel, S., Howar, F., Jonsson, B., Steffen, B.: Active learning for extended finite state machines. Formal Aspects Comput. 28(2), 233–263 (2016).  https://doi.org/10.1007/s00165-016-0355-5MathSciNetCrossRefzbMATHGoogle Scholar
  15. 15.
    Castro, J., Gavaldà, R.: Learning probability distributions generated by finite-state machines. In: Heinz, J., Sempere, J.M. (eds.) Topics in Grammatical Inference, pp. 113–142. Springer, Heidelberg (2016).  https://doi.org/10.1007/978-3-662-48395-4_5CrossRefGoogle Scholar
  16. 16.
    Chen, Y., Nielsen, T.D.: Active learning of Markov decision processes for system verification. In: 11th International Conference on Machine Learning and Applications, ICMLA, Boca Raton, FL, USA, 12–15 December 2012, vol. 2, pp. 289–294. IEEE (2012).  https://doi.org/10.1109/ICMLA.2012.158
  17. 17.
    Feng, L., Han, T., Kwiatkowska, M.Z., Parker, D.: Learning-based compositional verification for synchronous probabilistic systems. In: Bultan, T., Hsiung, P.-A. (eds.) ATVA 2011. LNCS, vol. 6996, pp. 511–521. Springer, Heidelberg (2011).  https://doi.org/10.1007/978-3-642-24372-1_40CrossRefzbMATHGoogle Scholar
  18. 18.
    Fiterău-Broştean, P., Janssen, R., Vaandrager, F.: Combining model learning and model checking to analyze TCP implementations. In: Chaudhuri, S., Farzan, A. (eds.) CAV 2016. LNCS, vol. 9780, pp. 454–471. Springer, Cham (2016).  https://doi.org/10.1007/978-3-319-41540-6_25CrossRefGoogle Scholar
  19. 19.
    Forejt, V., Kwiatkowska, M.Z., Norman, G., Parker, D.: Automated verification techniques for probabilistic systems. In: Bernardo and Issarny [11], pp. 53–113.  https://doi.org/10.1007/978-3-642-21455-4_3
  20. 20.
    Fu, J., Topcu, U.: Probably approximately correct MDP learning and control with temporal logic constraints. In: Fox, D., Kavraki, L.E., Kurniawati, H. (eds.) Robotics: Science and Systems X, University of California, Berkeley, USA, 12–16 July 2014 (2014). http://www.roboticsproceedings.org/rss10/p39.html
  21. 21.
    Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963). http://www.jstor.org/stable/2282952MathSciNetCrossRefGoogle Scholar
  22. 22.
    Howar, F., Steffen, B.: Active automata learning in practice - an annotated bibliography of the years 2011 to 2016. In: Bennaceur et al. [9], pp. 123–148.  https://doi.org/10.1007/978-3-319-96562-8_5
  23. 23.
    Hungar, H., Niese, O., Steffen, B.: Domain-specific optimization in automata learning. In: Hunt Jr., W.A., Somenzi, F. (eds.) CAV 2003. LNCS, vol. 2725, pp. 315–327. Springer, Heidelberg (2003).  https://doi.org/10.1007/978-3-540-45069-6_31CrossRefGoogle Scholar
  24. 24.
    Isberner, M., Howar, F., Steffen, B.: The TTT algorithm: a redundancy-free approach to active automata learning. In: Bonakdarpour, B., Smolka, S.A. (eds.) RV 2014. LNCS, vol. 8734, pp. 307–322. Springer, Cham (2014).  https://doi.org/10.1007/978-3-319-11164-3_26CrossRefGoogle Scholar
  25. 25.
    Khalili, A., Tacchella, A.: Learning nondeterministic Mealy machines. In: Clark, A., Kanazawa, M., Yoshinaka, R. (eds.) Proceedings of the 12th International Conference on Grammatical Inference, ICGI 2014, Kyoto, Japan, 17–19 September 2014. JMLR Workshop and Conference Proceedings, vol. 34, pp. 109–123. JMLR.org (2014). http://jmlr.org/proceedings/papers/v34/khalili14a.html
  26. 26.
    Kwiatkowska, M.Z., Norman, G., Parker, D.: Analysis of a gossip protocol in PRISM. SIGMETRICS Perform. Eval. Rev. 36(3), 17–22 (2008).  https://doi.org/10.1145/1481506.1481511CrossRefGoogle Scholar
  27. 27.
    Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 585–591. Springer, Heidelberg (2011).  https://doi.org/10.1007/978-3-642-22110-1_47CrossRefGoogle Scholar
  28. 28.
    Mao, H., Chen, Y., Jaeger, M., Nielsen, T.D., Larsen, K.G., Nielsen, B.: Learning Markov decision processes for model checking. In: Fahrenberg, U., Legay, A., Thrane, C.R. (eds.) Proceedings Quantities in Formal Methods, QFM 2012, Paris, France, 28 August 2012. EPTCS, vol. 103, pp. 49–63 (2012).  https://doi.org/10.4204/EPTCS.103.6
  29. 29.
    Mao, H., Chen, Y., Jaeger, M., Nielsen, T.D., Larsen, K.G., Nielsen, B.: Learning deterministic probabilistic automata from a model checking perspective. Mach. Learn. 105(2), 255–299 (2016).  https://doi.org/10.1007/s10994-016-5565-9MathSciNetCrossRefzbMATHGoogle Scholar
  30. 30.
    Margaria, T., Niese, O., Raffelt, H., Steffen, B.: Efficient test-based model generation for legacy reactive systems. In: Ninth IEEE International High-Level Design Validation and Test Workshop 2004, pp. 95–100. IEEE Computer Society (2004).  https://doi.org/10.1109/HLDVT.2004.1431246
  31. 31.
    Nerode, A.: Linear automaton transformations. Proc. Am. Math. Soc. 9, 541–544 (1958)MathSciNetCrossRefGoogle Scholar
  32. 32.
    Norman, G., Shmatikov, V.: Analysis of probabilistic contract signing. J. Comput. Secur. 14(6), 561–589 (2006). http://content.iospress.com/articles/journal-of-computer-security/jcs268CrossRefGoogle Scholar
  33. 33.
    Rivest, R.L., Schapire, R.E.: Inference of finite automata using homing sequences. Inf. Comput. 103(2), 299–347 (1993).  https://doi.org/10.1006/inco.1993.1021MathSciNetCrossRefzbMATHGoogle Scholar
  34. 34.
    de Ruiter, J., Poll, E.: Protocol state fuzzing of TLS implementations. In: Jung, J., Holz, T. (eds.) 24th USENIX Security Symposium, USENIX Security 15, Washington, D.C., USA, 12–14 August 2015, pp. 193–206. USENIX Association (2015). https://www.usenix.org/conference/usenixsecurity15/technical-sessions/presentation/de-ruiter
  35. 35.
    Shahbaz, M., Groz, R.: Inferring Mealy machines. In: Cavalcanti, A., Dams, D.R. (eds.) FM 2009. LNCS, vol. 5850, pp. 207–222. Springer, Heidelberg (2009).  https://doi.org/10.1007/978-3-642-05089-3_14CrossRefGoogle Scholar
  36. 36.
    Steffen, B., Howar, F., Merten, M.: Introduction to active automata learning from a practical perspective. In: Bernardo and Issarny [11], pp. 256–296.  https://doi.org/10.1007/978-3-642-21455-4_8
  37. 37.
    Tappler, M.: Evaluation material for \({L}^*\)-based learning of Markov decision processes.  https://doi.org/10.6084/m9.figshare.7960928.v1
  38. 38.
    Tappler, M., Aichernig, B.K., Bacci, G., Eichlseder, M., Larsen, K.G.: \(L^*\)-based learning of Markov decision processes (extended version). CoRR arXiv:1906.12239 (2019), http://arxiv.org/abs/1906.12239
  39. 39.
    Tappler, M., Aichernig, B.K., Bloem, R.: Model-based testing IoT communication via active automata learning. In: 2017 IEEE International Conference on Software Testing, Verification and Validation, ICST 2017, Tokyo, Japan, 13–17 March 2017, pp. 276–287. IEEE Computer Society (2017).  https://doi.org/10.1109/ICST.2017.32
  40. 40.
    Tretmans, J.: Test generation with inputs, outputs and repetitive quiescence. Softw. Concepts Tools 17(3), 103–120 (1996)zbMATHGoogle Scholar
  41. 41.
    Tzeng, W.: Learning probabilistic automata and Markov chains via queries. Mach. Learn. 8, 151–166 (1992).  https://doi.org/10.1007/BF00992862CrossRefzbMATHGoogle Scholar
  42. 42.
    Valiant, L.G.: A theory of the learnable. Commun. ACM 27(11), 1134–1142 (1984).  https://doi.org/10.1145/1968.1972CrossRefzbMATHGoogle Scholar
  43. 43.
    Volpato, M., Tretmans, J.: Approximate active learning of nondeterministic input output transition systems. ECEASST 72 (2015).  https://doi.org/10.14279/tuj.eceasst.72.1008
  44. 44.
    Willemse, T.A.C.: Heuristics for ioco-based test-based modelling. In: Brim, L., Haverkort, B., Leucker, M., van de Pol, J. (eds.) FMICS 2006. LNCS, vol. 4346, pp. 132–147. Springer, Heidelberg (2007).  https://doi.org/10.1007/978-3-540-70952-7_9CrossRefGoogle Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Martin Tappler
    • 1
    Email author
  • Bernhard K. Aichernig
    • 1
  • Giovanni Bacci
    • 3
  • Maria Eichlseder
    • 2
  • Kim G. Larsen
    • 3
  1. 1.Institute of Software TechnologyGraz University of TechnologyGrazAustria
  2. 2.Institute of Applied Information Processing and CommunicationsGraz University of TechnologyGrazAustria
  3. 3.Department of Computer ScienceAalborg UniversityAalborgDenmark

Personalised recommendations