$$L^*$$ -Based Learning of Markov Decision Processes

Tappler, Martin; Aichernig, Bernhard K.; Bacci, Giovanni; Eichlseder, Maria; Larsen, Kim G.

doi:10.1007/978-3-030-30942-8_38

Martin Tappler¹¹,
Bernhard K. Aichernig¹¹,
Giovanni Bacci¹³,
Maria Eichlseder¹² &
…
Kim G. Larsen¹³

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 11800))

Included in the following conference series:

International Symposium on Formal Methods

1739 Accesses
18 Citations
3 Altmetric

Abstract

Automata learning techniques automatically generate system models from test observations. These techniques usually fall into two categories: passive and active. Passive learning uses a predetermined data set, e.g., system logs. In contrast, active learning actively queries the system under learning, which is considered more efficient.

An influential active learning technique is Angluin’s $L^*$ algorithm for regular languages which inspired several generalisations from DFAs to other automata-based modelling formalisms. In this work, we study $L^*$-based learning of deterministic Markov decision processes, first assuming an ideal setting with perfect information. Then, we relax this assumption and present a novel learning algorithm that collects information by sampling system traces via testing. Experiments with the implementation of our sampling-based algorithm suggest that it achieves better accuracy than state-of-the-art passive learning techniques with the same amount of test data. Unlike existing learning algorithms with predefined states, our algorithm learns the complete model structure including the states.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
This is inspired by the introduction of chaos states in $\mathbf {ioco}$-based learning [43].

References

Aichernig, B.K., Mostowski, W., Mousavi, M.R., Tappler, M., Taromirad, M.: Model learning and model-based testing. In: Bennaceur et al. [9], pp. 74–100. https://doi.org/10.1007/978-3-319-96562-8_3
Aichernig, B.K., Tappler, M.: Efficient active automata learning via mutation testing. J. Autom. Reasoning (2018). https://doi.org/10.1007/s10817-018-9486-0
Aichernig, B.K., Tappler, M.: Probabilistic black-box reachability checking (extended version). Formal Methods Syst. Des. (2019). https://doi.org/10.1007/s10703-019-00333-0
Angluin, D.: Learning regular sets from queries and counterexamples. Inf. Comput. 75(2), 87–106 (1987). https://doi.org/10.1016/0890-5401(87)90052-6
Article MathSciNet MATH Google Scholar
Bacci, G., Bacci, G., Larsen, K.G., Mardare, R.: MDPDist library. http://people.cs.aau.dk/~giovbacci/tools/bisimdist.zip. Accessed 28 June 2019
Bacci, G., Bacci, G., Larsen, K.G., Mardare, R.: Computing behavioral distances, compositionally. In: Chatterjee, K., Sgall, J. (eds.) MFCS 2013. LNCS, vol. 8087, pp. 74–85. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40313-2_9
Chapter MATH Google Scholar
Bacci, G., Bacci, G., Larsen, K.G., Mardare, R.: The BisimDist library: efficient computation of bisimilarity distances for Markovian models. In: Joshi, K., Siegle, M., Stoelinga, M., D’Argenio, P.R. (eds.) QEST 2013. LNCS, vol. 8054, pp. 278–281. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40196-1_23
Chapter MATH Google Scholar
Baier, C., Katoen, J.: Principles of Model Checking. MIT Press, Cambridge (2008)
MATH Google Scholar
Bennaceur, A., Hähnle, R., Meinke, K. (eds.): Machine Learning for Dynamic Software Analysis: Potentials and Limits. LNCS, vol. 11026. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96562-8
Book Google Scholar
Bergadano, F., Varricchio, S.: Learning behaviors of automata from multiplicity and equivalence queries. SIAM J. Comput. 25(6), 1268–1280 (1996). https://doi.org/10.1137/S009753979326091X
Article MathSciNet MATH Google Scholar
Bernardo, M., Issarny, V. (eds.): Formal Methods for Eternal Networked Software Systems - 11th International School on Formal Methods for the Design of Computer, Communication and Software Systems, SFM 2011, Bertinoro, Italy, 13–18 June 2011, Advanced Lectures. LNCS, vol. 6659. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21455-4
Carrasco, R.C., Oncina, J.: Learning stochastic regular grammars by means of a state merging method. In: Carrasco, R.C., Oncina, J. (eds.) ICGI 1994. LNCS, vol. 862, pp. 139–152. Springer, Heidelberg (1994). https://doi.org/10.1007/3-540-58473-0_144
Chapter MATH Google Scholar
Carrasco, R.C., Oncina, J.: Learning deterministic regular grammars from stochastic samples in polynomial time. ITA 33(1), 1–20 (1999). https://doi.org/10.1051/ita:1999102
Article MathSciNet MATH Google Scholar
Cassel, S., Howar, F., Jonsson, B., Steffen, B.: Active learning for extended finite state machines. Formal Aspects Comput. 28(2), 233–263 (2016). https://doi.org/10.1007/s00165-016-0355-5
Article MathSciNet MATH Google Scholar
Castro, J., Gavaldà, R.: Learning probability distributions generated by finite-state machines. In: Heinz, J., Sempere, J.M. (eds.) Topics in Grammatical Inference, pp. 113–142. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-48395-4_5
Chapter Google Scholar
Chen, Y., Nielsen, T.D.: Active learning of Markov decision processes for system verification. In: 11th International Conference on Machine Learning and Applications, ICMLA, Boca Raton, FL, USA, 12–15 December 2012, vol. 2, pp. 289–294. IEEE (2012). https://doi.org/10.1109/ICMLA.2012.158
Feng, L., Han, T., Kwiatkowska, M.Z., Parker, D.: Learning-based compositional verification for synchronous probabilistic systems. In: Bultan, T., Hsiung, P.-A. (eds.) ATVA 2011. LNCS, vol. 6996, pp. 511–521. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24372-1_40
Chapter MATH Google Scholar
Fiterău-Broştean, P., Janssen, R., Vaandrager, F.: Combining model learning and model checking to analyze TCP implementations. In: Chaudhuri, S., Farzan, A. (eds.) CAV 2016. LNCS, vol. 9780, pp. 454–471. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41540-6_25
Chapter Google Scholar
Forejt, V., Kwiatkowska, M.Z., Norman, G., Parker, D.: Automated verification techniques for probabilistic systems. In: Bernardo and Issarny [11], pp. 53–113. https://doi.org/10.1007/978-3-642-21455-4_3
Fu, J., Topcu, U.: Probably approximately correct MDP learning and control with temporal logic constraints. In: Fox, D., Kavraki, L.E., Kurniawati, H. (eds.) Robotics: Science and Systems X, University of California, Berkeley, USA, 12–16 July 2014 (2014). http://www.roboticsproceedings.org/rss10/p39.html
Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963). http://www.jstor.org/stable/2282952
Article MathSciNet Google Scholar
Howar, F., Steffen, B.: Active automata learning in practice - an annotated bibliography of the years 2011 to 2016. In: Bennaceur et al. [9], pp. 123–148. https://doi.org/10.1007/978-3-319-96562-8_5
Hungar, H., Niese, O., Steffen, B.: Domain-specific optimization in automata learning. In: Hunt Jr., W.A., Somenzi, F. (eds.) CAV 2003. LNCS, vol. 2725, pp. 315–327. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45069-6_31
Chapter Google Scholar
Isberner, M., Howar, F., Steffen, B.: The TTT algorithm: a redundancy-free approach to active automata learning. In: Bonakdarpour, B., Smolka, S.A. (eds.) RV 2014. LNCS, vol. 8734, pp. 307–322. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11164-3_26
Chapter Google Scholar
Khalili, A., Tacchella, A.: Learning nondeterministic Mealy machines. In: Clark, A., Kanazawa, M., Yoshinaka, R. (eds.) Proceedings of the 12th International Conference on Grammatical Inference, ICGI 2014, Kyoto, Japan, 17–19 September 2014. JMLR Workshop and Conference Proceedings, vol. 34, pp. 109–123. JMLR.org (2014). http://jmlr.org/proceedings/papers/v34/khalili14a.html
Kwiatkowska, M.Z., Norman, G., Parker, D.: Analysis of a gossip protocol in PRISM. SIGMETRICS Perform. Eval. Rev. 36(3), 17–22 (2008). https://doi.org/10.1145/1481506.1481511
Article Google Scholar
Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 585–591. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_47
Chapter Google Scholar
Mao, H., Chen, Y., Jaeger, M., Nielsen, T.D., Larsen, K.G., Nielsen, B.: Learning Markov decision processes for model checking. In: Fahrenberg, U., Legay, A., Thrane, C.R. (eds.) Proceedings Quantities in Formal Methods, QFM 2012, Paris, France, 28 August 2012. EPTCS, vol. 103, pp. 49–63 (2012). https://doi.org/10.4204/EPTCS.103.6
Mao, H., Chen, Y., Jaeger, M., Nielsen, T.D., Larsen, K.G., Nielsen, B.: Learning deterministic probabilistic automata from a model checking perspective. Mach. Learn. 105(2), 255–299 (2016). https://doi.org/10.1007/s10994-016-5565-9
Article MathSciNet MATH Google Scholar
Margaria, T., Niese, O., Raffelt, H., Steffen, B.: Efficient test-based model generation for legacy reactive systems. In: Ninth IEEE International High-Level Design Validation and Test Workshop 2004, pp. 95–100. IEEE Computer Society (2004). https://doi.org/10.1109/HLDVT.2004.1431246
Nerode, A.: Linear automaton transformations. Proc. Am. Math. Soc. 9, 541–544 (1958)
Article MathSciNet Google Scholar
Norman, G., Shmatikov, V.: Analysis of probabilistic contract signing. J. Comput. Secur. 14(6), 561–589 (2006). http://content.iospress.com/articles/journal-of-computer-security/jcs268
Article Google Scholar
Rivest, R.L., Schapire, R.E.: Inference of finite automata using homing sequences. Inf. Comput. 103(2), 299–347 (1993). https://doi.org/10.1006/inco.1993.1021
Article MathSciNet MATH Google Scholar
de Ruiter, J., Poll, E.: Protocol state fuzzing of TLS implementations. In: Jung, J., Holz, T. (eds.) 24th USENIX Security Symposium, USENIX Security 15, Washington, D.C., USA, 12–14 August 2015, pp. 193–206. USENIX Association (2015). https://www.usenix.org/conference/usenixsecurity15/technical-sessions/presentation/de-ruiter
Shahbaz, M., Groz, R.: Inferring Mealy machines. In: Cavalcanti, A., Dams, D.R. (eds.) FM 2009. LNCS, vol. 5850, pp. 207–222. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-05089-3_14
Chapter Google Scholar
Steffen, B., Howar, F., Merten, M.: Introduction to active automata learning from a practical perspective. In: Bernardo and Issarny [11], pp. 256–296. https://doi.org/10.1007/978-3-642-21455-4_8
Tappler, M.: Evaluation material for ${L}^*$-based learning of Markov decision processes. https://doi.org/10.6084/m9.figshare.7960928.v1
Tappler, M., Aichernig, B.K., Bacci, G., Eichlseder, M., Larsen, K.G.: $L^*$-based learning of Markov decision processes (extended version). CoRR arXiv:1906.12239 (2019), http://arxiv.org/abs/1906.12239
Tappler, M., Aichernig, B.K., Bloem, R.: Model-based testing IoT communication via active automata learning. In: 2017 IEEE International Conference on Software Testing, Verification and Validation, ICST 2017, Tokyo, Japan, 13–17 March 2017, pp. 276–287. IEEE Computer Society (2017). https://doi.org/10.1109/ICST.2017.32
Tretmans, J.: Test generation with inputs, outputs and repetitive quiescence. Softw. Concepts Tools 17(3), 103–120 (1996)
MATH Google Scholar
Tzeng, W.: Learning probabilistic automata and Markov chains via queries. Mach. Learn. 8, 151–166 (1992). https://doi.org/10.1007/BF00992862
Article MATH Google Scholar
Valiant, L.G.: A theory of the learnable. Commun. ACM 27(11), 1134–1142 (1984). https://doi.org/10.1145/1968.1972
Article MATH Google Scholar
Volpato, M., Tretmans, J.: Approximate active learning of nondeterministic input output transition systems. ECEASST 72 (2015). https://doi.org/10.14279/tuj.eceasst.72.1008
Willemse, T.A.C.: Heuristics for ioco-based test-based modelling. In: Brim, L., Haverkort, B., Leucker, M., van de Pol, J. (eds.) FMICS 2006. LNCS, vol. 4346, pp. 132–147. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-70952-7_9
Chapter Google Scholar

Download references

Acknowledgment

The work of B. Aichernig, M. Eichlseder and M. Tappler has been carried out as part of the TU Graz LEAD project “Dependable Internet of Things in Adverse Environments”. The work of K. Larsen and G. Bacci has been supported by the Advanced ERC Grant nr. 867096 (LASSO).

Author information

Authors and Affiliations

Institute of Software Technology, Graz University of Technology, Graz, Austria
Martin Tappler & Bernhard K. Aichernig
Institute of Applied Information Processing and Communications, Graz University of Technology, Graz, Austria
Maria Eichlseder
Department of Computer Science, Aalborg University, Aalborg, Denmark
Giovanni Bacci & Kim G. Larsen

Authors

Martin Tappler
View author publications
You can also search for this author in PubMed Google Scholar
Bernhard K. Aichernig
View author publications
You can also search for this author in PubMed Google Scholar
Giovanni Bacci
View author publications
You can also search for this author in PubMed Google Scholar
Maria Eichlseder
View author publications
You can also search for this author in PubMed Google Scholar
Kim G. Larsen
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Martin Tappler .

Editor information

Editors and Affiliations

Consiglio Nazionale delle Ricerche, Pisa, Italy
Maurice H. ter Beek
Macquarie University, Sydney, NSW, Australia
Annabelle McIver
University of Minho, Braga, Portugal
José N. Oliveira

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Tappler, M., Aichernig, B.K., Bacci, G., Eichlseder, M., Larsen, K.G. (2019). $L^*$-Based Learning of Markov Decision Processes. In: ter Beek, M., McIver, A., Oliveira, J. (eds) Formal Methods – The Next 30 Years. FM 2019. Lecture Notes in Computer Science(), vol 11800. Springer, Cham. https://doi.org/10.1007/978-3-030-30942-8_38

Download citation

DOI: https://doi.org/10.1007/978-3-030-30942-8_38
Published: 23 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30941-1
Online ISBN: 978-3-030-30942-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

\(L^*\)-Based Learning of Markov Decision Processes

Abstract

Access this chapter

Notes

References

Acknowledgment

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

\(L^*\)-Based Learning of Markov Decision Processes

Abstract

Access this chapter

Notes

References

Acknowledgment

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation