Learning Moore machines from input–output traces

Abstract

Learning automata from example traces alone, without equivalence or membership queries, is a fundamental problem in automata learning theory and practice. In this paper, we study this problem for finite-state machines with inputs and outputs, and in particular for Moore machines. We develop three algorithms for solving it: (1) the PTAP algorithm, which transforms a set of input–output traces into an incomplete Moore machine and then completes the machine with self-loops; (2) the PRPNI algorithm, which uses the well-known RPNI algorithm for automata learning to learn a product of automata encoding a Moore machine; and (3) the MooreMI algorithm, which directly learns a Moore machine using PTAP extended with state merging. We prove that MooreMI has the fundamental identification-in-the-limit property. We compare the algorithms experimentally in terms of the size of the learned machine and several notions of accuracy introduced in this paper. We also carry out a performance comparison against two existing tools (LearnLib and flexfringe). Finally, we compare with OSTIA, an algorithm that learns a more general class of transducers, and find that OSTIA generally does not learn a Moore machine, even when fed a characteristic sample.
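
As a rough illustration of the PTAP idea described in the abstract (build a prefix-tree machine from the traces, then complete it with self-loops), the following Python sketch may help. It is not the authors' implementation. It assumes that a trace is a pair of equal-length input and output sequences, that the i-th output is the output of the state reached after the i-th input, and that the initial state's output, which such traces do not reveal, is set to a placeholder. The names `ptap`, `run`, and `default_output` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

Trace = Tuple[Sequence[Hashable], Sequence[Hashable]]


def ptap(traces: Iterable[Trace], inputs: Iterable[Hashable], default_output=None):
    """Build a prefix-tree Moore machine from traces, then complete it.

    Returns (delta, out): delta maps (state, input) to a state, out maps a
    state to its output. State 0 is the initial state (assumption: its
    output is unknown from the traces and is set to default_output).
    """
    delta: Dict[Tuple[int, Hashable], int] = {}
    out: Dict[int, Hashable] = {0: default_output}
    for xs, ys in traces:
        if len(xs) != len(ys):
            raise ValueError("input and output words must have equal length")
        q = 0
        for x, y in zip(xs, ys):
            nxt = delta.get((q, x))
            if nxt is None:
                # Unseen prefix: create a fresh tree node labelled with y.
                nxt = len(out)
                delta[(q, x)] = nxt
                out[nxt] = y
            elif out[nxt] != y:
                raise ValueError("traces are inconsistent with a deterministic machine")
            q = nxt
    # Completion step: every missing transition becomes a self-loop.
    alphabet = list(inputs)
    for q in list(out):
        for x in alphabet:
            delta.setdefault((q, x), q)
    return delta, out


def run(delta, out, xs: Sequence[Hashable]) -> List[Hashable]:
    """Return the output word produced for the input word xs."""
    q, ys = 0, []
    for x in xs:
        q = delta[(q, x)]
        ys.append(out[q])
    return ys


traces = [(("a", "b"), (0, 1)), (("a", "a"), (0, 0)), (("b",), (1,))]
delta, out = ptap(traces, inputs=["a", "b"])
```

The resulting machine reproduces every training trace, but it has one state per distinct input prefix, which is why a state-merging step such as the one in MooreMI is needed to obtain small machines.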

Notes

  1. The term smallest automaton is used in the exact identification problem, instead of the more well-known term minimal automaton. Among equivalent machines, one with the fewest states is called minimal. Among machines which are all consistent with a set of traces but not necessarily equivalent, one with the fewest states is called smallest.

  2. We have implemented the k-tails algorithm and applied it on the characteristic sample for the Moore machine in Fig. 5a, described in Sect. 4.1. Using \(k = 0\), we get a non-deterministic machine of three states. Using any \(k > 0\), we get a deterministic machine of eight states. This excessive number of states is due to the way the k-tails equivalence relation is defined. In particular, in order for two input words to be considered equivalent, they must have successors in the training set with the same letters. This implies that a word with no successors in the training set can never be equivalent with a word with some successors, even if both words represent the same state in the target machine.

  3. Note that there are generally different kinds of characteristic samples for different learners [13]. In this paper, our definition of the characteristic sample is designed with our MooreMI algorithm in mind, which is the natural extension for Moore machines of the RPNI algorithm.

  4. https://sourceforge.net/projects/jhotdraw/.

  5. https://jakarta.apache.org/.

  6. The diameter of M is the smallest number of transitions needed to reach any state of M starting from the initial state.
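
The diameter in the last note can be read as the largest shortest-path distance, counted in transitions, from the initial state to any reachable state. Under that reading, which is an assumption about the intended definition, a breadth-first search computes it. The sketch below is illustrative only: the function name `diameter` and the dictionary encoding of the transition function are hypothetical choices, not taken from the paper.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Tuple


def diameter(delta: Dict[Tuple[Hashable, Hashable], Hashable],
             initial: Hashable,
             inputs: Iterable[Hashable]) -> int:
    """Largest BFS distance from `initial` to any reachable state.

    delta maps (state, input) to the successor state; missing entries
    are treated as undefined transitions.
    """
    alphabet = list(inputs)
    dist = {initial: 0}
    queue = deque([initial])
    while queue:
        q = queue.popleft()
        for x in alphabet:
            nxt = delta.get((q, x))
            if nxt is not None and nxt not in dist:
                # First visit in BFS order gives the shortest distance.
                dist[nxt] = dist[q] + 1
                queue.append(nxt)
    return max(dist.values())


# A three-state chain: 'a' moves forward, 'b' returns to the initial state.
chain = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 2, (2, "b"): 0,
}
chain_diameter = diameter(chain, initial=0, inputs=["a", "b"])
```

States that are unreachable from the initial state are ignored by this sketch, since no number of transitions reaches them.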

References

  1. Mitchell, T.M.: Machine Learning. McGraw-Hill, New York (1997)

  2. Vaandrager, F.: Model learning. Commun. ACM 60(2), 86–95 (2017)

  3. Tripakis, S.: Data-driven and model-based design. In: 1st IEEE International Conference on Industrial Cyber-Physical Systems (ICPS) (2018)

  4. Ljung, L.: System Identification: Theory for the User, 2nd edn. Prentice Hall, Upper Saddle River (1999)

  5. Solar-Lezama, A.: Program sketching. STTT 15(5–6), 475–495 (2013)

  6. Gulwani, S.: Automating string processing in spreadsheets using input-output examples. In: 38th POPL, pp. 317–330 (2011)

  7. Seshia, S.A.: Sciduction: combining induction, deduction, and structure for verification and synthesis. In: DAC, pp. 356–365 (2012)

  8. Ray, B., Posnett, D., Filkov, V., Devanbu, P.: A large scale study of programming languages and code quality in github. In: ACM SIGSOFT, FSE’14 (2014)

  9. Alur, R., Martin, M., Raghothaman, M., Stergiou, C., Tripakis, S., Udupa, A.: Synthesizing finite-state protocols from scenarios and requirements. In: HVC, Volume 8855 of LNCS. Springer (2014)

  10. Alur, R., Tripakis, S.: Automatic synthesis of distributed protocols. SIGACT News 48(1), 55–90 (2017)

  11. Zeller, A.: Why Programs Fail: A Guide to Systematic Debugging, 2nd edn. Academic Press, Cambridge (2009)

  12. Kohavi, Z.: Switching and Finite Automata Theory, 2nd edn. McGraw-Hill, New York (1978)

  13. de la Higuera, C.: Grammatical Inference: Learning Automata and Grammars. CUP, Cambridge (2010)

  14. Gold, E.M.: Language identification in the limit. Inf. Control 10(5), 447–474 (1967)

  15. Raffelt, H., Steffen, B.: Learnlib: a library for automata learning and experimentation, vol. 3922, pp. 377–380 (2006)

  16. Verwer, S., Hammerschmidt, C.: flexfringe: a passive automaton learning package, pp. 638–642 (2017)

  17. Oncina, J., García, P., Vidal, E.: Learning subsequential transducers for pattern recognition interpretation tasks. IEEE Trans. Pattern Anal. Mach. Intell. 15(5), 448–458 (1993)

  18. Giantamidis, G., Tripakis, S.: Learning Moore machines from input–output traces. In: Fitzgerald, J.S., Heitmeyer, C.L., Gnesi, S., Philippou, A. (eds.) 21st International Symposium on Formal Methods (FM 2016), Volume 9995 of LNCS, pp. 291–309 (2016)

  19. Mens, I.-E., Maler, O.: Learning regular languages over large ordered alphabets. Log. Methods Comput. Sci. 11(3) (2015). https://doi.org/10.2168/LMCS-11(3:13)2015

  20. Argyros, G., Stais, I., Kiayias, A., Keromytis, A.D.: Back in black: towards formal, black box analysis of sanitizers and filters. In: IEEE Symposium on Security and Privacy, SP 2016, pp. 91–109 (2016)

  21. Drews, S., D’Antoni, L.: Learning symbolic automata. In: Tools and Algorithms for the Construction and Analysis of Systems, 23rd International Conference, TACAS 2017, volume 10205 of LNCS, pp. 173–189 (2017)

  22. Lang, K.J., Pearlmutter, B.A., Price, R.A.: Results of the abbadingo one dfa learning competition and a new evidence-driven state merging algorithm. In: Honavar, V., Slutzki, G. (eds.) Grammatical Inference. Springer, Berlin (1998)

  23. Walkinshaw, N., Lambeau, B., Damas, C., Bogdanov, K., Dupont, P.: Stamina: a competition to encourage the development and assessment of software model inference techniques. Empir. Softw. Eng. 18(4), 791–824 (2013)

  24. Verwer, S., Eyraud, R., de la Higuera, C.: Pautomac: a probabilistic automata and hidden markov models learning competition. Mach. Learn. 96(1), 129–154 (2014)

  25. Jasper, M., Mues, M., Murtovi, A., Schlüter, M., Howar, F., Steffen, B., Schordan, M., Hendriks, D., Schiffelers, R., Kuppens, H., Vaandrager, F.W.: Rers 2019: combining synthesis with real-world models. In: Beyer, D., Huisman, M., Kordon, F., Steffen, B. (eds.) Tools and Algorithms for the Construction and Analysis of Systems, pp. 101–115. Springer International Publishing, Cham (2019)

  26. Moore, E.F.: Gedanken-experiments on sequential machines. In: Automata Studies, number 34. Princeton University Press (1956)

  27. Gill, A.: State-identification experiments in finite automata. Inf. Control 4, 132–154 (1961)

  28. Angluin, D.: Learning regular sets from queries and counterexamples. Inf. Comput. 75(2), 87–106 (1987)

  29. Shahbaz, M., Groz, R.: Inferring mealy machines. In: FM 2009, pp. 207–222 (2009)

  30. Jonsson, B.: Learning of automata models extended with data. In: SFM 2011, Advanced Lectures, pp. 327–349 (2011)

  31. Cassel, S., Howar, F., Jonsson, B., Steffen, B.: Learning extended finite state machines. In: SEFM 2014, Proceedings, pp. 250–264 (2014)

  32. Aarts, F., Vaandrager, F.: Learning I/O automata. In: CONCUR. Springer, pp. 71–85 (2010)

  33. Howar, F., Steffen, B., Jonsson, B., Cassel, S.: Inferring canonical register automata. In: VMCAI 2012, Proceedings, pp. 251–266 (2012)

  34. Aarts, F., Fiterau-Brostean, P., Kuppens, H., Vaandrager, F.W.: Learning register automata with fresh value generation. In: Theoretical Aspects of Computing, ICTAC, volume 9399 of LNCS, pp. 165–183 (2015)

  35. Medhat, R., Ramesh, S., Bonakdarpour, B., Fischmeister, S.: A framework for mining hybrid automata from input/output traces. In: Embedded Software (EMSOFT), pp. 177–186 (2015)

  36. Gold, E.M.: Complexity of automaton identification from given data. Inf. Control 37(3), 302–320 (1978)

  37. Heule, M.J., Verwer, S.: Software model synthesis using satisfiability solvers. Empir. Softw. Eng. 18(4), 825–856 (2013)

  38. Ulyantsev, V., Zakirzyanov, I., Shalyto, A.: BFS-based symmetry breaking predicates for DFA identification. In: Language and Automata Theory and Applications (LATA), volume 8977 of LNCS. Springer, pp. 611–622 (2015)

  39. Oncina, J., Garcia, P.: Identifying regular languages in polynomial time. In: Advances in Structural and Syntactic Pattern Recognition, pp. 99–108 (1992)

  40. Dupont, P.: Incremental regular inference. In: ICGI-96, pp. 222–237 (1996)

  41. Lang, K.J., Pearlmutter, B.A., Price, R.A.: Results of the abbadingo one DFA learning competition and a new evidence-driven state merging algorithm. In: ICGI-98, pp. 1–12 (1998)

  42. Beschastnikh, I., Brun, Y., Ernst, M.D., Krishnamurthy, A.: Inferring models of concurrent systems from logs of their behavior with csight. In: Proceedings of the 36th International Conference on Software Engineering, ICSE 2014. ACM, New York, NY, USA, pp. 468–479 (2014)

  43. Verwer, S., de Weerdt, M., Witteveen, C.: A likelihood-ratio test for identifying probabilistic deterministic real-time automata from positive data. In: Sempere, J.M., García, P. (eds.) Grammatical Inference: Theoretical Results and Applications, pp. 203–216. Springer, Berlin (2010)

  44. Walkinshaw, N., Taylor, R., Derrick, J.: Inferring extended finite state machine models from software executions. Empir. Softw. Eng. 21(3), 811–853 (2016). https://doi.org/10.1007/s10664-015-9367-7

  45. Spichakova, M.: An approach to inference of finite state machines based on gravitationally-inspired search algorithm. Proc. Estonian Acad. Sci. 62(1), 39–46 (2013)

  46. Aleksandrov, A.V., Kazakov, S.V., Sergushichev, A.A., Tsarev, F.N., Shalyto, A.A.: The use of evolutionary programming based on training examples for the generation of finite state machines for controlling objects with complex behavior. J. Comput. Sys. Sc. Int. 52(3), 410–425 (2013)

  47. Buzhinsky, I.P., Ulyantsev, V.I., Chivilikhin, D.S., Shalyto, A.A.: Inducing finite state machines from training samples using ant colony optimization. J. Comput. Sys. Sc. Int. 53(2), 256–266 (2014)

  48. Meinke, K.: CGE: a sequential learning algorithm for mealy automata. In: Sempere, J.M., García, P. (eds.) Grammatical Inference: Theoretical Results and Applications, 10th International Colloquium, ICGI 2010, Valencia, Spain, September 13–16, 2010. Proceedings, volume 6339 of LNCS. Springer, pp. 148–162 (2010)

  49. Veelenturf, L.P.J.: Inference of sequential machines from sample computations. IEEE Trans. Comput. 27(2), 167–170 (1978)

  50. Takahashi, K., Fujiyoshi, A., Kasai, T.: A polynomial time algorithm to infer sequential machines. Syst. Comput. Jpn. 34(1), 59–67 (2003)

  51. Biermann, A.W., Feldman, J.A.: On the synthesis of finite-state machines from samples of their behavior. IEEE Trans. Comput. 21(6), 592–597 (1972)

  52. Karthik, A.V., Ray, S., Nuzzo, P., Mishchenko, A., Brayton, R., Roychowdhury, J.: ABCD-NL: approximating continuous non-linear dynamical systems using purely Boolean models for analog/mixed-signal verification. In: ASP-DAC, pp. 250–255 (2014)

  53. Grinchtein, O., Leucker, M.: Learning finite-state machines from inexperienced teachers. In: ICGI, pp. 344–345 (2006)

  54. Leucker, M., Neider, D.: Learning minimal deterministic automata from inexperienced teachers. In: ISoLA, pp. 524–538 (2012)

  55. Heitmeyer, C.L., Pickett, M., Leonard, E.I., Archer, M.M., Ray, I., Aha, D.W., Trafton, J.G.: Building high assurance human-centric decision systems. Autom. Softw. Eng. 22(2), 159–197 (2015)

  56. Ulyantsev, V., Buzhinsky, I., Shalyto, A.: Exact finite-state machine identification from scenarios and temporal properties. STTT 20(1), 35–55 (2018)

  57. Gulwani, S., Srivastava, S., Venkatesan, R.: Program analysis as constraint solving. In: PLDI’08. ACM, pp. 281–292 (2008)

  58. Colón, M.A., Sankaranarayanan, S., Sipma, H.B.: Linear invariant generation using non-linear constraint solving. In: CAV. Springer, pp. 420–432 (2003)

  59. Gupta, A., Rybalchenko, A.: Invgen: an efficient invariant generator. In: Computer Aided Verification, CAV. Springer, pp. 634–640 (2009)

  60. Ackermann, C., Cleaveland, R., Huang, S., Ray, A., Shelton, C., Latronico, E.: Automatic requirement extraction from test cases. In: Runtime Verification, RV’10 (2010)

  61. Jin, X., Donzé, A., Deshmukh, J.V., Seshia, S.A.: Mining requirements from closed-loop control models. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 34(11), 1704–1717 (2015)

  62. Lemieux, C., Park, D., Beschastnikh, I.: General LTL specification mining. In: Automated Software Engineering (ASE), pp. 81–92 (2015)

  63. Ammons, G., Bodík, R., Larus, J.R.: Mining specifications. In: POPL’02. ACM, pp. 4–16 (2002)

  64. Lee, D., Yannakakis, M.: Principles and methods of testing finite state machines: a survey. Proc. IEEE 84(8), 1090–1123 (1996)

  65. Chow, T.S.: Testing software design modeled by finite-state machines. IEEE Trans. Softw. Eng. 4(3), 178–187 (1978)

  66. Dorofeeva, R., El-Fakih, K., Maag, S., Cavalli, A.R., Yevtushenko, N.: Fsm-based conformance testing methods: a survey annotated with experimental evaluation. Inf. Softw. Technol. 52(12), 1286–1297 (2010)

  67. Berg, T., Grinchtein, O., Jonsson, B., Leucker, M., Raffelt, H., Steffen, B.: On the correspondence between conformance testing and regular inference. In: FASE, volume 3442 of LNCS. Springer, pp. 175–189 (2005)

  68. Sorower, M.S.: A literature survey on algorithms for multi-label learning. Technical report (2010)

  69. Coste, F., Nicolas, J.: ICGI-98, chapter How considering incompatible state mergings may reduce the DFA induction search tree. Springer, pp. 199–210 (1998)

  70. The D Programming Language. https://dlang.org/

  71. Walkinshaw, N., Bogdanov, K.: Inferring finite-state models with temporal constraints. In: ASE, pp. 248–257 (2008)

  72. Tsarev, F., Egorov, K.: Finite state machine induction using genetic algorithm based on testing and model checking. In: 13th Annual Genetic and Evolutionary Computation Conference, GECCO, pp. 759–762 (2011)

  73. Lo, D., Khoo, S.-C.: Smartic: towards building an accurate, robust and scalable specification miner. In: FSE. ACM, New York, NY, USA, pp. 265–275 (2006)

  74. Akram, H.I., de la Higuera, C., Xiao, H., Eckert, C.: Grammatical inference algorithms in matlab. In: ICGI’10. Springer, pp. 262–266 (2010)

Author information

Corresponding author

Correspondence to Georgios Giantamidis.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was partially supported by the Academy of Finland and the U.S. National Science Foundation (Awards #1329759 and #1139138). This work was partially supported by the Irish Development Agency (IDA) for UTRC Ireland related to Network of Excellence in Aerospace Cyber Physical Systems.

About this article

Cite this article

Giantamidis, G., Tripakis, S. & Basagiannis, S. Learning Moore machines from input–output traces. Int J Softw Tools Technol Transfer (2019). https://doi.org/10.1007/s10009-019-00544-0

Keywords

  • Finite state machine
  • Moore machine
  • Mealy machine
  • Automata learning
  • Passive learning
  • Characteristic sample