Learning Unions of k-Testable Languages

  • Alexis Linard
  • Colin de la Higuera
  • Frits Vaandrager
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11417)


A classical problem in grammatical inference is to identify a language from a set of examples. In this paper, we address the problem of identifying a union of languages from examples that belong to several different unknown languages. Decomposing a language into smaller pieces that are easier to represent should make learning easier than aiming for a single, overly general language. In particular, we consider k-testable languages in the strict sense (k-TSS). These are defined by sets of allowed prefixes, infixes (substrings) and suffixes that words in the language may contain. We establish a Galois connection between the lattice of all languages over an alphabet \(\varSigma\) and the lattice of k-TSS languages over \(\varSigma\). We also define a simple metric on k-TSS languages. Together, the Galois connection and the metric allow us to derive an efficient algorithm for learning a union of k-TSS languages. We evaluate our algorithm on an industrial dataset and thereby demonstrate the relevance of our approach.
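To make the k-TSS definition concrete, the following Python sketch builds a k-TSS description from positive examples and tests membership. It is an illustration only, not the paper's learning algorithm for unions: the names `KTestVector`, `ktest_vector` and `accepts` are hypothetical, and the convention used here (prefixes and suffixes of length k-1, infixes of length k, plus a separate set of words shorter than k) is an assumption taken from the usual presentation of k-TSS languages rather than from the abstract.

```python
from typing import FrozenSet, Iterable, NamedTuple


class KTestVector(NamedTuple):
    """Hypothetical container for a k-TSS description (names are assumptions)."""
    prefixes: FrozenSet[str]  # allowed prefixes of length k-1
    suffixes: FrozenSet[str]  # allowed suffixes of length k-1
    infixes: FrozenSet[str]   # allowed substrings of length k
    short: FrozenSet[str]     # accepted words shorter than k


def ktest_vector(words: Iterable[str], k: int) -> KTestVector:
    """Collect the prefixes, suffixes, infixes and short words seen in a sample."""
    prefixes, suffixes, infixes, short = set(), set(), set(), set()
    for w in words:
        if len(w) < k:
            short.add(w)
        if len(w) >= k - 1:
            prefixes.add(w[:k - 1])
            # Slice from an explicit start index so that k = 1 yields "".
            suffixes.add(w[len(w) - (k - 1):])
        for i in range(len(w) - k + 1):
            infixes.add(w[i:i + k])
    return KTestVector(frozenset(prefixes), frozenset(suffixes),
                       frozenset(infixes), frozenset(short))


def accepts(z: KTestVector, word: str, k: int) -> bool:
    """Check whether `word` belongs to the k-TSS language described by `z`."""
    if len(word) < k:
        return word in z.short
    return (word[:k - 1] in z.prefixes
            and word[len(word) - (k - 1):] in z.suffixes
            and all(word[i:i + k] in z.infixes
                    for i in range(len(word) - k + 1)))


if __name__ == "__main__":
    z = ktest_vector(["abab", "ababab"], k=2)
    print(accepts(z, "abababab", 2))  # generalises beyond the sample
    print(accepts(z, "abba", 2))      # contains the unseen infix "bb"
```

Under these assumptions, a union of k-TSS languages can be represented as a collection of such vectors, with a word accepted when at least one vector accepts it.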


Keywords: Grammatical inference, k-Testable languages, Union of languages, Galois connection



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Alexis Linard (1)
  • Colin de la Higuera (2)
  • Frits Vaandrager (1)

  1. Institute for Computing and Information Science, Radboud University, Nijmegen, The Netherlands
  2. Laboratoire des Sciences du Numérique de Nantes, Université de Nantes, Nantes, France
