Grammatical Inference in Software Engineering: An Overview of the State of the Art

  • Andrew Stevenson
  • James R. Cordy
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7745)


Grammatical inference – used successfully in a variety of fields such as pattern recognition, computational biology and natural language processing – is the process of automatically inferring a grammar by examining the sentences of an unknown language. Software engineering can also benefit from grammatical inference. Unlike the aforementioned fields, which use grammars as a convenient tool to model naturally occurring patterns, software engineering treats grammars as first-class objects typically created and maintained for a specific purpose by human designers. We introduce the theory of grammatical inference and review the state of the art as it relates to software engineering.


Keywords: grammatical inference, software engineering, grammar induction





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Andrew Stevenson
    • 1
  • James R. Cordy
    • 1
  1. Queen’s University, Kingston, Canada
