Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Grammar Inference

  • Matthew Young-Lai
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_182


Automata induction; Automatic induction; Automatic language induction; Grammar induction; Grammatical induction; Grammatical inference


Grammar inference is the task of learning grammars or languages from training data. It is a type of inductive inference, the name given to learning techniques that try to guess general rules from examples.

The basic problem is to find a grammar consistent with a training set of positive examples. Usually, the target language is infinite, while the training set is finite. Some work assumes that both positive and negative examples are available, but negative examples are rarely available in real applications. Sometimes probability information is attached to each example. In this case, it is possible to learn a probability distribution over the strings in the language in addition to the grammar. This is sometimes called stochastic grammar inference.
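As a rough illustration of consistency with a positive training set, the sketch below builds a prefix tree acceptor: a trie-shaped automaton that accepts exactly the training strings. This construction is an assumption chosen for illustration, not a method described in this entry, and the names `PrefixTreeAcceptor`, `add`, `accepts`, and `probability` are hypothetical. The counts stored in each state give relative-frequency estimates, which is one simple way to attach a probability distribution to the accepted strings.

```python
from collections import defaultdict


class PrefixTreeAcceptor:
    """Trie-shaped automaton accepting exactly the training strings.

    Illustrative sketch only: it is consistent with the positive
    examples but does not generalize beyond them.
    """

    def __init__(self):
        self.transitions = defaultdict(dict)  # state -> {symbol: next state}
        self.pass_count = defaultdict(int)    # examples passing through a state
        self.end_count = defaultdict(int)     # examples ending in a state
        self.next_state = 1                   # state 0 is the root

    def add(self, string):
        """Insert one positive example, updating the frequency counts."""
        state = 0
        self.pass_count[state] += 1
        for symbol in string:
            if symbol not in self.transitions[state]:
                self.transitions[state][symbol] = self.next_state
                self.next_state += 1
            state = self.transitions[state][symbol]
            self.pass_count[state] += 1
        self.end_count[state] += 1

    def probability(self, string):
        """Relative-frequency estimate of the string's probability."""
        if self.pass_count[0] == 0:
            return 0.0  # no training data yet
        state = 0
        prob = 1.0
        for symbol in string:
            nxt = self.transitions.get(state, {}).get(symbol)
            if nxt is None:
                return 0.0  # path never seen in the training set
            prob *= self.pass_count[nxt] / self.pass_count[state]
            state = nxt
        # Multiply by the probability of stopping in the final state.
        return prob * self.end_count[state] / self.pass_count[state]

    def accepts(self, string):
        """A string is accepted only if it occurred in the training set."""
        return self.probability(string) > 0.0


pta = PrefixTreeAcceptor()
for example in ["ab", "ab", "abb", "a"]:
    pta.add(example)
```

With this training set, `pta.accepts("abb")` holds while `pta.accepts("abbb")` does not, and `pta.probability("ab")` is the relative frequency of `"ab"` among the examples. Because the target language is usually infinite, a practical inference algorithm would have to generalize beyond such an automaton, for example by merging states.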

A grammar inference algorithm must target a particular grammar representation. More expressive...


Recommended Reading

  1. Ahonen H, Mannila H, Nikunen E. Generating grammars for SGML tagged texts lacking DTD. In: Proceedings of the Workshop on Principles of Document Processing; 1994.
  2. Ahonen H, Mannila H, Nikunen E. Forming grammars for structured documents: an application of grammatical inference. In: Carrasco R, Oncina J, editors. Lecture notes in computer science, vol. 862. Berlin/New York: Springer; 1994. p. 153–67.
  3. Angluin D. On the complexity of minimum inference of regular sets. Inf Control. 1978;39(3):337–50.
  4. Angluin D. Inference of reversible languages. J ACM. 1982;29(3):741–85.
  5. Baum LE, Petrie T, Soules G, Weiss N. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann Math Stat. 1970;41(1):164–71.
  6. Fankhauser P, Xu Y. MarkItUp! an incremental approach to document structure recognition. Electron Publ Orig Dissem Des. 1993;6(4):447–56.
  7. Gold EM. Language identification in the limit. Inf Control. 1967;10(5):447–74.
  8. Gold EM. Complexity of automaton identification from finite data. Inf Control. 1978;37(3):302–20.
  9. Goldman R, Widom J. DataGuides: enabling query formulation and optimization in semi-structured databases. In: Proceedings of the 23rd International Conference on Very Large Data Bases; 1997. p. 436–45.
  10. Hopcroft JE, Ullman JD. Introduction to automata theory, languages and computation. Reading: Addison-Wesley; 1979.
  11. Oncina J, García P. Inferring regular languages in polynomial updated time. In: de la Blanca NP, Sanfeliu A, Vidal E, editors. Pattern recognition and image analysis. Singapore: World Scientific; 1992. p. 49–61.
  12. Sánchez JA, Benedí JM. Statistical inductive learning of regular formal languages. In: Carrasco R, Oncina J, editors. Lecture notes in computer science, vol. 862; 1994. p. 130–8.
  13. Shafer K. Creating DTDs via the GB-engine and Fred. Dublin/Ohio: OCLC Online Computer Library Center; 1995.
  14. Stolcke A, Omohundro S. Inducing probabilistic grammars by Bayesian model merging. In: Carrasco R, Oncina J, editors. Lecture notes in computer science, vol. 862; 1994. p. 106–18.
  15. Young-Lai M, Tompa FW. Stochastic grammatical inference of text database structure. Mach Learn. 2000;40(2):111–37.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Sybase iAnywhere, Waterloo, Canada

Section editors and affiliations

  • Frank Tompa (1)
  1. David R. Cheriton School of Computer Science, Univ. of Waterloo, Waterloo, Canada