Abstract
This paper outlines a system designed to infer a grammar from a finite sample of linguistic data (a corpus). It is inspired by research on inductive inference in the sense of Gold (1967). After the corpus has been tagged, an incremental learning algorithm produces a sequence of grammars that approximates the target grammar of the data. In each step, a small set of sentences is selected in a way that reduces the danger of overgeneralization. The selected sentences are analysed by a modified Earley parser, which makes it possible to measure the “distance” between the language generated by the current grammar G and sentences not covered by G. The sentence that minimizes the “inductive leap” for the learner is chosen, and several hypotheses for completing its partial structural description are formulated and evaluated. The “best” hypothesis is then used to infer a new grammar. This process continues until the grammar covers the entire corpus.
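The control loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under strong assumptions, not the author's implementation: the grammar is reduced to a set of flat tag sequences, the Earley-based distance is replaced by a plain edit distance over tags, and the names `infer`, `distance`, `hypotheses` and `score` are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, List, Sequence, Tuple

Sentence = Tuple[str, ...]      # a tagged sentence, e.g. ("DET", "N", "V")
Grammar = FrozenSet[Sentence]   # toy stand-in: each member is a flat rule S -> tags


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two tag sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def distance(grammar: Grammar, sentence: Sentence) -> int:
    """Assumed distance: edits to the nearest sequence the grammar covers.

    The abstract measures this with a modified Earley parser; edit distance
    is only a placeholder with the same role (0 means "covered").
    """
    if not grammar:
        return len(sentence)
    return min(edit_distance(rule, sentence) for rule in grammar)


def infer(
    corpus: Sequence[Sentence],
    hypotheses: Callable[[Grammar, Sentence], Iterable[Grammar]],
    score: Callable[[Grammar], float],
) -> List[Grammar]:
    """Return the sequence of grammars produced until the corpus is covered.

    Assumes every hypothesis keeps covering what the previous grammar
    covered; otherwise the loop is not guaranteed to terminate.
    """
    grammar: Grammar = frozenset()
    history = [grammar]
    uncovered = [s for s in corpus if distance(grammar, s) > 0]
    while uncovered:
        # Pick the uncovered sentence with the smallest "inductive leap".
        target = min(uncovered, key=lambda s: distance(grammar, s))
        # Keep only candidate grammars that actually cover the target.
        candidates = [g for g in hypotheses(grammar, target)
                      if distance(g, target) == 0]
        if not candidates:
            raise ValueError("no hypothesis covers the selected sentence")
        grammar = min(candidates, key=score)  # lower score = "better"
        history.append(grammar)
        uncovered = [s for s in corpus if distance(grammar, s) > 0]
    return history
```

With the trivial hypothesis generator `lambda g, s: [g | {s}]` and `len` as the score, the loop simply memorises the corpus one sentence at a time, closest sentence first. A real hypothesis generator would instead propose generalising rules, and the score would penalise overgeneralization.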
References
Aho, A. V. & T. G. Peterson (1972), A minimum distance error-correcting parser for context-free languages, SIAM Journal on Computing, 1(4), 305–12.
Angluin, D. (1980), Inductive inference of formal languages from positive data, Information and Control, 45, 117–35.
Berwick, R. C. (1986), Learning from positive-only examples, in: R. S. Michalski, J. G. Carbonell & T. M. Mitchell (eds), Machine Learning, Vol. II, Morgan Kaufmann, Los Altos, 625–45.
Crespi-Reghizzi, S. (1972), An effective model for grammar inference, in: B. Gilchrist (ed), Information Processing 71, Elsevier North-Holland, 524–29.
Garside, R., G. Leech & G. Sampson (1987), The computational analysis of English, Longman, New York.
Gold, E. M. (1967), Language identification in the limit, Information and Control, 10, 447–74.
Lyon, G. (1974), Syntax-directed least-errors analysis for context-free languages: A practical approach, Communications of the ACM, 17(1), 3–14.
Morgan, J. L. (1986), From simple input to complex grammar, The MIT Press, Cambridge, MA.
Wagner, R. A. & J. L. Seiferas (1978), Correcting counter-automaton-recognizable languages, SIAM Journal on Computing, 7(3), 357–75.
Yokomori, T. (1989), Learning context-free languages efficiently, in: K. P. Jantke (ed), Analogical and inductive inference, Springer, Berlin-Heidelberg, 104–23.
Copyright information
© 1993 Springer-Verlag Berlin · Heidelberg
About this paper
Cite this paper
Schrepp, J. (1993). Acquisition of Syntactical Knowledge from Text. In: Opitz, O., Lausen, B., Klar, R. (eds) Information and Classification. Studies in Classification, Data Analysis and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-50974-2_36
DOI: https://doi.org/10.1007/978-3-642-50974-2_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-56736-3
Online ISBN: 978-3-642-50974-2
eBook Packages: Springer Book Archive