Acquisition of Syntactical Knowledge from Text

Schrepp, Jürgen

doi:10.1007/978-3-642-50974-2_36

Part of the book series: Studies in Classification, Data Analysis and Knowledge Organization

Abstract

This paper outlines a system designed to infer a grammar from a finite sample of linguistic data (a corpus). It is inspired by research on inductive inference in the sense of Gold (1967). After the corpus has been tagged, an incremental learning algorithm produces a sequence of grammars that approximates the target grammar of the data provided. In each step, a small set of sentences is selected so as to reduce the danger of overgeneralization. The selected sentences are analysed by a modified Earley parser that makes it possible to measure the “distance” between the language generated by the current grammar G and sentences not covered by G. The sentence that minimizes the “inductive leap” for the learner is chosen, and several hypotheses for completing its partial structural description are formulated and evaluated. The “best” hypothesis is then used to infer a new grammar. This process continues until the grammar covers the entire corpus.
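The abstract describes the selection step only in outline. As an illustration, the sketch below computes a minimum edit distance between a tagged sentence and L(G) and picks the uncovered sentence with the smallest positive distance, treating that distance as a stand-in for the “inductive leap”. This is an assumption-laden sketch, not the author's system: it replaces the modified Earley parser with a CYK-style dynamic programme over a grammar in Chomsky normal form, charges cost 1 for every tag insertion, deletion or substitution, and uses a hypothetical toy grammar and tag set (`det`, `adj`, `n`, `v`). The names `min_distance`, `select_next` and `TOY_GRAMMAR` are illustrative only. Hypothesis formation and grammar update are not modelled.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy grammar in Chomsky normal form over part-of-speech tags.
# A one-element right-hand side (t,) is a terminal rule A -> t;
# a two-element right-hand side (B, C) is a binary rule A -> B C.
Grammar = Dict[str, List[Tuple[str, ...]]]

TOY_GRAMMAR: Grammar = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("Det", "AN")],
    "AN": [("Adj", "N"), ("Adj", "AN")],
    "VP": [("V", "NP")],
    "Det": [("det",)],
    "Adj": [("adj",)],
    "N": [("n",)],
    "V": [("v",)],
}

INF = float("inf")


def min_distance(tags: List[str], grammar: Grammar, start: str = "S") -> float:
    """Least number of tag insertions, deletions and substitutions that turn
    `tags` into some sentence of L(grammar); 0 means the sentence is covered."""
    n = len(tags)
    cost: Dict[Tuple[str, int, int], float] = {}

    def get(a: str, i: int, j: int) -> float:
        return cost.get((a, i, j), INF)

    for length in range(n + 1):  # shorter spans first, CYK-style
        for i in range(n - length + 1):
            j = i + length
            span = tags[i:j]
            for a, rules in grammar.items():
                best = INF
                for rhs in rules:
                    if len(rhs) == 1:  # A -> t: cost of editing `span` into "t"
                        if length == 0:
                            best = min(best, 1)  # insert t
                        elif rhs[0] in span:
                            best = min(best, length - 1)  # keep t, delete the rest
                        else:
                            best = min(best, length)  # substitute one, delete the rest
                cost[(a, i, j)] = best
            # Splits at k == i or k == j leave one daughter on an empty span and
            # the other on this same span, so relax binary rules to a fixpoint.
            changed = True
            while changed:
                changed = False
                for a, rules in grammar.items():
                    for rhs in rules:
                        if len(rhs) != 2:
                            continue
                        b, c = rhs
                        for k in range(i, j + 1):
                            total = get(b, i, k) + get(c, k, j)
                            if total < cost[(a, i, j)]:
                                cost[(a, i, j)] = total
                                changed = True
    return get(start, 0, n)


def select_next(corpus: List[List[str]], grammar: Grammar,
                start: str = "S") -> Optional[List[str]]:
    """Pick the uncovered sentence closest to L(grammar), or None if all are covered."""
    # Assumption: the smallest positive distance stands in for the smallest "inductive leap".
    scored = [(min_distance(s, grammar, start), s) for s in corpus]
    uncovered = [(d, s) for d, s in scored if d > 0]
    if not uncovered:
        return None
    return min(uncovered, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    corpus = [
        ["det", "n", "v", "det", "n"],        # covered: distance 0
        ["det", "n", "v"],                    # object missing: distance 2
        ["det", "det", "n", "v", "det", "n"], # one stray tag: distance 1
    ]
    print(select_next(corpus, TOY_GRAMMAR))   # ['det', 'det', 'n', 'v', 'det', 'n']
```

Insertions enter the computation only through empty spans: the cost of a nonterminal over an empty span is the length of its shortest derivation, which is why each span is relaxed until no cost changes rather than filled in a single pass.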

References

Aho, A. V. & T. G. Peterson (1972), A minimum distance error-correcting parser for context-free languages, SIAM Journal on Computing, 1(4), 305–12.
Angluin, D. (1980), Inductive inference of formal languages from positive data, Information and Control, 45, 117–35.
Berwick, R. C. (1986), Learning from positive-only examples, in: R. S. Michalski, J. G. Carbonell & T. M. Mitchell (eds), Machine Learning, Vol. II, Morgan Kaufmann, Los Altos, 625–45.
Crespi-Reghizzi, S. (1972), An effective model for grammar inference, in: B. Gilchrist (ed), Information Processing 71, Elsevier North-Holland, 524–29.
Garside, R., G. Leech & G. Sampson (1987), The computational analysis of English, Longman, New York.
Gold, E. M. (1967), Language identification in the limit, Information and Control, 10, 447–74.
Lyon, G. (1974), Syntax-directed least-errors analysis for context-free languages: A practical approach, Communications of the ACM, 17(1), 3–14.
Morgan, J. L. (1986), From simple input to complex grammar, The MIT Press, Cambridge, MA.
Wagner, R. A. & J. L. Seiferas (1978), Correcting counter-automaton-recognizable languages, SIAM Journal on Computing, 7(3), 357–75.
Yokomori, T. (1989), Learning context-free languages efficiently, in: K. P. Jantke (ed), Analogical and inductive inference, Springer, Berlin-Heidelberg, 104–23.

Author information

Authors and Affiliations

Linguistische Datenverarbeitung, Universität Trier, Postfach 3825, W-5500, Trier, Germany
Jürgen Schrepp

Editor information

Editors and Affiliations

Lehrstuhl für Mathematische Methoden der Wirtschaftswissenschaften, Universität Augsburg, Universitätsstr. 2, D-86135, Augsburg, Germany
Otto Opitz
Forschungsinstitut für Kinderernährung, Heinstück 11, D-44225, Dortmund, Germany
Berthold Lausen
Abteilung für Medizinische Informatik, Universitäts-Klinikum Freiburg, Stefan-Meier-Str. 26, D-79104, Freiburg, Germany
Rüdiger Klar

About this paper

Cite this paper

Schrepp, J. (1993). Acquisition of Syntactical Knowledge from Text. In: Opitz, O., Lausen, B., Klar, R. (eds) Information and Classification. Studies in Classification, Data Analysis and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-50974-2_36

DOI: https://doi.org/10.1007/978-3-642-50974-2_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-56736-3
Online ISBN: 978-3-642-50974-2
eBook Packages: Springer Book Archive