Unsupervised Lexical Learning with Categorial Grammars Using the LLL Corpus

Watkinson, Stephen; Manandhar, Suresh

doi:10.1007/3-540-40030-3_14


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1925)

Included in the following conference series:

International Conference on Learning Language in Logic


Abstract

In this paper we report on an unsupervised approach to learning Categorial Grammar (CG) lexicons. The learner is provided with a set of possible lexical CG categories, the forward and backward application rules of CG, and unmarked, positive-only corpora. Using the categories and rules, each sentence from the corpus is probabilistically parsed. The parses of the current example, together with the parses of earlier examples in the corpus, are used to build a lexicon and annotate the corpus. We report the results of experiments on two generated corpora and on the more complicated LLL corpus, which contains examples from subsets of English syntax. These show that the system is able to generate reasonable lexicons and to provide accurately parsed corpora in the process. We also discuss ways in which the approach can be scaled up to deal with larger and more diverse corpora.
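As a rough illustration of the two rules named in the abstract, the sketch below implements forward application (X/Y Y => X) and backward application (Y X\Y => X) inside a small chart recogniser. It is not the authors' system: the category encoding, the result-first slash notation, the toy lexicon `LEXICON`, and the function names are all assumptions made for this example, and the recogniser is non-probabilistic, whereas the paper's parser is probabilistic.

```python
from itertools import product

# Assumed encoding: a category is an atomic string ("s", "np") or a tuple
# (result, slash, argument). ("s", "\\", "np") stands for s\np, a category
# that takes an np on its left and yields s.
IV = ("s", "\\", "np")   # intransitive verb: s\np
TV = (IV, "/", "np")     # transitive verb: (s\np)/np

# Hypothetical toy lexicon; each word maps to a set of candidate categories.
LEXICON = {
    "john": {"np"},
    "mary": {"np"},
    "sleeps": {IV},
    "loves": {TV},
}


def forward_apply(left, right):
    """Forward application: X/Y followed by Y gives X."""
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    return None


def backward_apply(left, right):
    """Backward application: Y followed by X\\Y gives X."""
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None


def parse(words, lexicon):
    """Return the set of categories derivable for the whole word sequence."""
    n = len(words)
    if n == 0:
        return set()
    chart = {}
    # Fill the single-word cells from the lexicon.
    for i, word in enumerate(words):
        chart[(i, i + 1)] = set(lexicon.get(word, ()))
    # Combine adjacent spans bottom-up using the two application rules.
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            cell = set()
            for mid in range(start + 1, end):
                pairs = product(chart[(start, mid)], chart[(mid, end)])
                for left, right in pairs:
                    for rule in (forward_apply, backward_apply):
                        result = rule(left, right)
                        if result is not None:
                            cell.add(result)
            chart[(start, end)] = cell
    return chart[(0, n)]


if __name__ == "__main__":
    print(parse("john loves mary".split(), LEXICON))
```

Under these assumptions, a word sequence counts as a sentence when `"s"` is among the categories returned for the full span. A learner of the kind the abstract describes would additionally have to choose among candidate categories for unknown words and score competing parses, which this sketch does not attempt.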



Author information

Authors and Affiliations

Department of Computer Science, University of York, YO10 5DD, Heslington, York, UK
Stephen Watkinson & Suresh Manandhar


Editor information

Editors and Affiliations

Department of Computer Science, University of York, YO10 5DD, Heslington, York, UK
James Cussens
Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Sašo Džeroski


About this chapter

Cite this chapter

Watkinson, S., Manandhar, S. (2000). Unsupervised Lexical Learning with Categorial Grammars Using the LLL Corpus. In: Cussens, J., Džeroski, S. (eds) Learning Language in Logic. LLL 1999. Lecture Notes in Computer Science, vol 1925. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-40030-3_14


DOI: https://doi.org/10.1007/3-540-40030-3_14
Published: 01 February 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41145-1
Online ISBN: 978-3-540-40030-1
eBook Packages: Springer Book Archive
