Abstract
In this paper we report on an unsupervised approach to learning Categorial Grammar (CG) lexicons. The learner is provided with a set of possible lexical CG categories, the forward and backward application rules of CG, and unmarked, positive-only corpora. The sentences from the corpus are probabilistically parsed using these categories and rules. The parses of each example, together with the set of parses of earlier examples in the corpus, are used to build a lexicon and annotate the corpus. We report results from experiments on two generated corpora and on the more complicated LLL corpus, which contains examples from subsets of English syntax. These results show that the system is able to generate reasonable lexicons and, in the process, provide accurately parsed corpora. We also discuss ways in which the approach can be scaled up to deal with larger and more diverse corpora.
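The abstract does not spell out how the parser or the lexicon builder works. The following is a minimal sketch under stated assumptions, not the authors' system. It uses a Viterbi-style CKY chart parser and a toy lexicon with invented probabilities to show two things. First, forward application (X/Y Y => X) and backward application (Y X\Y => X) can drive a probabilistic parse. Second, the word/category pairs from the best parse can be counted toward a lexicon. All function names (`combine`, `viterbi_parse`, `lexical_assignments`, `update_counts`, `show`) and the tuple encoding of categories are hypothetical. The paper's actual probability model and lexicon-building procedure may differ.

```python
from collections import Counter
from itertools import product

# Categories (assumed encoding): a basic category is a string ("s", "np", "n");
# a complex one is a tuple (slash, result, argument) in result-first notation,
# so ("\\", "s", "np") is s\np (wants an np on its left) and ("/", "np", "n") is np/n.


def combine(left, right):
    """Categories derivable from two adjacent categories."""
    out = []
    # Forward application: X/Y  Y  =>  X
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        out.append(left[1])
    # Backward application: Y  X\Y  =>  X
    if isinstance(right, tuple) and right[0] == "\\" and right[2] == left:
        out.append(right[1])
    return out


def viterbi_parse(words, lexicon):
    """CKY chart keeping, per span and category, the most probable derivation.

    lexicon maps word -> {category: probability}. Each chart cell maps a
    category to (probability, backpointer); lexical entries have backpointer None.
    """
    n = len(words)
    chart = {(i, i + 1): {c: (p, None) for c, p in lexicon.get(w, {}).items()}
             for i, w in enumerate(words)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = {}
            for k in range(i + 1, j):
                pairs = product(chart[(i, k)].items(), chart[(k, j)].items())
                for (lc, (lp, _)), (rc, (rp, _)) in pairs:
                    for cat in combine(lc, rc):
                        p = lp * rp
                        # Keep only the best-scoring derivation (max, not sum).
                        if p > cell.get(cat, (0.0, None))[0]:
                            cell[cat] = (p, (k, lc, rc))
            chart[(i, j)] = cell
    return chart


def lexical_assignments(chart, i, j, cat):
    """Follow backpointers to recover the category used for each word."""
    _, back = chart[(i, j)][cat]
    if back is None:
        return [cat]
    k, lc, rc = back
    return lexical_assignments(chart, i, k, lc) + lexical_assignments(chart, k, j, rc)


def update_counts(counts, words, chart, goal="s"):
    """Add the word/category pairs of the best goal parse to a Counter."""
    n = len(words)
    if goal in chart.get((0, n), {}):
        for word, cat in zip(words, lexical_assignments(chart, 0, n, goal)):
            counts[(word, cat)] += 1


def show(cat):
    """Render a category as text, e.g. np/n or s\\np."""
    if isinstance(cat, str):
        return cat
    slash, res, arg = cat

    def wrap(c):
        return c if isinstance(c, str) else f"({show(c)})"

    return f"{wrap(res)}{slash}{wrap(arg)}"


# Toy lexicon with made-up probabilities (illustrative only).
lexicon = {
    "the": {("/", "np", "n"): 1.0},
    "dog": {"n": 0.7, "np": 0.3},
    "barks": {("\\", "s", "np"): 0.6, "n": 0.4},
}
words = ["the", "dog", "barks"]
chart = viterbi_parse(words, lexicon)
print(" ".join(show(c) for c in lexical_assignments(chart, 0, 3, "s")))  # → np/n n s\np
counts = Counter()
update_counts(counts, words, chart)
```

In this sketch, the ambiguous word "dog" receives n rather than np, because only that choice lets the determiner np/n combine by forward application before the verb s\np applies backward. Replacing the max in `viterbi_parse` with a sum would give inside probabilities over all parses instead of the single best one. That is a different design choice, and the abstract does not say which one the authors use.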
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Watkinson, S., Manandhar, S. (2000). Unsupervised Lexical Learning with Categorial Grammars Using the LLL Corpus. In: Cussens, J., Džeroski, S. (eds) Learning Language in Logic. LLL 1999. Lecture Notes in Computer Science, vol 1925. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-40030-3_14
DOI: https://doi.org/10.1007/3-540-40030-3_14
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41145-1
Online ISBN: 978-3-540-40030-1
eBook Packages: Springer Book Archive