
Unsupervised Lexical Learning with Categorial Grammars Using the LLL Corpus

Learning Language in Logic (LLL 1999)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1925)

Abstract

In this paper we report on an unsupervised approach to learning Categorial Grammar (CG) lexicons. The learner is provided with a set of possible lexical CG categories, the forward and backward application rules of CG, and unannotated, positive-only corpora. Using the categories and rules, each sentence from the corpus is probabilistically parsed. The parses of the current example, together with the parses of earlier examples in the corpus, are used to build a lexicon and annotate the corpus. We report the results of experiments on two generated corpora and on the more complicated LLL corpus, which contains examples from subsets of English syntax. These show that the system is able to generate reasonable lexicons and provide accurately parsed corpora in the process. We also discuss ways in which the approach can be scaled up to deal with larger and more diverse corpora.
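
For readers unfamiliar with the two rules named in the abstract, the following is a minimal sketch of forward and backward application inside a small chart parser. It is not the authors' system: it is non-probabilistic, and the category encoding, the function names (`fwd`, `bwd`, `parse_categories`) and the toy lexicon are all hypothetical choices made for illustration. It assumes the convention in which X/Y seeks a Y to its right and X\Y seeks a Y to its left.

```python
from itertools import product

# A category is an atomic string ("s", "np") or a tuple (result, slash, argument).
# Assumed convention: X/Y seeks Y to its right, X\Y seeks Y to its left.

def fwd(left, right):
    """Forward application: X/Y  Y  =>  X."""
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    return None

def bwd(left, right):
    """Backward application: Y  X\\Y  =>  X."""
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None

def parse_categories(words, lexicon):
    """CKY-style chart; returns the set of categories spanning the whole sentence."""
    n = len(words)
    # Seed the chart with every candidate lexical category of each word.
    chart = {(i, i + 1): set(lexicon[w]) for i, w in enumerate(words)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = set()
            # Try every split point and both application rules.
            for k in range(i + 1, j):
                for l, r in product(chart[(i, k)], chart[(k, j)]):
                    for rule in (fwd, bwd):
                        out = rule(l, r)
                        if out is not None:
                            cell.add(out)
            chart[(i, j)] = cell
    return chart[(0, n)]

# Hypothetical toy lexicon; "likes" is deliberately ambiguous.
NP, S = "np", "s"
IV = (S, "\\", NP)   # s\np, intransitive verb
TV = (IV, "/", NP)   # (s\np)/np, transitive verb
toy_lexicon = {"john": [NP], "mary": [NP], "sleeps": [IV], "likes": [TV, IV]}
```

Calling `parse_categories(["john", "likes", "mary"], toy_lexicon)` keeps only derivations in which "likes" takes the transitive category, which hints at how parsing can narrow down a word's candidate categories. A learner of the kind the abstract describes would additionally need to score competing parses, which this sketch does not attempt.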

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Watkinson, S., Manandhar, S. (2000). Unsupervised Lexical Learning with Categorial Grammars Using the LLL Corpus. In: Cussens, J., Džeroski, S. (eds) Learning Language in Logic. LLL 1999. Lecture Notes in Computer Science, vol 1925. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-40030-3_14

  • DOI: https://doi.org/10.1007/3-540-40030-3_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41145-1

  • Online ISBN: 978-3-540-40030-1

  • eBook Packages: Springer Book Archive
