Treebanks, pp. 43-59

Bank of English and Beyond

Hand-crafted parsers for functional annotation
  • Timo Järvinen
Part of the Text, Speech and Language Technology book series (TLTB, volume 20)

Abstract

The 200-million-word corpus of the Bank of English was annotated morphologically and syntactically using the English Constraint Grammar analyser, a rule-based shallow parser developed at the Research Unit for Computational Linguistics, University of Helsinki. We discuss the annotation system and methods used in the corpus work, as well as the theoretical assumptions of the Constraint Grammar syntax. Based on our experience in large-scale corpus work, we argue for a deeper and more explicit, dependency-based syntactic representation. We present a new practical parsing system, the Functional Dependency Grammar parser, developed from the Constraint Grammar system, and discuss its suitability for treebank annotation.

Keywords

Parsing, Tagging, Constraint Grammar, Functional Dependency Grammar, Bank of English

References

  1. Aduriz, I., Aldezabal, I., Alegria, I., Artola, X., Ezeiza, N., and Urizar, R. (1996). Euslem: A lemmatiser/tagger for Basque. In Gellerstam, M., Järborg, J., Malmgren, S.-G., Norén, K., Rogström, L., and Papmehl, C. R. (eds), Papers submitted to the Seventh EURALEX International Congress on Lexicography, p. 17–26, Göteborg. Göteborg University, Department of English.
  2. Anttila, A. (1995). How to recognise Subjects in English. In (Karlsson et al., 1995), p. 315–358.
  3. Bick, E. (1997). Dependensstrukturer i Constraint Grammar syntaks for portugisisk. In Bronsted, T. and Lytje, I. (eds), Sprog og Multimedier, p. 39–57. Universitetsforlag, Aalborg.
  4. Church, K. (1988). A stochastic parts program and noun phrase parser for unrestricted text. In Proceedings of the Second Conference on Applied Natural Language Processing, p. 136–143, Austin, Texas.
  5. COLING-96 (1996). COLING-96. The 16th International Conference on Computational Linguistics, Copenhagen, Denmark. Center for Sprogteknologi, COLING-96 Organizing Committee.
  6. Collins, M. J. (1996). A new statistical parser based on bigram lexical dependencies. In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, p. 184–191, Santa Cruz, USA. ACL.
  7. Eisner, J. M. (1996). Three new probabilistic models for dependency parsing: An exploration. In (COLING-96, 1996), p. 340–345.
  8. Garside, R., Leech, G., and Sampson, G. (1987). The Computational Analysis of English. A Corpus-Based Approach. Longman, London.
  9. Hajič, J. (1998). Building a syntactically annotated corpus: The Prague Dependency Treebank. In Hajičová, E. (ed), Issues of Valency and Meaning: Studies in Honour of Jarmila Panevová, p. 106–132. Karolinum, Charles University Press, Prague.
  10. Hurskainen, A. (1996). Disambiguation of morphological analysis in Bantu languages. In (COLING-96, 1996), p. 568–573.
  11. Järvinen, T. (1994). Annotating 200 million words: The Bank of English Project. In COLING 94. The 15th International Conference on Computational Linguistics Proceedings, volume I, p. 565–568, Kyoto, Japan. International Committee on Computational Linguistics, COLING 94 Organizing Committee.
  12. Järvinen, T. and Tapanainen, P. (1997). A dependency parser for English. Technical Report TR-1, Department of General Linguistics, University of Helsinki, Finland.
  13. Karlsson, F. (1990). Constraint grammar as a framework for parsing running text. In Karlgren, H. (ed), Papers presented to the 13th International Conference on Computational Linguistics, Vol. 3, p. 168–173. Helsinki.
  14. Karlsson, F. (1995). Designing a parser for unrestricted text. In (Karlsson et al., 1995), p. 1–40.
  15. Karlsson, F., Voutilainen, A., Heikkilä, J., and Anttila, A. (eds) (1995). Constraint Grammar: A Language-Independent System for Parsing Unrestricted Text, volume 4 of Natural Language Processing. Mouton de Gruyter, Berlin and New York.
  16. Koskenniemi, K. (1983). Two-level morphology: A general computational model for word-form recognition and production. Publications 11, Department of General Linguistics, University of Helsinki, Finland.
  17. Leech, G., Garside, R., and Bryant, M. (1994). CLAWS4: The tagging of the British National Corpus. In COLING 94. The 15th International Conference on Computational Linguistics Proceedings, volume I, p. 622–628, Kyoto, Japan. International Committee on Computational Linguistics, COLING 94 Organizing Committee.
  18. Marcus, M. P., Santorini, B., and Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.
  19. Müürisep, K. (1998). Eesti keele süntaksianalüsaatorist. Keel ja Kirjandus, XLI:47–56.
  20. Sinclair, J. (1987). Collins COBUILD English Language Dictionary. (First Edition). Collins, London.
  21. Sinclair, J., Hanks, P., Fox, G., Moon, R., and Stock, P. (1995). Collins COBUILD English Language Dictionary. (Second Edition). HarperCollins, London.
  22. Sutcliffe, R. F., Koch, H.-D., and McElligott, A. (1996). Industrial Parsing of Software Manuals. Language and Computers: Studies in Practical Linguistics. Rodopi, Amsterdam.
  23. Tapanainen, P. (1999). Parsing in two frameworks: finite-state and functional dependency grammar. PhD thesis, University of Helsinki.
  24. Tapanainen, P. and Järvinen, T. (1997). A non-projective dependency parser. In Proceedings of the 5th Conference on Applied Natural Language Processing, p. 64–71, Washington, D.C. Association for Computational Linguistics.
  25. Tapanainen, P. and Järvinen, T. (1998). Dependency concordances. International Journal of Lexicography, 11(3):187–203.
  26. Voutilainen, A. (1995). Morphological disambiguation. In (Karlsson et al., 1995), chapter 6, p. 165–284.
  27. Voutilainen, A. (1999). Hand-crafted rules. In van Halteren, H. (ed), Syntactic Wordclass Tagging, p. 217–246. Kluwer Academic Publishers, Dordrecht.
  28. Voutilainen, A. and Heikkilä, J. (1995). Compiling and testing the lexicon. In (Karlsson et al., 1995), chapter 3, p. 89–102.
  29. Voutilainen, A., Heikkilä, J., and Anttila, A. (1992). Constraint Grammar of English, a performance-oriented introduction. Publications 21, Department of General Linguistics, University of Helsinki, Finland.
  30. Voutilainen, A. and Järvinen, T. (1996). Using the English Constraint Grammar parser to analyse a software manual corpus. In (Sutcliffe et al., 1996).

Copyright information

© Springer Science+Business Media Dordrecht 2003

Authors and Affiliations

  • Timo Järvinen, Conexor oy, Helsinki Science Park, Helsinki, Finland

Personalised recommendations