Treebanks, pp. 149–163

Developing a Syntactic Annotation Scheme and Tools for a Spanish Treebank

  • Chapter

Part of the book series: Text, Speech and Language Technology (TLTB, volume 20)

Abstract

This chapter describes our experience developing specifications and tools for building a Syntactically Annotated Corpus (SAC) of Spanish newspaper texts. The initial corpus consists of 1,500 sentences extracted from El País Digital and Compra Maestra, totalling 22,695 words. The chapter addresses several topics relevant to any SAC project: methodology, data selection, annotation scheme, tools, and experiments.

Copyright information

© 2003 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Moreno, A., López, S., Sánchez, F., Grishman, R. (2003). Developing a Syntactic Annotation Scheme and Tools for a Spanish Treebank. In: Abeillé, A. (ed.) Treebanks. Text, Speech and Language Technology, vol 20. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0201-1_9

  • DOI: https://doi.org/10.1007/978-94-010-0201-1_9

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-1335-5

  • Online ISBN: 978-94-010-0201-1

  • eBook Packages: Springer Book Archive
