Abstract
This chapter describes our experience developing specifications and tools for building a Syntactically Annotated Corpus (SAC) of Spanish newspaper texts. The initial corpus consists of 1,500 sentences extracted from El País Digital and Compra Maestra, totaling 22,695 words. The chapter addresses several topics relevant to any SAC project: methodology, data selection, annotation scheme, tools, and experiments.
Copyright information
© 2003 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Moreno, A., López, S., Sánchez, F., Grishman, R. (2003). Developing a Syntactic Annotation Scheme and Tools for a Spanish Treebank. In: Abeillé, A. (ed.) Treebanks. Text, Speech and Language Technology, vol 20. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0201-1_9
DOI: https://doi.org/10.1007/978-94-010-0201-1_9
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-1335-5
Online ISBN: 978-94-010-0201-1
eBook Packages: Springer Book Archive