Timber! Issues in Treebank Building and Use

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2003)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2721)

Abstract

We discuss several conceptions of treebanks found in the literature and show that their requirements may be incompatible. We then describe the choices made in building a Portuguese treebank with respect to human versus automatic intervention. Finally, we list use cases in connection with a Web search tool (Águia), whose philosophy and implementation are presented.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Santos, D. (2003). Timber! Issues in Treebank Building and Use. In: Mamede, N.J., Trancoso, I., Baptista, J., das Graças Volpe Nunes, M. (eds) Computational Processing of the Portuguese Language. PROPOR 2003. Lecture Notes in Computer Science, vol 2721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45011-4_22

  • DOI: https://doi.org/10.1007/3-540-45011-4_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40436-1

  • Online ISBN: 978-3-540-45011-5

  • eBook Packages: Springer Book Archive
