Building and Using Parsed Corpora

  • Anne Abeillé

Part of the Text, Speech and Language Technology book series (TLTB, volume 20)

Table of contents

  1. Front Matter
    Pages i-xxvi
  2. Building Treebanks

    1. Front Matter
      Pages 1-1
    2. English treebanks

      1. Ann Taylor, Mitchell Marcus, Beatrice Santorini
        Pages 5-22
      2. Geoffrey Sampson
        Pages 23-41
      3. Timo Järvinen
        Pages 43-59
      4. Sean Wallis
        Pages 61-71
    3. German treebanks

      1. Thorsten Brants, Wojciech Skut, Hans Uszkoreit
        Pages 73-87
      2. Markus Becker, Andrew Bredenkamp, Berthold Crysmann, Judith Klein
        Pages 89-100
    4. Slavic treebanks

      1. Alena Böhmová, Jan Hajič, Eva Hajičová, Barbora Hladká
        Pages 103-127
      2. Małgorzata Marciniak, Agnieszka Mykowiecka, Adam Przepiórkowski, Anna Kupść
        Pages 129-146
    5. Treebanks for Romance languages

      1. Antonio Moreno, Susana López, Fernando Sánchez, Ralph Grishman
        Pages 149-163
      2. Anne Abeillé, Lionel Clément, François Toussenel
        Pages 165-187
      3. Simonetta Montemagni, Francesco Barsotti, Marco Battista, Nicoletta Calzolari, Ornella Corazzari, Alessandro Lenci et al.
        Pages 189-210
      4. Vitor Rocio, Mário Amado Alves, J. Gabriel Lopes, Maria Francisca Xavier, Graça Vicente
        Pages 211-227
    6. Treebanks for other languages

      1. Keh-Jiann Chen, Chi-Ching Luo, Ming-Chung Chang, Feng-Yi Chen, Chao-Jan Chen, Chu-Ren Huang et al.
        Pages 231-248
      2. Sadao Kurohashi, Makoto Nagao
        Pages 249-260
      3. Kemal Oflazer, Bilge Say, Dilek Zeynep Hakkani-Tür, Gökhan Tür
        Pages 261-277
  3. Using Treebanks

    1. Front Matter
      Pages 279-279
    2. Nancy Ide, Laurent Romary
      Pages 281-296
    3. Evaluation with treebanks

      1. John Carroll, Guido Minnen, Ted Briscoe
        Pages 299-316
      2. Dekang Lin
        Pages 317-329
    4. Grammar induction with treebanks

  4. Back Matter
    Pages 391-407

About this book


Linguists and engineers in Natural Language Processing increasingly rely on electronic corpora. Most research has long been limited to raw (unannotated) texts or to tagged texts (annotated with parts of speech only), but these approaches suffer from a word-by-word perspective. A newer line of research involves corpora with richer annotations, such as clauses and major constituents, grammatical functions, and dependency links. The first parsed corpora were the English Lancaster treebank and Penn Treebank; new ones have recently been developed for other languages.
This book:

presents the state of the art in work being done with parsed corpora;

gathers 21 papers on building and using parsed corpora, raising many relevant questions;

deals with a variety of languages and a variety of corpora;

is for those working in linguistics, computational linguistics, natural language, syntax, and grammar.


Head-driven Phrase Structure Grammar, Index, Syntax, computational linguistics, corpus, electronic corpora, evolution, grammar, language, linguistics, syntactic

Editors and affiliations

  • Anne Abeillé (1)
  1. Université Paris 7, Paris, France

Bibliographic information

  • Copyright Information: Springer Science+Business Media B.V. 2003
  • Publisher Name: Springer, Dordrecht
  • eBook Packages: Springer Book Archive
  • Print ISBN: 978-1-4020-1335-5
  • Online ISBN: 978-94-010-0201-1
  • Series Print ISSN: 1386-291X