Using Tree Transducers for Grammatical Inference

  • Noémie-Fleur Sandillon-Rezer
  • Richard Moot
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6736)


We present a novel way of extracting a categorial grammar from annotated data. Using the sentences from the Paris VII annotated treebank [2] as our starting point, we use a tree transducer to convert the annotated trees from the corpus into categorial grammar derivations.

We describe both the formal aspects and the implementation of the tree transducer, which is a conservative extension of standard tree transducers that allows a compact specification of the transduction rules relevant for our purposes. We also discuss the specific set of transduction rules we use to convert the corpus into AB grammar derivation trees.

Evaluating the resulting tree transducer on the entire corpus, we find that it produces lexical entries for 90.0% of the corpus, though it produces complete derivations for only 75% of all sentences in the corpus.
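
To make the idea of turning an annotated tree into an AB grammar derivation concrete, here is a minimal sketch in Python. It is not the transducer or the rule set of the paper: it assumes binary trees whose daughters already carry a hypothetical `role` annotation ("head", "arg" or "mod"), and the `ATOM` table mapping treebank labels to atomic categories is likewise an assumption made for illustration. Categories use Lambek-style notation, where `(a\c)` seeks an `a` to its left and `(c/a)` seeks an `a` to its right.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    label: str                                   # treebank label, e.g. "SENT", "NP"
    children: List["Node"] = field(default_factory=list)
    word: str = ""                               # non-empty only for leaves
    role: str = "head"                           # hypothetical annotation: "head", "arg" or "mod"


# Hypothetical mapping from treebank labels to atomic AB categories.
ATOM = {"SENT": "s", "NP": "np", "N": "n"}


def assign(node: Node, cat: str, lexicon: List[Tuple[str, str]]) -> None:
    """Give `node` the category `cat` and push categories down to its daughters."""
    if not node.children:
        lexicon.append((node.word, cat))         # a leaf yields one lexical entry
        return
    if len(node.children) == 1:
        assign(node.children[0], cat, lexicon)   # unary node: pass the category through
        return
    if len(node.children) != 2:
        raise ValueError("this sketch only handles unary and binary nodes")
    left, right = node.children
    if left.role == "head":
        if right.role == "arg":                  # head takes an argument to its right
            arg = ATOM[right.label]
            left_cat, right_cat = f"({cat}/{arg})", arg
        else:                                    # right daughter modifies the head
            left_cat, right_cat = cat, f"({cat}\\{cat})"
    else:
        if left.role == "arg":                   # head takes an argument to its left
            arg = ATOM[left.label]
            left_cat, right_cat = arg, f"({arg}\\{cat})"
        else:                                    # left daughter modifies the head
            left_cat, right_cat = f"({cat}/{cat})", cat
    assign(left, left_cat, lexicon)
    assign(right, right_cat, lexicon)


def extract_lexicon(tree: Node) -> List[Tuple[str, str]]:
    """Return (word, category) pairs for a tree whose root label is in ATOM."""
    lexicon: List[Tuple[str, str]] = []
    assign(tree, ATOM[tree.label], lexicon)
    return lexicon


# "le chat dort": the determiner heads the NP, the verb heads the sentence.
tree = Node("SENT", [
    Node("NP", [Node("D", word="le", role="head"),
                Node("N", word="chat", role="arg")], role="arg"),
    Node("V", word="dort", role="head"),
])
for word, cat in extract_lexicon(tree):
    print(word, cat)
```

In this toy setting every leaf receives a category, so the lexicon is complete by construction. On real corpus trees, which are flatter and not annotated with roles, a practical rule set has to binarize the tree and decide the head and argument structure itself, which is where coverage for lexical entries and coverage for complete derivations can diverge.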




References

  1. Abeillé, A., Clément, L.: Annotation morpho-syntaxique (2003)
  2. Abeillé, A., Clément, L., Toussenel, F.: Building a treebank for French. Treebanks. Kluwer, Dordrecht (2003)
  3. Besombes, J., Marion, J.: Learning tree languages from positive examples and membership queries. In: Ben-David, S., Case, J., Maruoka, A. (eds.) ALT 2004. LNCS (LNAI), vol. 3244, pp. 440–453. Springer, Heidelberg (2004)
  4. Buszkowski, W., Penn, G.: Categorial grammars determined from linguistic data by unification. Studia Logica 49(4), 431–454 (1990)
  5. Chomsky, N.: Lectures on government and binding (1981)
  6. Clark, S., Curran, J.: Wide-coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics 33 (2007)
  7. Comon, H., Dauchet, M., Jacquemard, F., Lugiez, D., Tison, S., Tommasi, M.: Tree automata techniques and applications (1997)
  8. Costa Florêncio, C.: Consistent identification in the limit of any of the classes k-valued is NP-hard. In: de Groote, P., Morrill, G., Retoré, C. (eds.) LACL 2001. LNCS (LNAI), vol. 2099, pp. 125–138. Springer, Heidelberg (2001)
  9. Engelfriet, J., Vogler, H.: The translation power of top-down tree-to-graph transducers. Journal of Computer and System Sciences 49(2) (1993)
  10. Gold, E.M.: Language identification in the limit. Information and Control 10(5) (1967)
  11. Hockenmaier, J.: Data and models for statistical parsing with combinatory categorial grammar (2003)
  12. Hockenmaier, J.: Creating a CCGbank and a wide-coverage CCG lexicon for German. In: Proceedings of COLING/ACL, Sydney (2006)
  13. Kanazawa, M.: Learnable Classes of Categorial Grammars. Center for the Study of Language and Information, Stanford University (1998)
  14. Knight, K., Graehl, J.: An overview of probabilistic tree transducers for natural language processing. In: Gelbukh, A. (ed.) CICLing 2005. LNCS, vol. 3406, pp. 1–24. Springer, Heidelberg (2005)
  15. Kraak, E.: A deductive account of French object clitics. In: Syntax and Semantics, pp. 271–312 (1998)
  16. Lambek, J.: The mathematics of sentence structure. The American Mathematical Monthly 65(3), 154–170 (1958)
  17. Levy, R., Andrew, G.: Tregex and Tsurgeon: tools for querying and manipulating tree data structures (2006)
  18. Moortgat, M.: Categorial type logics. In: Handbook of Logic and Language, pp. 93–177 (1997)
  19. Moot, R.: Automated extraction of type-logical supertags from the Spoken Dutch Corpus. In: Complexity of Lexical Descriptions and its Relevance to Natural Language Processing: A Supertagging Approach (2010)
  20. Moot, R.: Semi-automated extraction of a wide-coverage type-logical grammar for French. In: Proceedings TALN 2010, Montréal (2010)
  21. Moot, R., Retoré, C.: Les indices pronominaux du français dans les grammaires catégorielles. Lingvisticae Investigationes 29(1), 137–146 (2006)
  22. Morrill, G.V.: Type Logical Grammar: Categorial Logic of Signs. Springer, Heidelberg (1994)
  23. Sandillon-Rezer, N. (2011)
  24. Steedman, M.: The syntactic process (200)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Noémie-Fleur Sandillon-Rezer (1, 2, 3)
  • Richard Moot (1, 2, 3)

  1. Université de Bordeaux, LaBRI, Talence, France
  2. CNRS, esplanade des Arts et Métiers, Talence, France
  3. SIGNES (INRIA Bordeaux SW), Talence, France
