Journal of Logic, Language and Information, Volume 13, Issue 4, pp 457–470

Querying Linguistic Treebanks with Monadic Second-Order Logic in Linear Time

  • Stephan Kepser
Original Article


In recent years, large amounts of electronic text have become available. While the first of these corpora had only a low level of annotation, the more recent ones are annotated with refined syntactic information. To make these rich annotations accessible to linguists, the development of query systems has become an important goal. One of the main difficulties in this task is the choice of the right query language: it should be powerful enough to let users formulate the queries they want, and at the same time efficiently evaluable, so that query response times stay short. There is a widespread belief that no such query language exists. The aim of this paper is to show that there is indeed a powerful query language that can be evaluated efficiently. We propose the use of monadic second-order logic as a query language, and we show that a query in this language can be evaluated in time linear in the size of a tree in the corpus. We also provide examples of complicated linguistic queries expressed in monadic second-order logic, thereby demonstrating the high expressive power of the language.
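To give some intuition for the linear-time claim, here is a minimal sketch in Python. It relies on the classical correspondence between monadic second-order logic on finite trees and bottom-up tree automata, which is one standard route to linear-time evaluation and not necessarily the construction used in the paper. The query "some NP node properly dominates a PP node" and the names `Node`, `step`, `run`, and `matches` are illustrative assumptions. The automaton is written by hand rather than compiled from a formula.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node of a syntax tree: a category label plus ordered children."""
    label: str
    children: List["Node"] = field(default_factory=list)


# Automaton states, computed bottom-up for every subtree (hypothetical names).
NO_PP = 0   # subtree contains no PP node
HAS_PP = 1  # subtree contains a PP node, but no NP dominating a PP yet
MATCH = 2   # subtree contains an NP node that properly dominates a PP node


def step(label: str, child_states: List[int]) -> int:
    """Transition function: state of a node from its label and child states."""
    if MATCH in child_states:
        return MATCH
    pp_below = HAS_PP in child_states
    if label == "NP" and pp_below:
        return MATCH
    if label == "PP" or pp_below:
        return HAS_PP
    return NO_PP


def run(node: Node) -> int:
    """Visit each node once, so the work is linear in the size of the tree."""
    return step(node.label, [run(child) for child in node.children])


def matches(tree: Node) -> bool:
    """True if the tree satisfies the query (the root reaches MATCH)."""
    return run(tree) == MATCH


# "the man with the hat sleeps": the PP is attached inside the subject NP.
np_attached = Node("S", [
    Node("NP", [Node("Det"), Node("N"),
                Node("PP", [Node("P"), Node("NP", [Node("Det"), Node("N")])])]),
    Node("VP", [Node("V")]),
])

# "John sleeps in Paris": the PP is attached to the VP, outside every NP.
vp_attached = Node("S", [
    Node("NP", [Node("N")]),
    Node("VP", [Node("V"), Node("PP", [Node("P"), Node("NP", [Node("N")])])]),
])
```

Here `matches(np_attached)` holds and `matches(vp_attached)` does not. The number of states is fixed by the query, so the cost per tree grows only with the number of nodes. For automata compiled from arbitrary formulas, however, the number of states (a constant with respect to the tree) can be very large.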

Key words

Complexity theory, monadic second-order logic, query, treebank





Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  1. SFB 441, University of Tübingen, Germany
