An Extendible Regular Expression Compiler for Finite-State Approaches in Natural Language Processing

  • Gertjan van Noord
  • Dale Gerdemann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2214)

Abstract

Finite-state techniques are widely used in various areas of Natural Language Processing (NLP). As Kaplan and Kay [12] have argued, regular expressions are the appropriate level of abstraction for thinking about finite-state languages and finite-state relations. More complex finite-state operations (such as contexted replacement) are defined in terms of basic operations (such as Kleene closure, complementation, and composition).

To support experimentation with such complex finite-state operations, the FSA Utilities (version 5) provides an extendible regular expression compiler. The paper discusses the regular expression operations provided by the compiler and the ways in which new regular expression operators can be defined. The benefits of such an extendible regular expression compiler are illustrated with a number of examples taken from recent publications in the area of finite-state approaches to NLP.
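
The idea of building complex operators out of a small set of basic ones can be illustrated with a minimal Python sketch. It does not reproduce the FSA Utilities syntax or its compilation to automata; every name here (`macro`, `MACROS`, `contains`, `difference`, and so on) is hypothetical. It assumes a toy core of concatenation, union, Kleene closure, and complementation, evaluated by Brzozowski derivatives, and it covers regular languages only, not relations, so composition is not modeled.

```python
# Core expressions are plain tuples; the first element is the operator tag.
EMPTY, EPS, ANY = ("empty",), ("eps",), ("any",)

def sym(c): return ("sym", c)
def cat(a, b): return ("cat", a, b)
def alt(a, b): return ("alt", a, b)
def star(a): return ("star", a)
def comp(a): return ("not", a)   # complementation

def nullable(e):
    """True if the language of e contains the empty string."""
    tag = e[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym", "any"):
        return False
    if tag == "cat":
        return nullable(e[1]) and nullable(e[2])
    if tag == "alt":
        return nullable(e[1]) or nullable(e[2])
    if tag == "not":
        return not nullable(e[1])
    raise ValueError(tag)

def deriv(e, c):
    """Brzozowski derivative: what may follow after reading symbol c."""
    tag = e[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "sym":
        return EPS if e[1] == c else EMPTY
    if tag == "any":
        return EPS
    if tag == "cat":
        d = cat(deriv(e[1], c), e[2])
        return alt(d, deriv(e[2], c)) if nullable(e[1]) else d
    if tag == "alt":
        return alt(deriv(e[1], c), deriv(e[2], c))
    if tag == "star":
        return cat(deriv(e[1], c), e)
    if tag == "not":
        return comp(deriv(e[1], c))
    raise ValueError(tag)

def matches(e, s):
    """Decide membership of string s in the language of e."""
    for c in s:
        e = deriv(e, c)
    return nullable(e)

# The extension point: new operators are registered by name and are
# defined purely in terms of the core operators (or of other macros).
MACROS = {}

def macro(name):
    def register(fn):
        MACROS[name] = fn
        return fn
    return register

@macro("intersect")
def intersect(a, b):
    return comp(alt(comp(a), comp(b)))      # De Morgan

@macro("difference")
def difference(a, b):
    return intersect(a, comp(b))

@macro("contains")
def contains(a):
    return cat(star(ANY), cat(a, star(ANY)))

# Strings that contain "ab" but do not contain "bb".
expr = difference(contains(cat(sym("a"), sym("b"))),
                  contains(cat(sym("b"), sym("b"))))
```

With these definitions, `matches(expr, "aab")` holds while `matches(expr, "abb")` does not. A user-defined operator such as `contains` needs no change to the evaluator, because it expands into core operators before evaluation.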

Keywords

Optimality Theory, Natural Language Processing, Regular Expression, Regular Language, Computational Linguistics
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  [1] Steven Abney. Partial parsing via finite-state cascades. In John Carroll, editor, Workshop on Robust Parsing; Eighth European Summer School in Logic, Language and Information, pages 8–15, 1995.
  [2] Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, 1974.
  [3] Gosse Bouma. A modern computational linguistics course using Dutch. In EACL 99: Computer and Internet Supported Education in Language and Speech Technology. Proceedings of a Workshop sponsored by ELSNET and The Association for Computational Linguistics, Bergen, Norway, 1999.
  [4] Christian S. Calude, Kai Salomaa, and Sheng Yu. Metric lexical analysis. In O. Boldt, H. Juergensen, and L. Robbins, editors, Workshop on Implementing Automata; WIA99 Pre-Proceedings, Potsdam, Germany, 1999.
  [5] Jean-Pierre Chanod and Pasi Tapanainen. A robust finite-state grammar for French. In John Carroll, editor, Workshop on Robust Parsing, Prague, 1996. These proceedings are also available as Cognitive Science Research Paper #435; School of Cognitive and Computing Sciences, University of Sussex.
  [6] P.C. Uit den Boogaart. Woordfrequenties in geschreven en gesproken Nederlands. Oosthoek, Scheltema & Holkema, Utrecht, 1975. Werkgroep Frequentie-onderzoek van het Nederlands.
  [7] Dale Gerdemann and Gertjan van Noord. Transducers from rewrite rules with backreferences. In Ninth Conference of the European Chapter of the Association for Computational Linguistics, Bergen, Norway, 1999.
  [8] Gregory Grefenstette. Light parsing as finite-state filtering. In ECAI 1996 Workshop Extended Finite-State Models of Language, Budapest, 1996.
  [9] John E. Hopcroft. An n log n algorithm for minimizing the states in a finite automaton. In Z. Kohavi, editor, The Theory of Machines and Computations, pages 189–196. Academic Press, 1971.
  [10] John E. Hopcroft and Jeffrey D. Ullman. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, 1979.
  [11] C. Douglas Johnson. Formal Aspects of Phonological Descriptions. Mouton, The Hague, 1972.
  [12] Ronald Kaplan and Martin Kay. Regular models of phonological rule systems. Computational Linguistics, 20(3):331–379, 1994.
  [13] Lauri Karttunen. The replace operator. In 33rd Annual Meeting of the Association for Computational Linguistics, M.I.T. Cambridge Mass., 1995.
  [14] Lauri Karttunen. Directed replacement. In 34th Annual Meeting of the Association for Computational Linguistics, Santa Cruz, 1996.
  [15] Lauri Karttunen. The replace operator. In Emmanuel Roche and Yves Schabes, editors, Finite-State Language Processing, pages 117–147. Bradford, MIT Press, 1997.
  [16] Lauri Karttunen. The proper treatment of optimality theory in computational phonology. In Finite-state Methods in Natural Language Processing, pages 1–12, Ankara, 1998.
  [17] George Anton Kiraz and Edmund Grimley-Evans. Multi-tape automata for speech and language systems: A Prolog implementation. In Derick Wood and Sheng Yu, editors, Automata Implementation. Second International Workshop on Implementing Automata, WIA ’97, pages 87–103. Springer Lecture Notes in Computer Science 1436, 1998.
  [18] Mehryar Mohri, Fernando C.N. Pereira, and Michael Riley. A rational design for a weighted finite-state transducer library. In Automata Implementation. Second International Workshop on Implementing Automata, WIA ’97. Springer Verlag, 1998. Lecture Notes in Computer Science 1436.
  [19] Mehryar Mohri and Richard Sproat. An efficient compiler for weighted rewrite rules. In 34th Annual Meeting of the Association for Computational Linguistics, Santa Cruz, 1996.
  [20] Alan Prince and Paul Smolensky. Optimality theory: Constraint interaction in generative grammar. Technical Report TR-2, Rutgers University Cognitive Science Center, New Brunswick, NJ, 1993. MIT Press, to appear.
  [21] D. Raymond and D. Wood. The grail papers. Technical Report TR-491, University of Western Ontario, Department of Computer Science, London, Ontario, 1996.
  [22] Emmanuel Roche. Parsing with finite-state transducers. In Emmanuel Roche and Yves Schabes, editors, Finite-State Language Processing, pages 241–281. MIT Press, Cambridge, 1997.
  [23] Emmanuel Roche and Yves Schabes. Introduction. In Emmanuel Roche and Yves Schabes, editors, Finite-State Language Processing. MIT Press, Cambridge, Mass, 1997.
  [24] Gertjan van Noord. FSA Utilities: A toolbox to manipulate finite-state automata. In Darrell Raymond, Derick Wood, and Sheng Yu, editors, Automata Implementation, pages 87–108. Springer Verlag, 1997. Lecture Notes in Computer Science 1260.
  [25] Gertjan van Noord. FSA Utilities (version 5), 1998. The FSA Utilities toolbox is available free of charge under the GNU General Public License at http://www.let.rug.nl/~vannoord/Fsa/.
  [26] Gertjan van Noord. The treatment of epsilon moves in subset construction. In Finite-state Methods in Natural Language Processing, Ankara, 1998. cmp-lg/9804003. Accepted for Computational Linguistics.
  [27] Bruce W. Watson. Taxonomies and Toolkits of Regular Language Algorithms. PhD thesis, Eindhoven University of Technology, 1995.

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Gertjan van Noord (1)
  • Dale Gerdemann (2)
  1. University of Groningen, Groningen
  2. University of Tübingen, Tübingen
