Part-of-Speech Tagging with Two Sequential Transducers

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2088)

Abstract

We present a method of constructing and using a cascade consisting of a left- and a right-sequential finite-state transducer (FST), T1 and T2, for part-of-speech (POS) disambiguation. Compared to a hidden Markov model (HMM), this FST cascade has the advantage of significantly higher processing speed, at the cost of slightly lower accuracy. Applications such as information retrieval, where speed can be more important than accuracy, could benefit from this approach.

In the process of POS tagging, we first assign every word of a sentence a unique ambiguity class c_i that can be looked up in a lexicon encoded by a sequential FST. Every c_i is denoted by a single symbol, e.g. “[ADJ NOUN]”, although it represents a set of alternative tags that a given word can occur with. The sequence of the c_i of all words of one sentence is the input to our FST cascade (Fig. 1). It is mapped by T1, from left to right, to a sequence of reduced ambiguity classes r_i. Every r_i is likewise denoted by a single symbol, although it represents a set of alternative tags. Intuitively, T1 eliminates the less likely tags from c_i, thus creating r_i. Finally, T2 maps the sequence of r_i, from right to left, to an output sequence of single POS tags t_i. Intuitively, T2 selects the most likely t_i from every r_i (Fig. 1).
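The data flow described above (lexicon lookup, a left-to-right pass, then a right-to-left pass) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the three-word lexicon, the state names, and the hand-written transition tables are hypothetical and are not derived by the construction method of the paper. A plain dictionary also stands in for the lexicon, which the paper encodes as a sequential FST.

```python
from typing import Dict, List, Tuple

# A sequential transducer as a table:
# (state, input symbol) -> (output symbol, next state)
Transitions = Dict[Tuple[str, str], Tuple[str, str]]


def apply_left_sequential(trans: Transitions, start: str,
                          symbols: List[str]) -> List[str]:
    """Read the input from left to right, emitting one output per input."""
    state = start
    outputs = []
    for symbol in symbols:
        output, state = trans[(state, symbol)]
        outputs.append(output)
    return outputs


def apply_right_sequential(trans: Transitions, start: str,
                           symbols: List[str]) -> List[str]:
    """Read the input from right to left; outputs keep the input order."""
    return apply_left_sequential(trans, start, symbols[::-1])[::-1]


# Hypothetical toy lexicon: word -> ambiguity class c_i (a single symbol).
LEXICON = {
    "the": "[DET]",
    "light": "[ADJ NOUN VERB]",
    "works": "[NOUN VERB]",
}

# Hypothetical T1: maps c_i to reduced classes r_i using left context only.
T1: Transitions = {
    ("start", "[DET]"): ("[DET]", "after_det"),
    # After a determiner, the VERB reading is dropped.
    ("after_det", "[ADJ NOUN VERB]"): ("[ADJ NOUN]", "after_nominal"),
    # Left context alone does not decide this one, so the class is kept.
    ("after_nominal", "[NOUN VERB]"): ("[NOUN VERB]", "after_nominal"),
}

# Hypothetical T2: maps r_i to single tags t_i using right context only.
T2: Transitions = {
    ("end", "[NOUN VERB]"): ("VERB", "before_verb"),
    ("before_verb", "[ADJ NOUN]"): ("NOUN", "before_noun"),
    ("before_noun", "[DET]"): ("DET", "before_det"),
}


def tag_sentence(words: List[str]) -> List[str]:
    classes = [LEXICON[w] for w in words]                     # c_1 ... c_n
    reduced = apply_left_sequential(T1, "start", classes)     # r_1 ... r_n
    return apply_right_sequential(T2, "end", reduced)         # t_1 ... t_n


if __name__ == "__main__":
    print(tag_sentence(["the", "light", "works"]))
```

Because each pass is a single table lookup per word, tagging a sentence takes time linear in its length, which is consistent with the speed advantage the abstract claims over an HMM. The toy tables cover only this one sentence; any other input raises a `KeyError`.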

Although our approach is related to the concept of bimachines [2] and factorization [1], we proceed differently in that we build two sequential FSTs directly and not by factorization.


References

  1. C.C. Elgot and J.E. Mezei. 1965. On relations defined by generalized finite automata. IBM Journal of Research and Development, pages 47–68, January.

  2. M.P. Schützenberger. 1961. A remark on finite transducers. Information and Control, 4:185–187.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kempe, A. (2001). Part-of-Speech Tagging with Two Sequential Transducers. In: Yu, S., Păun, A. (eds) Implementation and Application of Automata. CIAA 2000. Lecture Notes in Computer Science, vol 2088. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44674-5_34

  • DOI: https://doi.org/10.1007/3-540-44674-5_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42491-8

  • Online ISBN: 978-3-540-44674-3

  • eBook Packages: Springer Book Archive
