Part-of-Speech Tagging with Two Sequential Transducers

Kempe, André

doi:10.1007/3-540-44674-5_34

Conference paper
First Online: 01 January 2001

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2088)

Abstract

We present a method of constructing and using a cascade consisting of a left- and a right-sequential finite-state transducer (FST), T₁ and T₂, for part-of-speech (POS) disambiguation. Compared to a hidden Markov model (HMM), this FST cascade has the advantage of significantly higher processing speed, but at the cost of slightly lower accuracy. Applications such as Information Retrieval, where speed can be more important than accuracy, could benefit from this approach.

In the process of POS tagging, we first assign every word of a sentence a unique ambiguity class cᵢ that can be looked up in a lexicon encoded by a sequential FST. Every cᵢ is denoted by a single symbol, e.g. “[ADJ NOUN]”, although it represents a set of alternative tags that a given word can occur with. The sequence of the cᵢ of all words of one sentence is the input to our FST cascade (Fig. 1). It is mapped by T₁, from left to right, to a sequence of reduced ambiguity classes rᵢ. Every rᵢ is denoted by a single symbol, although it represents a set of alternative tags. Intuitively, T₁ eliminates the less likely tags from cᵢ, thus creating rᵢ. Finally, T₂ maps the sequence of rᵢ, from right to left, to an output sequence of single POS tags tᵢ. Intuitively, T₂ selects the most likely tᵢ from every rᵢ.
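The three stages described above (lexicon lookup, left-to-right reduction, right-to-left selection) can be illustrated with a minimal Python sketch. It is not the construction method of the paper: the lexicon, the state names, and the transition tables `T1` and `T2` are hand-written, hypothetical toy data covering a single sentence, and each transducer is modelled simply as a dictionary from (state, input symbol) to (output symbol, next state).

```python
from typing import Dict, List, Tuple

# (state, input symbol) -> (output symbol, next state)
Transitions = Dict[Tuple[str, str], Tuple[str, str]]

# Hypothetical toy lexicon: each word maps to ONE ambiguity-class symbol.
LEXICON: Dict[str, str] = {
    "the": "[DET]",
    "light": "[ADJ NOUN VERB]",
    "fades": "[NOUN VERB]",
}

# Toy T1 (assumed, hand-written): reads classes left to right and
# drops tags that are unlikely given the left context.
T1: Transitions = {
    ("start", "[DET]"): ("[DET]", "after_det"),
    ("after_det", "[ADJ NOUN VERB]"): ("[ADJ NOUN]", "after_adj_noun"),
    ("after_adj_noun", "[NOUN VERB]"): ("[NOUN VERB]", "after_noun_verb"),
}

# Toy T2 (assumed, hand-written): reads reduced classes right to left
# and picks one tag given the right context.
T2: Transitions = {
    ("end", "[NOUN VERB]"): ("VERB", "before_verb"),
    ("before_verb", "[ADJ NOUN]"): ("NOUN", "before_noun"),
    ("before_noun", "[DET]"): ("DET", "before_det"),
}


def run_left_to_right(trans: Transitions, start: str, symbols: List[str]) -> List[str]:
    """Apply a deterministic (sequential) transducer, one output per input."""
    state, output = start, []
    for symbol in symbols:
        out_symbol, state = trans[(state, symbol)]
        output.append(out_symbol)
    return output


def run_right_to_left(trans: Transitions, start: str, symbols: List[str]) -> List[str]:
    """Apply the same kind of transducer to the reversed input, then restore order."""
    return run_left_to_right(trans, start, symbols[::-1])[::-1]


def ambiguity_classes(words: List[str]) -> List[str]:
    """Look up the ambiguity class c_i of every word."""
    return [LEXICON[word] for word in words]


def reduce_classes(classes: List[str]) -> List[str]:
    """Map the c_i to reduced ambiguity classes r_i with T1."""
    return run_left_to_right(T1, "start", classes)


def tag(words: List[str]) -> List[str]:
    """Full cascade: words -> c_i -> r_i -> t_i."""
    return run_right_to_left(T2, "end", reduce_classes(ambiguity_classes(words)))


if __name__ == "__main__":
    print(tag(["the", "light", "fades"]))  # ['DET', 'NOUN', 'VERB']
```

Because both passes are deterministic and emit exactly one symbol per input symbol, the sketch runs in time linear in the sentence length, which is the kind of property that makes such a cascade fast compared with HMM decoding.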

Although our approach is related to the concept of bimachines [2] and factorization [1], we proceed differently in that we build two sequential FSTs directly and not by factorization.

References

[1] C.C. Elgot and J.E. Mezei. 1965. On relations defined by generalized finite automata. IBM Journal of Research and Development, pages 47–68, January.
[2] M.P. Schützenberger. 1961. A remark on finite transducers. Information and Control, 4:185–187.

Author information

Authors and Affiliations

Xerox Research Centre Europe - Grenoble Laboratory, 6 chemin de Maupertuis, 38240, Meylan, France
André Kempe

Editor information

Editors and Affiliations

Department of Computer Science, Middlesex College, The University of Western Ontario, London, ON, Canada, N6A 5B7
Shen Yu & Andrei Păun

About this paper

Cite this paper

Kempe, A. (2001). Part-of-Speech Tagging with Two Sequential Transducers. In: Yu, S., Păun, A. (eds) Implementation and Application of Automata. CIAA 2000. Lecture Notes in Computer Science, vol 2088. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44674-5_34

DOI: https://doi.org/10.1007/3-540-44674-5_34
Published: 20 September 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42491-8
Online ISBN: 978-3-540-44674-3
eBook Packages: Springer Book Archive
