Integrating a POS Tagger and a Chunker Implemented as Weighted Finite State Machines

  • Alexis Nasr
  • Alexandra Volanschi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4002)


This paper presents a method for integrating a probabilistic part-of-speech tagger and a chunker. This integration led to the correction of a number of errors made by the tagger when used alone. Both the tagger and the chunker are implemented as weighted finite state machines. Experiments on a French corpus showed a decrease in the word error rate of about 12%.


Keywords: part-of-speech tagging, chunking, weighted finite state machines





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Alexis Nasr (1)
  • Alexandra Volanschi (1)
  1. Lattice-CNRS (UMR 8094), Université Paris 7, France
