Algorithms for Minimum Risk Chunking

Martin Jansche
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4002)


Stochastic finite automata are useful for identifying substrings (chunks) within larger units of text. Relevant applications include tokenization, base-NP chunking, named entity recognition, and other information extraction tasks. For a given input string, a stochastic automaton represents a probability distribution over strings of labels encoding the locations of chunks. For chunking and extraction tasks, the quality of predictions is evaluated in terms of precision and recall of the chunked/extracted phrases when compared against a gold standard. However, traditional methods for estimating the parameters of a stochastic finite automaton and for decoding the best hypothesis do not take the evaluation criterion into account, which we take to be the well-known F-measure. We are interested in methods that remedy this situation, both in training and decoding. Our main result is a novel algorithm for efficiently evaluating expected F-measure. We present the algorithm and discuss its applications to utility- and risk-based parameter estimation and decoding.
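To make the evaluation criterion concrete, the following Python sketch computes chunk-level F-measure from BIO label sequences and then the expected F-measure of a hypothesis distribution by brute-force enumeration. This is only an illustration of the quantity being optimized, not the paper's algorithm (whose contribution is precisely to avoid exhaustive enumeration); the BIO tag names and helper functions are assumptions for the example.

```python
def chunks(labels):
    """Extract the set of (start, end) spans of chunks from a BIO label sequence."""
    spans, start = set(), None
    for i, lab in enumerate(labels):
        if lab == "B":                      # a new chunk begins, closing any open one
            if start is not None:
                spans.add((start, i))
            start = i
        elif lab == "O":                    # outside: close any open chunk
            if start is not None:
                spans.add((start, i))
            start = None
        elif lab == "I" and start is None:  # stray "I" treated as chunk start
            start = i
    if start is not None:
        spans.add((start, len(labels)))
    return spans

def f_measure(hyp, gold):
    """Chunk-level F1: harmonic mean of precision and recall over chunk spans."""
    h, g = chunks(hyp), chunks(gold)
    if not h or not g:
        return 0.0 if (h or g) else 1.0
    # F1 = 2PR/(P+R) simplifies to 2*TP / (|hyp chunks| + |gold chunks|)
    return 2.0 * len(h & g) / (len(h) + len(g))

def expected_f_measure(dist, gold):
    """E[F] under a distribution over label sequences, by exhaustive summation."""
    return sum(p * f_measure(hyp, gold) for hyp, p in dist)
```

For example, if the model puts probability 0.7 on the gold sequence `("B","I","O","B","I")` and 0.3 on `("B","I","O","O","O")` (which recovers one of the two gold chunks, F = 2/3), the expected F-measure is 0.7 · 1 + 0.3 · 2/3 = 0.9. The number of label sequences grows exponentially in the input length, which is why an efficient algorithm for this expectation is needed.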


Keywords: Noun Phrase · Natural Language Processing · Entity Recognition · Label Sequence · Empirical Risk Minimization




Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

Martin Jansche
Center for Computational Learning Systems, Columbia University, New York, USA
