
Journal of Computer Science and Technology, Volume 23, Issue 4, pp. 612–619

Scaling Conditional Random Fields by One-Against-the-Other Decomposition

  • Hai Zhao
  • Chunyu Kit
Short Paper

Abstract

As a powerful sequence labeling model, conditional random fields (CRFs) have been successfully applied to many natural language processing (NLP) tasks. However, the high complexity of CRF training restricts it to a very small tag (or label) set, because training becomes intractable as the tag set grows. This paper proposes an improved decomposed training and joint decoding algorithm for CRF learning. Instead of training a single CRF model over all tags, it trains a binary sub-CRF independently for each tag. An optimal tag sequence is then produced by a joint decoding algorithm based on the probabilistic output of all the sub-CRFs involved. To test its effectiveness, we apply this approach to Chinese word segmentation (CWS) formulated as a sequence labeling problem. Our evaluation shows that it reduces the computational cost of this task by 40–50% without any significant performance loss on various large-scale data sets.
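The decomposition-and-joint-decoding idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-position probabilities are toy stand-ins for the marginals a trained binary sub-CRF would emit, and all function names (`sub_marginals`, `joint_decode`) are hypothetical. The joint decoder here is a Viterbi-style search over the combined sub-model outputs, constrained to legal B/M/E/S tag transitions.

```python
# Hedged sketch: one binary "sub-model" per tag stands in for a binary
# sub-CRF; joint decoding searches for the best legal tag sequence over
# the combined per-position probabilities. Toy heuristics only.

import math

TAGS = ["B", "M", "E", "S"]  # the 4-tag scheme common in CWS labeling

def sub_marginals(sentence):
    """Stand-in for the binary sub-CRFs: for each character position,
    return the probability each sub-model assigns to its own tag."""
    probs = []
    for i, _ch in enumerate(sentence):
        scores = {t: 1.0 for t in TAGS}     # uniform base score
        if i == 0:                          # sentence-initial: favor B/S
            scores["B"] += 2.0
            scores["S"] += 1.0
        if i == len(sentence) - 1:          # sentence-final: favor E/S
            scores["E"] += 2.0
            scores["S"] += 1.0
        z = sum(scores.values())
        probs.append({t: s / z for t, s in scores.items()})
    return probs

# Legal tag bigrams for B/M/E/S act as hard transition constraints
# during joint decoding (e.g., M may only follow B or M).
LEGAL = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
         ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}

def joint_decode(probs):
    """Viterbi search over the per-position sub-model probabilities,
    restricted to legal B/M/E/S transitions."""
    best = {t: (math.log(probs[0][t]), [t]) for t in TAGS}
    for i in range(1, len(probs)):
        new = {}
        for t in TAGS:
            cands = [(s + math.log(probs[i][t]), path + [t])
                     for prev, (s, path) in best.items()
                     if (prev, t) in LEGAL]
            if cands:
                new[t] = max(cands)
        best = new
    # A well-formed segmentation must end in E or S.
    return max(best[t] for t in ("E", "S") if t in best)[1]
```

In the real algorithm, each sub-CRF is trained on the full data with a binary yes/no label for its tag, so each training run faces only a 2-tag problem; the quadratic-in-tag-set cost of CRF training is paid several times over a tag set of size 2 instead of once over the full set, which is where the reported savings come from.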

Keywords

natural language processing; machine learning; conditional random fields; Chinese word segmentation




Copyright information

© Springer 2008

Authors and Affiliations

  1. Department of Chinese, Translation and Linguistics, City University of Hong Kong, Kowloon, China
