Exploration of N-gram Features for the Domain Adaptation of Chinese Word Segmentation

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2012)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 333)

Abstract

A key problem in Chinese Word Segmentation is that a system's performance decreases when it is applied to a different domain. We propose an approach that uses n-gram features drawn from a large raw corpus to achieve domain adaptation for Chinese Word Segmentation. The n-gram features comprise an n-gram frequency feature and an accessor variety (AV) feature. To verify the proposed method, we used a CRF model and a raw corpus of 1 million patent description sentences. For test data, 300 patent description sentences were randomly selected and manually annotated. The results show an improvement of 2.53% in Chinese Word Segmentation on the test data.
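The two feature types named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes character n-grams up to length 3, takes the accessor variety of a string to be the smaller of its distinct left-neighbour count and distinct right-neighbour count (the usual definition from the accessor variety literature), and simplifies sentence boundaries to a single pseudo-character on each side. The function names `ngram_stats` and `accessor_variety` and the toy sentences are hypothetical.

```python
from collections import Counter, defaultdict


def ngram_stats(sentences, max_n=3):
    """Collect frequency and neighbour sets for every character n-gram.

    Assumption: sentence boundaries are represented by the pseudo-characters
    "<s>" and "</s>", so all boundaries on one side count as one neighbour.
    """
    freq = Counter()
    left = defaultdict(set)    # n-gram -> distinct characters seen before it
    right = defaultdict(set)   # n-gram -> distinct characters seen after it
    for sent in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(sent) - n + 1):
                gram = sent[i:i + n]
                freq[gram] += 1
                left[gram].add(sent[i - 1] if i > 0 else "<s>")
                right[gram].add(sent[i + n] if i + n < len(sent) else "</s>")
    return freq, left, right


def accessor_variety(gram, left, right):
    """AV = min(distinct left neighbours, distinct right neighbours)."""
    return min(len(left.get(gram, ())), len(right.get(gram, ())))


# Toy raw corpus standing in for the unlabelled patent sentences.
raw_corpus = ["专利说明", "专利申请", "本专利"]
freq, left, right = ngram_stats(raw_corpus)

print(freq["专利"])                            # 3
print(accessor_variety("专利", left, right))   # 2
```

In a CRF-based segmenter, raw values like these would typically be bucketed into a small number of discrete levels before being attached to each character as features; the bucketing scheme is left open here because the abstract does not specify it.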

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Guo, Z., Zhang, Y., Su, C., Xu, J. (2012). Exploration of N-gram Features for the Domain Adaptation of Chinese Word Segmentation. In: Zhou, M., Zhou, G., Zhao, D., Liu, Q., Zou, L. (eds) Natural Language Processing and Chinese Computing. NLPCC 2012. Communications in Computer and Information Science, vol 333. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34456-5_12

  • DOI: https://doi.org/10.1007/978-3-642-34456-5_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34455-8

  • Online ISBN: 978-3-642-34456-5

  • eBook Packages: Computer Science, Computer Science (R0)