An Integrated Approach to Chinese Word Segmentation and Part-of-Speech Tagging

  • Conference paper
Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead (ICCPOL 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4285)

Abstract

This paper discusses and compares various integration schemes of Chinese word segmentation and part-of-speech tagging in the framework of true-integration and pseudo-integration. A true-integration approach, named ‘the divide-and-conquer integration’, is presented. The experiments based on a manually word-segmented and part-of-speech tagged corpus with about 5.8 million words show that this true integration achieves 98.61% F-measure in word segmentation, 95.18% F-measure in part-of-speech tagging, and 93.86% F-measure in word segmentation and part-of-speech tagging, outperforming all other kinds of combinations to some extent. The experimental results demonstrate the potential for further improving the performance of Chinese word segmentation and part-of-speech tagging.
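The abstract reports F-measures for segmentation alone and for segmentation combined with tagging. As a rough illustration only, the sketch below shows one common way such scores are computed, by comparing character spans of predicted words against the gold standard. The scoring convention, the function names `spans` and `f_measure`, the toy sentence, and the tag labels are all assumptions made for this example; the abstract does not describe the paper's actual evaluation procedure.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag); tags here are illustrative


def spans(tokens: List[Token], with_tags: bool) -> Set[tuple]:
    """Turn a tagged word sequence into a set of character spans.

    Each word becomes (start, end), or (start, end, tag) when with_tags is True.
    """
    out = set()
    start = 0
    for word, tag in tokens:
        end = start + len(word)
        out.add((start, end, tag) if with_tags else (start, end))
        start = end
    return out


def f_measure(gold: List[Token], pred: List[Token], with_tags: bool = False) -> float:
    """Harmonic mean of span precision and recall (assumed scoring convention)."""
    g = spans(gold, with_tags)
    p = spans(pred, with_tags)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision = correct / len(p)
    recall = correct / len(g)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy sentence: the prediction over-segments the last word
# and mis-tags the second one.
gold = [("我们", "r"), ("喜欢", "v"), ("语言学", "n")]
pred = [("我们", "r"), ("喜欢", "n"), ("语言", "n"), ("学", "n")]

seg_f = f_measure(gold, pred)                     # segmentation only
joint_f = f_measure(gold, pred, with_tags=True)   # segmentation and tagging together
print(f"segmentation F = {seg_f:.4f}, joint F = {joint_f:.4f}")
```

Under this convention a word counts toward the joint score only if both its boundaries and its tag match the gold standard, so the joint F-measure can never exceed the segmentation F-measure, which is consistent with the ordering of the figures reported in the abstract.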

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sun, M., Xu, D., Tsou, B.K., Lu, H. (2006). An Integrated Approach to Chinese Word Segmentation and Part-of-Speech Tagging. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_31

  • DOI: https://doi.org/10.1007/11940098_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49667-0

  • Online ISBN: 978-3-540-49668-7

  • eBook Packages: Computer Science, Computer Science (R0)
