
HANS: A Service-Oriented Framework for Chinese Language Processing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10761)

Abstract

A service-oriented architecture called HANS is proposed to facilitate Chinese natural language processing. This unified framework seamlessly integrates fundamental NLP tasks, including word segmentation, part-of-speech tagging, named entity recognition, chunking, parsing, and semantic role labeling, to enhance Chinese language processing functionality. A basic Chinese word segmentation task is used to illustrate the function of the proposed architecture and to demonstrate its effects. Evaluation benchmarks are taken from the SIGHAN 2005 bakeoff and the NLPCC 2016 shared task. We implement publicly released toolkits, including Stanford CoreNLP, FudanNLP and CKIP, as services in our HANS framework for performance comparison. Experimental results confirm the feasibility of the proposed architecture. We also discuss findings that point to potential future developments.
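The abstract describes wrapping several segmenters as interchangeable services and comparing them on shared benchmarks. The Python sketch below illustrates that idea under stated assumptions; it is not the HANS implementation, whose interfaces are not given here. The `SegmentService` type, the `make_fmm_service` toy backend (forward maximum matching over a small lexicon, standing in for a wrapped toolkit) and the `compare` helper are all hypothetical. The `score` function computes word-level precision, recall and F1 by matching character spans, the metric conventionally reported in SIGHAN bakeoffs; the paper's exact evaluation script may differ.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical service type: raw text in, list of words out.
SegmentService = Callable[[str], List[str]]


def to_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a word sequence into character-offset spans (start, end)."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def score(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Word-level precision, recall and F1: a predicted word is correct
    only if both of its boundaries match a gold word."""
    if "".join(gold) != "".join(pred):
        raise ValueError("gold and predicted segmentations cover different text")
    g, p = to_spans(gold), to_spans(pred)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


def make_fmm_service(lexicon: Set[str], max_len: int = 4) -> SegmentService:
    """Hypothetical toy backend: forward maximum matching over a lexicon.
    Unknown characters fall back to single-character words."""
    def segment(text: str) -> List[str]:
        words, i = [], 0
        while i < len(text):
            # Try the longest candidate first, shrinking to one character.
            for n in range(min(max_len, len(text) - i), 0, -1):
                cand = text[i:i + n]
                if n == 1 or cand in lexicon:
                    words.append(cand)
                    i += n
                    break
        return words
    return segment


def compare(services: Dict[str, SegmentService],
            gold: List[str]) -> Dict[str, Tuple[float, float, float]]:
    """Send the same input to every registered service and score each output."""
    text = "".join(gold)
    return {name: score(gold, svc(text)) for name, svc in services.items()}
```

For example, with gold segmentation 研究 / 生命 / 起源 and a lexicon {研究, 研究生, 生命, 起源}, the forward-maximum-matching service returns 研究生 / 命 / 起源, so only 起源 matches and precision, recall and F1 are all 1/3, while a character-by-character baseline scores 0 on all three. Keeping every backend behind the same text-in, words-out signature is what lets one scorer compare them uniformly.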



Acknowledgments

This study was partially supported by the Ministry of Science and Technology, under the grant MOST 105-2221-E-003-020-MY2 and the “Aim for the Top University Project” and “Center of Language Technology for Chinese” of National Taiwan Normal University, sponsored by the Ministry of Education, Taiwan.

Author information

Correspondence to Lung-Hao Lee.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Lee, LH., Lee, KC., Tseng, YH. (2018). HANS: A Service-Oriented Framework for Chinese Language Processing. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol. 10761. Springer, Cham. https://doi.org/10.1007/978-3-319-77113-7_11

  • DOI: https://doi.org/10.1007/978-3-319-77113-7_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77112-0

  • Online ISBN: 978-3-319-77113-7

  • eBook Packages: Computer Science, Computer Science (R0)