Abstract
A service-oriented architecture called HANS is proposed to facilitate Chinese natural language processing. This unified framework seamlessly integrates fundamental NLP tasks, including word segmentation, part-of-speech tagging, named entity recognition, chunking, parsing, and semantic role labeling, to enhance Chinese language processing functionality. A basic Chinese word segmentation task is used to illustrate the function of the proposed architecture and to demonstrate its effects. Evaluation benchmarks are taken from the SIGHAN 2005 bakeoff and the NLPCC 2016 shared task. We implement publicly released toolkits, including Stanford CoreNLP, FudanNLP, and CKIP, as services in our HANS framework for performance comparison. Experimental results confirm the feasibility of the proposed architecture. Findings are also discussed to point to potential future developments.
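The comparison described above can be illustrated with a minimal sketch. The abstract does not specify the HANS interfaces, so the registry, the function names, and the toy backend below are all hypothetical. The sketch assumes each toolkit is wrapped behind one common segmentation interface and scored against a gold segmentation with word-level precision, recall, and F1, the measures reported in the SIGHAN bakeoffs.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical common interface: a segmenter maps raw text to a word list.
Segmenter = Callable[[str], List[str]]

# Hypothetical registry of services (name -> segmenter); not the HANS API.
SERVICES: Dict[str, Segmenter] = {}


def register(name: str, segmenter: Segmenter) -> None:
    """Expose a segmenter under a service name."""
    SERVICES[name] = segmenter


def char_segmenter(text: str) -> List[str]:
    """Toy stand-in backend: treats every character as one word."""
    return list(text)


def spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a word list into a set of (start, end) character spans."""
    result: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        result.add((start, start + len(word)))
        start += len(word)
    return result


def prf(gold: List[str], predicted: List[str]) -> Tuple[float, float, float]:
    """Word-level precision, recall, and F1 based on matching spans."""
    gold_spans, pred_spans = spans(gold), spans(predicted)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


register("char", char_segmenter)

gold = ["我", "喜歡", "音樂"]            # illustrative gold segmentation
pred = SERVICES["char"]("我喜歡音樂")     # call the service by name
precision, recall, f1 = prf(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In a real deployment each registered function would forward the request to a wrapped toolkit rather than run in-process; the scoring code would stay the same because it depends only on the shared interface.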
Acknowledgments
This study was partially supported by the Ministry of Science and Technology, under the grant MOST 105-2221-E-003-020-MY2 and the “Aim for the Top University Project” and “Center of Language Technology for Chinese” of National Taiwan Normal University, sponsored by the Ministry of Education, Taiwan.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Lee, LH., Lee, KC., Tseng, YH. (2018). HANS: A Service-Oriented Framework for Chinese Language Processing. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol 10761. Springer, Cham. https://doi.org/10.1007/978-3-319-77113-7_11
DOI: https://doi.org/10.1007/978-3-319-77113-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77112-0
Online ISBN: 978-3-319-77113-7
eBook Packages: Computer Science, Computer Science (R0)