Abstract
Part-of-speech (POS) tagging is a fundamental task in Natural Language Processing (NLP). It provides useful information for many other NLP tasks, including word sense disambiguation, text chunking, named entity recognition, syntactic parsing, semantic role labeling, and semantic parsing. Several methods have been proposed for POS tagging in Vietnamese. They fall into two types: word-based models, which assign a POS tag to each word, and syllable-based models, which assign a POS tag to each syllable. This chapter presents a new model for Vietnamese POS tagging using dual decomposition. It shows how dual decomposition can be exploited to integrate a word-based model and a syllable-based model into a more powerful model for tagging Vietnamese sentences. The chapter then describes experiments on the Viet Treebank corpus, a large annotated corpus for Vietnamese POS tagging, and presents an error analysis of which types of Vietnamese words are more difficult to tag than others. Experimental results show that the word-based model and the syllable-based model are complementary, and that the proposed model using dual decomposition outperforms both.
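To make the idea of combining the two models concrete, the following is a minimal sketch of dual decomposition with subgradient updates. It is not the chapter's implementation. It assumes, for illustration only, that both models score each position independently (a real tagger would decode with Viterbi or similar), that a word's tag is projected onto each of its syllables, and that agreement is required at every syllable. The function name `dual_decompose`, the toy scores, and the tags `N` and `V` are hypothetical.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, float]


def dual_decompose(
    spans: List[Tuple[int, int]],   # (start, end) syllable span of each word
    word_scores: List[Scores],      # assumed toy scores: one dict per word
    syl_scores: List[Scores],       # assumed toy scores: one dict per syllable
    tags: List[str],
    max_iter: int = 50,
    step: float = 1.0,
) -> Tuple[List[str], bool]:
    """Return (syllable-level tags, whether the two models agreed)."""
    n = spans[-1][1]
    # One Lagrange multiplier per (syllable position, tag) pair.
    u = [{t: 0.0 for t in tags} for _ in range(n)]
    y: List[str] = []
    for k in range(1, max_iter + 1):
        # Word-based subproblem: each word picks the tag maximizing its own
        # score plus the multipliers summed over the word's syllables.
        y = [""] * n
        for (start, end), scores in zip(spans, word_scores):
            best = max(
                tags,
                key=lambda t: scores[t] + sum(u[i][t] for i in range(start, end)),
            )
            for i in range(start, end):
                y[i] = best
        # Syllable-based subproblem: each syllable picks the tag maximizing
        # its own score minus the multiplier.
        z = [max(tags, key=lambda t: syl_scores[i][t] - u[i][t]) for i in range(n)]
        if y == z:
            return y, True  # agreement: the solution is optimal for the joint objective
        # Subgradient step with a decaying rate: lower the multiplier of the
        # tag the word model chose, raise the one the syllable model chose.
        rate = step / k
        for i in range(n):
            if y[i] != z[i]:
                u[i][y[i]] -= rate
                u[i][z[i]] += rate
    return y, False  # no agreement within max_iter; fall back to the word model


# Toy sentence: a two-syllable word followed by a one-syllable word.
tags = ["N", "V"]
spans = [(0, 2), (2, 3)]
word_scores = [{"N": 2.0, "V": 1.0}, {"N": 0.0, "V": 1.0}]
syl_scores = [{"N": 1.0, "V": 0.0}, {"N": 0.0, "V": 0.5}, {"N": 0.0, "V": 1.0}]
tagging, agreed = dual_decompose(spans, word_scores, syl_scores, tags)
print(tagging, agreed)
```

In this toy run the two models initially disagree on the second syllable, and the multipliers shift their scores until both produce the same tagging. When agreement is reached, the result maximizes the sum of the two models' scores under the agreement constraint, which is the property that makes dual decomposition attractive for combining independently trained models.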
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Bach, N.X., Hiraishi, K., Le Minh, N., Shimazu, A. (2015). A Joint Model for Vietnamese Part-of-Speech Tagging Using Dual Decomposition. In: Tweedale, J., Jain, L., Watada, J., Howlett, R. (eds) Knowledge-Based Information Systems in Practice. Smart Innovation, Systems and Technologies, vol 30. Springer, Cham. https://doi.org/10.1007/978-3-319-13545-8_20
DOI: https://doi.org/10.1007/978-3-319-13545-8_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13544-1
Online ISBN: 978-3-319-13545-8
eBook Packages: Engineering, Engineering (R0)