A Joint Model for Vietnamese Part-of-Speech Tagging Using Dual Decomposition

Bach, Ngo Xuan; Hiraishi, Kunihiko; Le Minh, Nguyen; Shimazu, Akira

doi:10.1007/978-3-319-13545-8_20

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 30)

Abstract

Part-of-speech (POS) tagging is a fundamental task in Natural Language Processing (NLP). It provides useful information for many other NLP tasks, including word sense disambiguation, text chunking, named entity recognition, syntactic parsing, semantic role labeling, and semantic parsing. Several methods have been proposed for POS tagging in Vietnamese. They fall into two types: word-based models, which assign a POS tag to each word, and syllable-based models, which assign a POS tag to each syllable. This chapter presents a new model for Vietnamese POS tagging using dual decomposition. It shows how dual decomposition can be exploited to integrate a word-based model and a syllable-based model, yielding a more powerful model for tagging Vietnamese sentences. The chapter then describes experiments on the Viet Treebank corpus, a large annotated corpus for Vietnamese POS tagging, and presents an error analysis that investigates which types of Vietnamese words are more difficult to tag than others. Experimental results show that the word-based model and the syllable-based model are complementary. Moreover, the proposed model using dual decomposition outperforms both the word-based and the syllable-based models.
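
The abstract does not spell out the chapter's formulation, so the following is only a minimal sketch of how a generic dual decomposition scheme might make a word-based tagger and a syllable-based tagger agree. It assumes each model scores every position independently (real taggers would typically decode with a sequence model instead), and that every syllable must carry the tag of the word containing it. The function name `dual_decompose`, the `spans` representation, the toy scores, and the step-size schedule are all illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List, Tuple

Scores = List[Dict[str, float]]  # one {tag: score} dict per position


def dual_decompose(
    word_scores: Scores,
    syl_scores: Scores,
    spans: List[Tuple[int, int]],
    tags: List[str],
    max_iter: int = 50,
    step0: float = 1.0,
) -> Tuple[List[str], int, bool]:
    """Subgradient dual decomposition over two independent taggers.

    spans[w] = (start, end) gives the syllable range [start, end) of word w.
    Returns (per-syllable tags, iterations used, whether the models agreed).
    """
    n_syl = len(syl_scores)
    # Lagrange multipliers u[i][t], one per (syllable, tag) pair.
    u = [{t: 0.0 for t in tags} for _ in range(n_syl)]
    y: List[str] = []

    for k in range(1, max_iter + 1):
        # Word-based subproblem: maximise score + sum of u over the word's syllables.
        y = [""] * n_syl
        for w, (start, end) in enumerate(spans):
            best = max(
                tags,
                key=lambda t: word_scores[w][t] + sum(u[i][t] for i in range(start, end)),
            )
            for i in range(start, end):
                y[i] = best

        # Syllable-based subproblem: maximise score - u at each syllable.
        z = [max(tags, key=lambda t: syl_scores[i][t] - u[i][t]) for i in range(n_syl)]

        if y == z:
            return y, k, True  # agreement certifies the joint optimum

        # Subgradient step: penalise the tags on which the two models disagree.
        step = step0 / k
        for i in range(n_syl):
            if y[i] != z[i]:
                u[i][y[i]] -= step
                u[i][z[i]] += step

    return y, max_iter, False  # no agreement: fall back to the word model's tags


if __name__ == "__main__":
    # Toy sentence (hypothetical scores): word 0 spans two syllables, word 1 spans one.
    word_scores = [{"N": 1.0, "V": 0.8}, {"N": 0.2, "V": 1.0}]
    syl_scores = [{"N": 0.5, "V": 1.0}, {"N": 0.4, "V": 0.9}, {"N": 0.1, "V": 0.7}]
    print(dual_decompose(word_scores, syl_scores, [(0, 2), (2, 3)], ["N", "V"]))
```

In this toy setting the two models initially disagree on the first word, and the multipliers shift score between them until both pick the same tag. When the subproblems agree, the shared tagging maximises the sum of the two models' scores, which is the sense in which the combined model can outperform either component.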

Author information

Authors and Affiliations

School of Information Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa, 923-1292, Japan
Ngo Xuan Bach, Kunihiko Hiraishi, Nguyen Le Minh & Akira Shimazu

Corresponding author

Correspondence to Ngo Xuan Bach.

Editor information

Editors and Affiliations

Air Operations Division, Defence Science and Technology Organisation, Edinburgh, South Australia, Australia
Jeffrey W. Tweedale
University of Canberra, Australia and University of South Australia, Adelaide, South Australia, Australia
Lakhmi C. Jain
Waseda University, Fukuoka, Japan
Junzo Watada
KES International, Shoreham-by-sea, United Kingdom
Robert J. Howlett

About this chapter

Cite this chapter

Bach, N.X., Hiraishi, K., Le Minh, N., Shimazu, A. (2015). A Joint Model for Vietnamese Part-of-Speech Tagging Using Dual Decomposition. In: Tweedale, J., Jain, L., Watada, J., Howlett, R. (eds) Knowledge-Based Information Systems in Practice. Smart Innovation, Systems and Technologies, vol 30. Springer, Cham. https://doi.org/10.1007/978-3-319-13545-8_20

DOI: https://doi.org/10.1007/978-3-319-13545-8_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13544-1
Online ISBN: 978-3-319-13545-8
eBook Packages: Engineering, Engineering (R0)