A Joint Model for Vietnamese Part-of-Speech Tagging Using Dual Decomposition

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 30)

Abstract

Part-of-speech (POS) tagging is a fundamental task in Natural Language Processing (NLP). It provides useful information for many other NLP tasks, including word sense disambiguation, text chunking, named entity recognition, syntactic parsing, semantic role labeling, and semantic parsing. Several methods have been proposed for POS tagging in Vietnamese. They fall into two types: word-based models, which assign a POS tag to each word, and syllable-based models, which assign a POS tag to each syllable. This chapter presents a new model for Vietnamese POS tagging based on dual decomposition. It shows how dual decomposition can be used to integrate a word-based model and a syllable-based model into a more powerful model for tagging Vietnamese sentences. The chapter then describes experiments on the Viet Treebank corpus, a large annotated corpus for Vietnamese POS tagging, and presents an error analysis that investigates which types of Vietnamese words are harder to tag than others. Experimental results show that the word-based and syllable-based models are complementary, and that the proposed model using dual decomposition outperforms both of them.
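
The integration idea in the abstract can be illustrated with a minimal sketch of the standard subgradient form of dual decomposition. This is not the chapter's implementation: its component models, features, and agreement constraints are not visible in this preview. The sketch assumes that both taggers are first-order chain models scored per syllable position (with word-level tags already projected onto syllables), and every name in it (`viterbi`, `dual_decompose`, `emit_a`, `trans_a`, and so on) is hypothetical.

```python
from typing import List, Tuple

Scores = List[List[float]]  # scores[i][t]: score of tag t at position i


def viterbi(emit: Scores, trans: Scores) -> List[int]:
    """Return the highest-scoring tag sequence under a first-order chain model."""
    n, k = len(emit), len(emit[0])
    best = [emit[0][:]]   # best[i][t]: best score of a path ending in tag t at i
    back = []             # back[i-1][t]: best previous tag for tag t at i
    for i in range(1, n):
        row, ptr = [], []
        for t in range(k):
            prev = max(range(k), key=lambda p: best[-1][p] + trans[p][t])
            row.append(best[-1][prev] + trans[prev][t] + emit[i][t])
            ptr.append(prev)
        best.append(row)
        back.append(ptr)
    tag = max(range(k), key=lambda t: best[-1][t])
    path = [tag]
    for ptr in reversed(back):
        tag = ptr[tag]
        path.append(tag)
    return path[::-1]


def dual_decompose(emit_a: Scores, trans_a: Scores,
                   emit_b: Scores, trans_b: Scores,
                   iterations: int = 50, step: float = 1.0
                   ) -> Tuple[List[int], bool]:
    """Decode two models jointly; the flag is True if they reached agreement."""
    n, k = len(emit_a), len(emit_a[0])
    u = [[0.0] * k for _ in range(n)]  # Lagrange multipliers, one per (position, tag)
    y: List[int] = []
    for it in range(1, iterations + 1):
        # Each model is decoded on its own; model A sees +u, model B sees -u.
        y = viterbi([[emit_a[i][t] + u[i][t] for t in range(k)] for i in range(n)], trans_a)
        z = viterbi([[emit_b[i][t] - u[i][t] for t in range(k)] for i in range(n)], trans_b)
        if y == z:
            return y, True
        rate = step / it  # decaying step size (an assumption of this sketch)
        for i in range(n):
            if y[i] != z[i]:
                u[i][y[i]] -= rate  # discourage A's choice where the models disagree
                u[i][z[i]] += rate  # encourage B's choice for A (and the reverse for B)
    return y, False  # no agreement within the budget: fall back to model A's output


# Toy example: two tags, three positions, no transition preferences.
no_trans = [[0.0, 0.0], [0.0, 0.0]]
emit_a = [[3.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # stand-in for one tagger's scores
emit_b = [[0.0, 1.0], [0.0, 3.0], [0.0, 1.0]]  # stand-in for the other tagger's scores
tags, agreed = dual_decompose(emit_a, no_trans, emit_b, no_trans)
print(tags, agreed)
```

In the toy example the two taggers disagree on the first position when decoded separately; after one multiplier update they agree on the sequence that maximizes the summed scores. When the loop ends without agreement, the returned sequence is only a fallback and carries no optimality guarantee.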

Corresponding author

Correspondence to Ngo Xuan Bach.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Bach, N.X., Hiraishi, K., Le Minh, N., Shimazu, A. (2015). A Joint Model for Vietnamese Part-of-Speech Tagging Using Dual Decomposition. In: Tweedale, J., Jain, L., Watada, J., Howlett, R. (eds) Knowledge-Based Information Systems in Practice. Smart Innovation, Systems and Technologies, vol 30. Springer, Cham. https://doi.org/10.1007/978-3-319-13545-8_20

  • DOI: https://doi.org/10.1007/978-3-319-13545-8_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13544-1

  • Online ISBN: 978-3-319-13545-8

  • eBook Packages: Engineering, Engineering (R0)
