Part-of-Speech Induction for Vietnamese

  • Conference paper
Knowledge and Systems Engineering

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 245)

Abstract

This paper presents a method for automatically inducing the parts of speech of the Vietnamese language from a large text corpus. We first build a class-based bigram language model using several statistical algorithms that assign words to classes based on their ability to combine with neighbouring words. We then show that this model is able to extract word classes that have the flavor of either syntactically based or semantically based groupings of Vietnamese words, two approaches that have long been disputed within the Vietnamese linguistic community. Finally, the quality of the word clusters is evaluated quantitatively by using word cluster features to improve the accuracy of a statistical part-of-speech tagger for Vietnamese.
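
As a rough illustration of the class-based bigram model mentioned in the abstract, the sketch below scores a token sequence under the factorization commonly used in class-based n-gram models, p(w_i | w_{i-1}) = p(c_i | c_{i-1}) · p(w_i | c_i), with probabilities estimated by relative frequency. The function name, the toy tokens, and the two hand-made class assignments are assumptions for illustration only. This is not the authors' implementation, and it does not perform the clustering search itself; it only shows how one candidate assignment of words to classes can be compared against another.

```python
import math
from collections import Counter


def class_bigram_log_likelihood(tokens, word_to_class):
    """Log-likelihood of tokens[1:] under a class-based bigram model.

    Uses p(w_i | w_{i-1}) = p(c_i | c_{i-1}) * p(w_i | c_i), with both
    factors estimated by relative frequency on the same token sequence.
    """
    classes = [word_to_class[w] for w in tokens]
    word_counts = Counter(tokens)
    class_counts = Counter(classes)
    # Class-to-class transitions, and how often each class is a left context.
    transition_counts = Counter(zip(classes, classes[1:]))
    context_counts = Counter(classes[:-1])

    log_likelihood = 0.0
    for prev_class, word, cur_class in zip(classes, tokens[1:], classes[1:]):
        p_transition = transition_counts[(prev_class, cur_class)] / context_counts[prev_class]
        p_emission = word_counts[word] / class_counts[cur_class]
        log_likelihood += math.log(p_transition * p_emission)
    return log_likelihood


# Hypothetical toy sequence: "a"/"b" and "x"/"y" alternate in position.
toy_tokens = ["a", "x", "b", "y", "a", "y", "b", "x"]

# Assignment that groups words by the contexts they share.
distributional_classes = {"a": 0, "b": 0, "x": 1, "y": 1}
# Arbitrary assignment that ignores context.
arbitrary_classes = {"a": 0, "x": 0, "b": 1, "y": 1}

good_ll = class_bigram_log_likelihood(toy_tokens, distributional_classes)
bad_ll = class_bigram_log_likelihood(toy_tokens, arbitrary_classes)
print(good_ll > bad_ll)
```

On this toy input the assignment that groups words sharing the same neighbours yields the higher likelihood, which is the criterion a clustering algorithm of this family would try to maximize over all assignments.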

Author information

Corresponding author

Correspondence to Phuong Le-Hong.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Le-Hong, P., Nguyen, T.M.H. (2014). Part-of-Speech Induction for Vietnamese. In: Huynh, V., Denoeux, T., Tran, D., Le, A., Pham, S. (eds) Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 245. Springer, Cham. https://doi.org/10.1007/978-3-319-02821-7_24

  • DOI: https://doi.org/10.1007/978-3-319-02821-7_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-02820-0

  • Online ISBN: 978-3-319-02821-7

  • eBook Packages: Engineering, Engineering (R0)
