Using Collaborative Training Method to Build Vietnamese Dependency Treebank

  • Conference paper
Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data (NLP-NABD 2016, CCL 2016)

Abstract

To address the difficulty of annotating Vietnamese dependency trees, this paper proposes a method that combines the MST algorithm with an improved Nivre algorithm to build a Vietnamese dependency treebank. The method takes full advantage of the characteristics of collaborative training. First, we built a small set of annotated samples. Second, we used these samples to train two weak learners on two fully redundant views. Then, the two learners annotated a large number of unannotated samples for each other. Next, we selected the high-confidence samples for retraining and built a dependency parsing system. Finally, we ran tenfold cross-validation on 5000 manually annotated Vietnamese sentences and obtained an accuracy of 76.33%. The experimental results show that the proposed method can make full use of unannotated corpora to effectively improve the quality of the dependency treebank.
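
The training loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `ToyLearner`, `co_train`, `threshold`, and `rounds` are hypothetical names, the toy learner assigns one label per sentence instead of a dependency tree, and its confidence score is an assumed placeholder for whatever trust measure the MST-based and Nivre-based parsers provide.

```python
from typing import Callable, Dict, List, Optional, Tuple

Labeled = Tuple[str, str]  # (sentence, label); the paper's "label" is a dependency tree


class ToyLearner:
    """Hypothetical stand-in for one parser; `view` defines its feature view."""

    def __init__(self, view: Callable[[str], List[str]]):
        self.view = view
        self.table: Dict[str, str] = {}  # feature -> label seen first in training

    def fit(self, data: List[Labeled]) -> None:
        self.table = {}
        for sentence, label in data:
            for feat in self.view(sentence):
                self.table.setdefault(feat, label)

    def predict(self, sentence: str) -> Tuple[Optional[str], float]:
        feats = self.view(sentence)
        votes = [self.table[f] for f in feats if f in self.table]
        if not votes:
            return None, 0.0
        # Majority label; sorting makes tie-breaking deterministic.
        label = max(sorted(set(votes)), key=votes.count)
        # Assumed confidence: share of all features that voted for the label.
        return label, votes.count(label) / len(feats)


def co_train(a: ToyLearner, b: ToyLearner, seed: List[Labeled],
             unlabeled: List[str], threshold: float = 0.8, rounds: int = 5):
    """Each learner labels the pool; confident labels train the *other* learner."""
    train_a, train_b = list(seed), list(seed)
    pool = list(unlabeled)
    for _ in range(rounds):
        a.fit(train_a)
        b.fit(train_b)
        remaining, added = [], False
        for sentence in pool:
            label_a, conf_a = a.predict(sentence)
            label_b, conf_b = b.predict(sentence)
            taken = False
            if label_a is not None and conf_a >= threshold:
                train_b.append((sentence, label_a))  # a teaches b
                taken = True
            if label_b is not None and conf_b >= threshold:
                train_a.append((sentence, label_b))  # b teaches a
                taken = True
            if taken:
                added = True
            else:
                remaining.append(sentence)
        pool = remaining
        if not added:  # nothing confident left to exchange
            break
    a.fit(train_a)
    b.fit(train_b)
    return train_a, train_b, pool
```

In this sketch the two views are supplied as feature extractors, for example whole words for one learner and word-initial characters for the other. Sentences that neither learner labels with enough confidence stay in the returned pool rather than entering the training data.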

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61262041, 61363044 and 61472168) and the key project of National Natural Science Foundation of Yunnan province (Grant No. 2013FA030).

Author information

Corresponding author

Correspondence to Jianyi Guo.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Qiu, G., Guo, J., Yu, Z., Xian, Y., Mao, C. (2016). Using Collaborative Training Method to Build Vietnamese Dependency Treebank. In: Sun, M., Huang, X., Lin, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD 2016, CCL 2016. Lecture Notes in Computer Science, vol 10035. Springer, Cham. https://doi.org/10.1007/978-3-319-47674-2_8

  • DOI: https://doi.org/10.1007/978-3-319-47674-2_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47673-5

  • Online ISBN: 978-3-319-47674-2

  • eBook Packages: Computer Science, Computer Science (R0)
