Phrase-Based Compressive Summarization for English-Vietnamese

  • Conference paper
Integrated Uncertainty in Knowledge Modelling and Decision Making (IUKM 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9978)

Abstract

Cross-language summarization is a novel topic that is highly practical and necessary for capturing, tracing, and retrieving huge amounts of data. In particular, for many low-resource languages such as Vietnamese and Chinese, there are no previous works or datasets that address this problem. We therefore propose applying Phrase-based Compressive Summarization to English-Vietnamese. This model takes advantage of the relation between the translation and summarization phases to overcome a common drawback of most previous research. In addition, the bilingual corpus for English-Vietnamese summarization, built manually on the dataset, should be helpful for later work. On this dataset, our system achieves approximately 37% in ROUGE-1 score, which is comparable to systems on other language pairs. This encouraging result demonstrates the effectiveness of our approach and the quality of our manual datasets in English-Vietnamese.
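For readers unfamiliar with the metric, ROUGE-1 measures unigram overlap between a system summary and a reference summary. The following is a minimal sketch, not the evaluation setup used in the paper: the abstract does not state which ROUGE-1 variant (recall, precision, or F-score) is reported, the function name `rouge_1` is hypothetical, and whitespace tokenization is an assumption (Vietnamese text would normally need word segmentation first).

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram-overlap ROUGE-1 between one candidate and one reference."""
    # Assumption: lowercase and split on whitespace; real evaluations
    # typically use a proper tokenizer or word segmenter.
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: a unigram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


# Toy English example: 4 of the 6 reference unigrams are matched.
scores = rouge_1("the cat sat on the mat", "the cat lay on a mat")
print(scores)
```

Under this sketch, a score of about 37% would mean that roughly 37% of the unigrams are matched, in whichever variant is being reported.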

Notes

  1. https://github.com/lupanh/VietnameseMDS

  2. http://textblob.readthedocs.io/en/dev/

  3. https://github.com/despawnerer/summarize

  4. https://github.com/dookgulliver/BingTranslator

Corresponding author

Correspondence to Tung Le.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Le, T., Nguyen, LM., Shimazu, A., Dien, D. (2016). Phrase-Based Compressive Summarization for English-Vietnamese. In: Huynh, VN., Inuiguchi, M., Le, B., Le, B., Denoeux, T. (eds) Integrated Uncertainty in Knowledge Modelling and Decision Making. IUKM 2016. Lecture Notes in Computer Science, vol 9978. Springer, Cham. https://doi.org/10.1007/978-3-319-49046-5_28

  • DOI: https://doi.org/10.1007/978-3-319-49046-5_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49045-8

  • Online ISBN: 978-3-319-49046-5

  • eBook Packages: Computer Science, Computer Science (R0)
