ViMs: a high-quality Vietnamese dataset for abstractive multi-document summarization


Automatic text summarization has become increasingly important as the number of documents available on the Internet grows exponentially. For Vietnamese, VietnameseMDS is the only publicly available dataset for this task. Although it contains 199 clusters, each cluster has only three documents, which is small compared with typical English datasets. This motivates us to construct ViMs, a large, high-quality Vietnamese dataset for abstractive multi-document summarization. To that end, we recruited 29 annotators and enhanced MDSWriter, an open-source annotation tool, to support the annotators in creating gold-standard summaries. As a result, ViMs has 600 summaries corresponding to 300 clusters of 1,945 documents. We verified the reliability of the dataset using several metrics: conventional Cohen’s \(\kappa \); relaxed Cohen’s \(\kappa \), a new metric that we propose as better suited to abstractive summarization; and ROUGE scores. A relaxed \(\kappa \) score of 0.55 indicates moderate agreement between annotators. The ROUGE scores are 0.729 for ROUGE-1, 0.507 for ROUGE-2 and 0.524 for ROUGE-SU4. We further evaluated ViMs with three summarization systems, TextRank, CFVi and MUSEEC, which achieve ROUGE-1 scores of 0.628, 0.711 and 0.732, respectively. These results show that the ViMs dataset is suitable for both training and evaluating multi-document summarization systems. We have made the dataset and the evaluation results of this work publicly available to the research community. Unlike previous work, which published only the final summarization dataset, we also publish the intermediate annotation results, which can be used for other NLP problems such as sentence classification.
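To make the agreement figures above concrete, the following is a minimal sketch of conventional Cohen's kappa for two annotators who each mark sentences as selected or not selected. The function name `cohen_kappa` and the toy labels are illustrative assumptions, not part of the ViMs release, and the relaxed variant proposed in this work is not reproduced here because its definition is not given in this section.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Conventional Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = count_a.keys() | count_b.keys()
    p_e = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data: 1 = sentence selected for the summary, 0 = not selected.
annotator_1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
annotator_2 = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"{kappa:.3f}")  # 0.583
```

On the Landis and Koch scale, values between 0.41 and 0.60 are conventionally read as moderate agreement, which is the band that the reported relaxed score of 0.55 falls into.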


  3. This rate is nearly the same when words are used as the calculation unit.

  4. If two sentences have hard agreement, they also have soft-and agreement and soft-or agreement; if two sentences have soft-and agreement, they also have soft-or agreement. For clarity, the equations do not show these cases.

  6. We used the default parameters and changed only the summary length parameter, which we set to 250 syllables.
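As an illustration of what a 250-syllable length budget means in practice, the sketch below truncates a summary to a fixed number of syllables. It assumes that written Vietnamese syllables are separated by whitespace, so whitespace tokens stand in for syllables; the helper name `truncate_to_syllables` is hypothetical and is not a parameter or function of any of the evaluated systems.

```python
def truncate_to_syllables(text: str, max_syllables: int = 250) -> str:
    """Keep at most max_syllables whitespace-delimited tokens of text."""
    if max_syllables < 0:
        raise ValueError("max_syllables must be non-negative")
    # Assumption: each whitespace-delimited token is one written syllable.
    syllables = text.split()
    return " ".join(syllables[:max_syllables])


# Hypothetical example sentence with eight syllables.
example = "Hà Nội là thủ đô của Việt Nam"
shortened = truncate_to_syllables(example, max_syllables=3)
print(shortened)  # Hà Nội là
```

A real system would usually cut at a sentence boundary rather than mid-sentence; the hard cut here only shows how the budget is counted.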


  1. Benikova, D., Mieskes, M., Meyer, C.M., & Gurevych, I. (2016). Bridging the gap between extractive and abstractive summaries: Creation and evaluation of coherent extracts from heterogeneous sources. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 1039–1050).

  2. Dang, H.T. (2005). Overview of DUC 2005. In Proceedings of the document understanding conference (vol. 2005, pp. 1–12).

  3. Dang, H.T., & Owczarzak, K. (2008). Overview of the TAC 2008 opinion question answering and summarization tasks. In Proceedings of the First Text Analysis Conference (vol. 2).

  4. Edmundson, H. P. (1969). New methods in automatic extracting. Journal of the ACM (JACM), 16(2), 264–285.

  5. Giannakopoulos, G. (2013). Multi-document multilingual summarization and evaluation tracks in ACL 2013 MultiLing workshop. In Proceedings of the MultiLing 2013 Workshop on Multilingual Multi-document Summarization (pp. 20–28).

  6. Giannakopoulos, G., Karkaletsis, V., Vouros, G., & Stamatopoulos, P. (2008). Summarization system evaluation revisited: N-gram graphs. ACM Transactions on Speech and Language Processing (TSLP), 5(3), 5.

  7. Giannakopoulos, G., El-Haj, M., Favre, B., Litvak, M., Steinberger, J., & Varma, V. (2011). TAC 2011 MultiLing pilot overview.

  8. Giannakopoulos, G., Kubina, J., Conroy, J., Steinberger, J., Favre, B., Kabadjov, M., Kruschwitz, U., & Poesio, M. (2015). MultiLing 2015: multilingual summarization of single and multi-documents, on-line fora, and call-center conversations. In Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue (pp. 270–274).

  9. Giannakopoulos, G., Conroy, J., Kubina, J., Rankel, P.A., Lloret, E., Steinberger, J., Litvak, M., & Favre, B. (2017). MultiLing 2017 overview. In Proceedings of the MultiLing 2017 workshop on summarization and summary evaluation across source types and genres (pp. 1–6).

  10. Goldstein, J., Mittal, V., Carbonell, J., & Callan, J. (2000). Creating and evaluating multi-document sentence extract summaries. In Proceedings of the ninth international conference on information and knowledge management (CIKM), McLean, VA, USA (pp. 165–172).

  11. Hong Phuong, L., Thi Minh Huyen, N., Roussanaly, A., & Vinh, H.T. (2008). A hybrid approach to word segmentation of Vietnamese texts. In International Conference on Language and Automata Theory and Applications (pp. 240–249). Springer.

  12. Jaidka, K., Chandrasekaran, M.K., Elizalde, B.F., Jha, R., Jones, C., Kan, M.Y., Khanna, A., Molla-Aliod, D., Radev, D.R., Ronzano, F., et al. (2014). The computational linguistics summarization pilot task. In Proceedings of Text Analysis Conference, Gaithersburg, USA.

  13. Ji, H., Grishman, R., Dang, H.T., Griffitt, K., & Ellis, J. (2010). Overview of the TAC 2010 knowledge base population track. In Third Text Analysis Conference (vol. 3, pp. 3–3).

  14. Jing, H., Barzilay, R., McKeown, K., & Elhadad, M. (1998). Summarization evaluation methods: Experiments and analysis. In AAAI symposium on intelligent summarization, Palo Alto, CA (pp. 51–59).

  15. Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.

  16. Le, T., Nguyen, L.M., Shimazu, A., & Dien, D. (2016). Phrase-based compressive summarization for English-Vietnamese. In International Symposium on Integrated Uncertainty in Knowledge Modelling and Decision Making (pp. 331–342). Springer.

  17. Lin, C.Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text summarization branches out: Proceedings of the ACL-04 workshop, Barcelona, Spain (vol. 8).

  18. Litvak, M., Vanetik, N., Last, M., & Churkin, E. (2016). MUSEEC: a multilingual text summarization tool. In Proceedings of ACL-2016 System Demonstrations (pp. 73–78).

  19. Liu, F., & Liu, Y. (2008). Correlation between ROUGE and human evaluation of extractive meeting summaries. In Proceedings of the 46th annual meeting of the association for computational linguistics on human language technologies: Short papers, Association for Computational Linguistics (pp. 201–204).

  20. Lloret, E., & Palomar, M. (2012). Text summarisation in progress: a literature review. Artificial Intelligence Review, 37(1), 1–41.

  21. Loupy, C., Guegan, M., Ayache, C., Seng, S., & Torres Moreno, J.-M. (2010). A French human reference corpus for multi-document summarization and sentence compression. In Proceedings of the seventh international conference on language resources and evaluation (LREC), Valletta, Malta (pp. 3113–3118).

  22. Mani, I., Klein, G., House, D., Hirschman, L., Firmin, T., & Sundheim, B. (2002). SUMMAC: a text summarization evaluation. Natural Language Engineering, 8(1), 43–68.

  23. Meyer, C.M., Benikova, D., Mieskes, M., & Gurevych, I. (2016). MDSWriter: Annotation tool for creating high-quality multi-document summarization corpora. In Proceedings of ACL-2016 System Demonstrations (pp. 97–102).

  24. Mihalcea, R., & Tarau, P. (2004). TextRank: Bringing order into text. In Proceedings of the 2004 conference on empirical methods in natural language processing.

  25. Nenkova, A., Siddharthan, A., & McKeown, K. (2005). Automatically learning cognitive status for multi-document summarization of newswire. In Proceedings of the conference on human language technology and empirical methods in natural language processing, Association for Computational Linguistics (pp. 241–248).

  26. Nenkova, A., Passonneau, R., & McKeown, K. (2007). The pyramid method: Incorporating human content selection variation in summarization evaluation. ACM Transactions on Speech and Language Processing (TSLP), 4(2), 4.

  27. Nguyen, H.N., Van Le, T., Le, H.S., & Pham, T.V. (2014). Domain specific sentiment dictionary for opinion mining of Vietnamese text. In International Workshop on Multi-disciplinary Trends in Artificial Intelligence (pp. 136–148). Springer.

  28. Nguyen, M.T., Lai, D.V., Do, P.K., Tran, D.V., & Nguyen, M.L. (2016). VSoLSCSum: Building a Vietnamese sentence-comment dataset for social context summarization. In Proceedings of the 12th Workshop on Asian Language Resources (ALR12) (pp. 38–48).

  29. Nguyen, T.C., Le, H.M., & Phan, T.T. (2009). Building knowledge base for Vietnamese information retrieval. In Proceedings of the 11th International Conference on Information Integration and Web-based Applications & Services (pp. 482–486). ACM.

  30. Pearson, K. (1895). Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58, 240–242.

  31. Radev, D.R., Teufel, S., Saggion, H., Lam, W., Blitzer, J., Qi, H., Celebi, A., Liu, D., & Drabek, E. (2003). Evaluation challenges in large-scale document summarization. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics-Volume 1, Association for Computational Linguistics (pp. 375–382).

  32. Seki, Y., Eguchi, K., Kando, N., & Aono, M. (2005). Multi-document summarization with subjectivity analysis at DUC 2005. In Proceedings of the Document Understanding Conference (DUC).

  33. Ung, V.G., Luong, A.V., Tran, N.T., & Nghiem, M.Q. (2015). Combination of features for Vietnamese news multi-document summarization. In Knowledge and Systems Engineering (KSE), 2015 Seventh International Conference on, IEEE (pp. 186–191).

  34. Verberne, S., Krahmer, E., Hendrickx, I., Wubben, S., & van Den Bosch, A. (2017). Creating a reference data set for the summarization of discussion forum threads. Language Resources and Evaluation, 52, 1–23.

  35. Mann, W.C., & Thompson, S.A. (1988). Rhetorical structure theory: toward a functional theory of text organization. Text, 8(3), 243–281.


Ho Chi Minh City Department of Science and Technology 15/2016/HĐ-SKHCN.

Corresponding author

Correspondence to Nhi-Thao Tran.

Cite this article

Tran, N., Nghiem, M., Nguyen, N.T.H. et al. ViMs: a high-quality Vietnamese dataset for abstractive multi-document summarization. Lang Resources & Evaluation (2020).


  • Abstractive summarization
  • Multi-document summarization
  • Vietnamese dataset
  • Automatic summarization