Automatic text summarization has become increasingly important given the exponential growth of documents available on the Internet. For Vietnamese, VietnameseMDS is the only publicly available dataset for this task. Although that dataset has 199 clusters, each cluster contains only three documents, which is small compared to typical English datasets. This motivates us to construct ViMs, a large, high-quality Vietnamese dataset for abstractive multi-document summarization. To that end, we recruited 29 annotators and enhanced MDSWriter, an open-source annotation tool, to support the annotators in creating gold-standard summaries. As a result, ViMs has 600 summaries corresponding to 300 clusters of 1,945 documents. We have verified the reliability of our dataset using a variety of metrics: conventional Cohen’s \(\kappa \); relaxed Cohen’s \(\kappa \), a new metric that we propose as better suited to abstractive summarization; and ROUGE scores. A relaxed \(\kappa \) score of 0.55 indicates moderate agreement between annotators. Meanwhile, the ROUGE-1, ROUGE-2 and ROUGE-SU4 scores are 0.729, 0.507 and 0.524, respectively. We have further evaluated ViMs using three different summarization systems: TextRank, CFVi and MUSEEC. Their ROUGE-1 scores are 0.628, 0.711 and 0.732, respectively. These results show that the ViMs dataset is suitable for both training and evaluating multi-document summarization systems. We have made the dataset and the evaluation results of this work publicly available to the research community. Unlike previous work, which published only the final summarization dataset, we also publish intermediate annotation results, which can be used in other NLP problems such as sentence classification.
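As an illustration of the conventional Cohen’s \(\kappa \) mentioned above, the following minimal sketch computes it for two annotators who each assign one label per sentence. The function name `cohen_kappa` and the toy labels are assumptions made for illustration; they are not taken from the ViMs annotation pipeline, and the relaxed variant proposed in the paper is not reproduced here.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Conventional Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)

    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each category, the product of the two annotators'
    # marginal proportions, summed over all categories.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical category for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels: 1 = sentence selected as important, 0 = not selected.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
annotator_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
print(round(cohen_kappa(annotator_1, annotator_2), 3))  # prints 0.583
```

On the scale of Landis and Koch (1977), values between 0.41 and 0.60 are conventionally read as moderate agreement, which is the band in which the reported relaxed \(\kappa \) of 0.55 falls.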
This rate is nearly the same when words are used as the unit of calculation.
If two sentences have hard agreement, they also have soft-and agreement and soft-or agreement; likewise, if two sentences have soft-and agreement, they also have soft-or agreement. For clarity, the equations omit these implied cases.
We used the default parameters and only modified the summary length parameter to 250 syllables.
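A length limit expressed in syllables can be sketched as follows, assuming syllables are the whitespace-delimited tokens of standard Vietnamese orthography. The helper `truncate_to_syllables` is hypothetical and is not part of any of the evaluated systems; how each system counts syllables internally (for example, whether punctuation is counted) is not stated in the text.

```python
def truncate_to_syllables(text, max_syllables=250):
    """Keep at most `max_syllables` whitespace-delimited syllables of `text`."""
    # Assumption: one whitespace-separated token corresponds to one syllable;
    # punctuation attached to a token is counted as part of that syllable.
    syllables = text.split()
    return " ".join(syllables[:max_syllables])


# Hypothetical usage with a short limit for readability.
print(truncate_to_syllables("Hà Nội là thủ đô của Việt Nam", max_syllables=4))
```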
Benikova, D., Mieskes, M., Meyer, C.M., & Gurevych, I. (2016). Bridging the gap between extractive and abstractive summaries: Creation and evaluation of coherent extracts from heterogeneous sources. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 1039–1050).
Dang, H.T. (2005). Overview of DUC 2005. In Proceedings of the document understanding conference (vol. 2005, pp. 1–12).
Dang, H.T., & Owczarzak, K. (2008). Overview of the TAC 2008 opinion question answering and summarization tasks. In Proceedings of the First Text Analysis Conference (vol. 2).
Edmundson, H. P. (1969). New methods in automatic extracting. Journal of the ACM (JACM), 16(2), 264–285.
Giannakopoulos, G. (2013). Multi-document multilingual summarization and evaluation tracks in ACL 2013 MultiLing workshop. In Proceedings of the MultiLing 2013 Workshop on Multilingual Multi-document Summarization (pp. 20–28).
Giannakopoulos, G., Karkaletsis, V., Vouros, G., & Stamatopoulos, P. (2008). Summarization system evaluation revisited: N-gram graphs. ACM Transactions on Speech and Language Processing (TSLP), 5(3), 5.
Giannakopoulos, G., El-Haj, M., Favre, B., Litvak, M., Steinberger, J., & Varma, V. (2011). TAC 2011 MultiLing pilot overview.
Giannakopoulos, G., Kubina, J., Conroy, J., Steinberger, J., Favre, B., Kabadjov, M., Kruschwitz, U., & Poesio, M. (2015). MultiLing 2015: multilingual summarization of single and multi-documents, on-line fora, and call-center conversations. In Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue (pp. 270–274).
Giannakopoulos, G., Conroy, J., Kubina, J., Rankel, P.A., Lloret, E., Steinberger, J., Litvak, M., & Favre, B. (2017). MultiLing 2017 overview. In Proceedings of the MultiLing 2017 workshop on summarization and summary evaluation across source types and genres (pp. 1–6).
Goldstein, J., Mittal, V., Carbonell, J., & Callan, J. (2000). Creating and evaluating multi-document sentence extract summaries. In Proceedings of the Ninth International Conference on Information and Knowledge Management (CIKM), McLean, VA, USA (pp. 165–172).
Hong Phuong, L., Thi Minh Huyen, N., Roussanaly, A., & Vinh, H.T. (2008). A hybrid approach to word segmentation of Vietnamese texts. In International Conference on Language and Automata Theory and Applications (pp. 240–249). Springer.
Jaidka, K., Chandrasekaran, M.K., Elizalde, B.F., Jha, R., Jones, C., Kan, M.Y., Khanna, A., Molla-Aliod, D., Radev, D.R., Ronzano, F., et al. (2014). The computational linguistics summarization pilot task. In Proceedings of Text Analysis Conference, Gaithersburg, USA.
Ji, H., Grishman, R., Dang, H.T., Griffitt, K., & Ellis, J. (2010). Overview of the TAC 2010 knowledge base population track. In Third Text Analysis Conference (vol. 3, pp. 3–3).
Jing, H., Barzilay, R., McKeown, K., & Elhadad, M. (1998). Summarization evaluation methods: Experiments and analysis. In AAAI symposium on intelligent summarization, Palo Alto, CA (pp. 51–59).
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.
Le, T., Nguyen, L.M., Shimazu, A., & Dien, D. (2016). Phrase-based compressive summarization for English-Vietnamese. In International Symposium on Integrated Uncertainty in Knowledge Modelling and Decision Making (pp. 331–342). Springer.
Lin, C.Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text summarization branches out: Proceedings of the ACL-04 workshop, Barcelona, Spain (vol. 8).
Litvak, M., Vanetik, N., Last, M., & Churkin, E. (2016). MUSEEC: a multilingual text summarization tool. In Proceedings of ACL-2016 System Demonstrations (pp. 73–78).
Liu, F., & Liu, Y. (2008). Correlation between ROUGE and human evaluation of extractive meeting summaries. In Proceedings of the 46th annual meeting of the association for computational linguistics on human language technologies: Short papers, Association for Computational Linguistics (pp. 201–204).
Lloret, E., & Palomar, M. (2012). Text summarisation in progress: a literature review. Artificial Intelligence Review, 37(1), 1–41.
Loupy, C., Guegan, M., Ayache, C., Seng, S., & Torres Moreno, J.-M. (2010). A French human reference corpus for multi-document summarization and sentence compression. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC), Valletta, Malta (pp. 3113–3118).
Mani, I., Klein, G., House, D., Hirschman, L., Firmin, T., & Sundheim, B. (2002). SUMMAC: a text summarization evaluation. Natural Language Engineering, 8(1), 43–68.
Meyer, C.M., Benikova, D., Mieskes, M., & Gurevych, I. (2016). MDSWriter: Annotation tool for creating high-quality multi-document summarization corpora. In Proceedings of ACL-2016 System Demonstrations (pp. 97–102).
Mihalcea, R., & Tarau, P. (2004). TextRank: Bringing order into text. In Proceedings of the 2004 conference on empirical methods in natural language processing.
Nenkova, A., Siddharthan, A., & McKeown, K. (2005). Automatically learning cognitive status for multi-document summarization of newswire. In Proceedings of the conference on human language technology and empirical methods in natural language processing, Association for Computational Linguistics (pp. 241–248).
Nenkova, A., Passonneau, R., & McKeown, K. (2007). The pyramid method: Incorporating human content selection variation in summarization evaluation. ACM Transactions on Speech and Language Processing (TSLP), 4(2), 4.
Nguyen, H.N., Van Le, T., Le, H.S., & Pham, T.V. (2014). Domain specific sentiment dictionary for opinion mining of Vietnamese text. In International Workshop on Multi-disciplinary Trends in Artificial Intelligence (pp. 136–148). Springer.
Nguyen, M.T., Lai, D.V., Do, P.K., Tran, D.V., & Nguyen, M.L. (2016). VSoLSCSum: Building a Vietnamese sentence-comment dataset for social context summarization. In Proceedings of the 12th Workshop on Asian Language Resources (ALR12) (pp. 38–48).
Nguyen, T.C., Le, H.M., & Phan, T.T. (2009). Building knowledge base for Vietnamese information retrieval. In Proceedings of the 11th International Conference on Information Integration and Web-based Applications & Services (pp. 482–486). ACM.
Pearson, K. (1895). Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58, 240–242.
Radev, D.R., Teufel, S., Saggion, H., Lam, W., Blitzer, J., Qi, H., Celebi, A., Liu, D., & Drabek, E. (2003). Evaluation challenges in large-scale document summarization. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics-Volume 1, Association for Computational Linguistics (pp. 375–382).
Seki, Y., Eguchi, K., Kando, N., & Aono, M. (2005). Multi-document summarization with subjectivity analysis at DUC 2005. In Proceedings of the Document Understanding Conference (DUC).
Ung, V.G., Luong, A.V., Tran, N.T., & Nghiem, M.Q. (2015). Combination of features for Vietnamese news multi-document summarization. In 2015 Seventh International Conference on Knowledge and Systems Engineering (KSE) (pp. 186–191). IEEE.
Verberne, S., Krahmer, E., Hendrickx, I., Wubben, S., & van Den Bosch, A. (2017). Creating a reference data set for the summarization of discussion forum threads. Language Resources and Evaluation, 52, 1–23.
Mann, W.C., & Thompson, S.A. (1988). Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3), 243–281.
Ho Chi Minh City Department of Science and Technology 15/2016/HĐ-SKHCN.
About this article
Cite this article
Tran, N., Nghiem, M., Nguyen, N.T.H. et al. ViMs: a high-quality Vietnamese dataset for abstractive multi-document summarization. Lang Resources & Evaluation (2020). https://doi.org/10.1007/s10579-020-09495-4
- Abstractive summarization
- Multi-document summarization
- Vietnamese dataset
- Automatic summarization