
UIDS: A Multilingual Document Summarization Framework Based on Summary Diversity and Hierarchical Topics

  • Lei Li
  • Yazhao Zhang
  • Junqi Chi
  • Zuying Huang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10565)

Abstract

In this paper, we put forward UIDS, a new high-performing, extensible framework for extractive multilingual document summarization. Our approach treats each document in a multilingual corpus as a set of item sequences, in which each sentence is an item sequence and each item is a minimal semantic unit. We then formalize extractive summarization as a summary diversity sampling problem that accounts for topic diversity and redundancy at the same time. Topic diversity is captured with hierarchical topic models, redundancy is measured with sentence similarity, and summary diversity is enforced with Determinantal Point Processes. We further illustrate how this method yields a framework that can compute summaries for both multilingual single-document and multi-document inputs. Experiments on the MultiLing summarization task datasets demonstrate the effectiveness of our approach.
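To make the sampling idea concrete, the sketch below shows one common way such a Determinantal Point Process selection can be realized: a quality/similarity kernel decomposition followed by greedy MAP selection. This is a minimal illustration and not the authors' UIDS implementation; the function names (`build_l_kernel`, `greedy_dpp_select`) and the `quality` vector, which here stands in for the hierarchical-topic salience scores described in the abstract, are assumptions introduced for illustration only.

```python
# Minimal sketch of DPP-based diverse sentence selection (not the UIDS code).
# Assumes the standard L-ensemble decomposition L[i, j] = q_i * S[i, j] * q_j,
# where q_i is a per-sentence quality score and S is pairwise similarity.
import numpy as np


def build_l_kernel(similarity: np.ndarray, quality: np.ndarray) -> np.ndarray:
    """Combine per-sentence quality and pairwise similarity into an L kernel."""
    q = quality.reshape(-1, 1)
    return q * similarity * q.T


def greedy_dpp_select(L: np.ndarray, k: int) -> list:
    """Greedy MAP inference: repeatedly add the sentence that most increases
    log det(L_S), trading off sentence quality against redundancy."""
    n = L.shape[0]
    selected: list = []
    for _ in range(min(k, n)):
        best_i, best_gain = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sub = L[np.ix_(idx, idx)] + 1e-9 * np.eye(len(idx))  # small ridge
            sign, logdet = np.linalg.slogdet(sub)
            gain = logdet if sign > 0 else -np.inf
            if gain > best_gain:
                best_i, best_gain = i, gain
        if best_i is None:
            break
        selected.append(best_i)
    return selected


if __name__ == "__main__":
    # Toy example: four sentences, the first two nearly redundant.
    S = np.array([[1.0, 0.9, 0.1, 0.2],
                  [0.9, 1.0, 0.2, 0.1],
                  [0.1, 0.2, 1.0, 0.3],
                  [0.2, 0.1, 0.3, 1.0]])
    q = np.array([0.9, 0.8, 0.7, 0.6])  # illustrative topic-based salience
    L = build_l_kernel(S, q)
    print(greedy_dpp_select(L, k=2))    # selects high-quality, non-redundant sentences
```

On this toy input the procedure picks sentences 0 and 2 rather than the two near-duplicates, which is the behavior the diversity-sampling formulation is meant to enforce.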

Keywords

Multilingual document summarization · Summary diversity · Determinantal point processes


Acknowledgements

This work was supported by the National Social Science Foundation of China under Grant 16ZDA055; the National Natural Science Foundation of China under Grants 91546121, 71231002, and 61202247; the EU FP7 IRSES MobileCloud Project 612212; the 111 Project of China under Grant B08004; the Engineering Research Center of Information Networks, Ministry of Education; the project of the Beijing Institute of Science and Technology Information; and the project of CapInfo Company Limited.


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Center for Intelligence Science and Technology, School of Computer, Beijing University of Posts and Telecommunications, Beijing, People’s Republic of China
