Knowledge and Information Systems, Volume 58, Issue 2, pp 481–499

Cross-language document summarization via extraction and ranking of multiple summaries

  • Xiaojun Wan
  • Fuli Luo
  • Xue Sun
  • Songfang Huang
  • Jin-ge Yao
Short Paper

Abstract

The task of cross-language document summarization aims to produce a summary in a target language (e.g., Chinese) for a given document set in a different source language (e.g., English). Previous studies focus on ranking and selection of translated sentences in the target language. In this paper, we propose a new framework for addressing the task by extraction and ranking of multiple summaries in the target language. First, we extract multiple candidate summaries by proposing several schemes for improving the upper-bound quality of the summaries. Then, we propose a new ensemble ranking method for ranking the candidate summaries by making use of bilingual features. Extensive experiments have been conducted on a benchmark dataset and the results verify the effectiveness of our proposed framework, which outperforms a variety of baselines, including supervised baselines.
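
The abstract does not say how the ensemble ranking is computed or which bilingual features it uses, so the following is only a minimal sketch of the general idea of ranking candidate summaries with several scorers. It assumes a simple rank-averaging (Borda-style) combination, which is a stand-in technique and not necessarily the authors' method. The names `rank_positions`, `ensemble_rank`, and the two toy scorers are hypothetical.

```python
from typing import Callable, List, Sequence

# A scorer maps a candidate summary to a number; higher means better.
Scorer = Callable[[str], float]


def rank_positions(scores: Sequence[float]) -> List[int]:
    """Return each item's rank (0 = best). Ties are broken by original index."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    ranks = [0] * len(scores)
    for position, index in enumerate(order):
        ranks[index] = position
    return ranks


def ensemble_rank(candidates: Sequence[str], scorers: Sequence[Scorer]) -> List[str]:
    """Order candidates by their summed rank across all scorers.

    Assumption: rank averaging stands in for the ensemble ranking
    method, which the abstract does not specify.
    """
    if not candidates or not scorers:
        return list(candidates)
    totals = [0] * len(candidates)
    for scorer in scorers:
        ranks = rank_positions([scorer(c) for c in candidates])
        for i, r in enumerate(ranks):
            totals[i] += r
    order = sorted(range(len(candidates)), key=lambda i: (totals[i], i))
    return [candidates[i] for i in order]


# Toy scorers (hypothetical): prefer longer summaries, and prefer
# summaries close to a three-word target length.
def length_scorer(summary: str) -> float:
    return float(len(summary.split()))


def target_length_scorer(summary: str) -> float:
    return -abs(len(summary.split()) - 3)


candidates = ["a b c", "a b", "a b c d e"]
ranked = ensemble_rank(candidates, [length_scorer, target_length_scorer])
print(ranked[0])
```

In a real system each scorer would be a trained ranking model or a feature-based quality estimate over the source and translated text; the toy scorers here only make the combination step visible.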

Keywords

Document summarization, Natural language generation, Natural language processing, Text mining

Notes

Acknowledgements

This work was supported by the National Natural Science Foundation of China (61331011, 61772036), the IBM Global Faculty Award Program, and the Key Laboratory of Science, Technology and Standard in Press Industry (Key Laboratory of Intelligent Press Media Technology).

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. Institute of Computer Science and Technology, Peking University, Beijing, China
  2. Key Laboratory of Computational Linguistics (Peking University), MOE, Beijing, China
  3. IBM China Research Laboratory, Beijing, China
