
Investigating Unit Weighting and Unit Selection Factors in Thai Multi-document Summarization

  • Conference paper
  • First Online:
Knowledge, Information and Creativity Support Systems

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 416)


Abstract

After documents are broken down into small units, unit weighting and unit selection are two important factors in summarizing multiple related documents. This paper investigates the performance of several variants of unit weighting and unit selection schemes for Thai multi-document summarization. Fifty sets of Thai news articles, together with their reference summaries, are used to evaluate the various weighting and selection methods. Compared with PageRank and Maximal Marginal Relevance (MMR) using four ROUGE measures, the results show that iterative weighting achieves higher performance than traditional TF-IDF, and that iterative node weighting, query relevance, centroid-based selection, and unit redundancy consideration can help improve summary quality.
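The abstract names the weighting and selection schemes but does not spell them out. As a rough illustration only, here is a minimal sketch of textbook versions of three of them: TF-IDF unit vectors, PageRank-style iterative weighting over a cosine-similarity graph of units, and greedy MMR selection, with the centroid of all units standing in for the query. This is not the authors' implementation. The function names, the smoothed IDF log(1 + N/df), the damping factor 0.85, the MMR trade-off `lam`, and the toy units are all assumptions. Real Thai input would first need word segmentation, which is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(units):
    """Return one sparse TF-IDF vector (term -> weight) per unit.

    Each unit is a list of tokens. IDF is smoothed as log(1 + N / df) so that
    a term occurring in every unit still gets a positive weight (an assumption).
    """
    n = len(units)
    df = Counter(term for unit in units for term in set(unit))
    return [{t: c * math.log(1 + n / df[t]) for t, c in Counter(unit).items()}
            for unit in units]


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def pagerank_weights(vectors, damping=0.85, iterations=50):
    """Iteratively weight units on a cosine-similarity graph (PageRank-style).

    A unit sharing no terms with any other unit receives only the teleport
    term (1 - damping) / n, and its own weight is not passed on.
    """
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out_sum = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [(1 - damping) / n
                  + damping * sum(sim[j][i] / out_sum[j] * scores[j]
                                  for j in range(n) if out_sum[j] > 0)
                  for i in range(n)]
    return scores


def centroid(vectors):
    """Average of all unit vectors, used here as a stand-in query."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def mmr_select(vectors, query, k, lam=0.7):
    """Greedy Maximal Marginal Relevance: trade query relevance against redundancy."""
    selected, candidates = [], list(range(len(vectors)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(vectors[i], query)
            redundancy = max((cosine(vectors[i], vectors[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical, pre-segmented toy units (real Thai text needs word segmentation).
units = [["flood", "bangkok", "rain"], ["flood", "bangkok", "water"],
         ["election", "vote"], ["rain", "water", "flood"]]
vecs = tfidf_vectors(units)
weights = pagerank_weights(vecs)
summary = mmr_select(vecs, centroid(vecs), k=2)
```

With these toy units, the isolated "election" unit gets the lowest iterative weight. With the default `lam=0.7`, both selected units come from the three flood-related units. Lowering `lam` to 0.3 makes MMR pick the "election" unit second, because it is the least redundant with the first pick.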



Acknowledgments

This work was supported by the National Research University Project of Thailand Office of Higher Education Commission, the Thammasat Center of Excellence in Intelligent Informatics, Speech and Language Technology and Service Innovation, and Rajamangala University of Technology Lanna Nan. We would like to thank all members of the KINDML laboratory at Sirindhorn International Institute of Technology for fruitful discussions.

Author information


Correspondence to Nongnuch Ketui.



Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ketui, N., Theeramunkong, T. (2016). Investigating Unit Weighting and Unit Selection Factors in Thai Multi-document Summarization. In: Kunifuji, S., Papadopoulos, G., Skulimowski, A., Kacprzyk, J. (eds) Knowledge, Information and Creativity Support Systems. Advances in Intelligent Systems and Computing, vol 416. Springer, Cham. https://doi.org/10.1007/978-3-319-27478-2_14


  • DOI: https://doi.org/10.1007/978-3-319-27478-2_14

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27477-5

  • Online ISBN: 978-3-319-27478-2

  • eBook Packages: Engineering, Engineering (R0)
