An improved method of automatic text summarization for web contents using lexical chain with semantic-related terms

Methodologies and Application

Abstract

Automatic text summarization has attracted growing research attention as the volume of text documents keeps increasing with the constant expansion of information diffusion. The objective of this work is to obtain the most reliable and substantial context, that is, the most relevant brief summary of a text, in an extractive manner. Extractive text summarization produces a short summary that contains the most important information of the original text by extracting a set of sentences from the original document. This paper proposes an improved extractive text summarization method that enhances the conventional lexical chain method to produce more relevant information from the text, using three distinct features or characteristics of keywords in a text. The keywords of the document are labeled using our previous work, the transition probability distribution generator model, which can learn the characteristics of keywords in a document and generate their probability distribution over each feature.
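To make the extractive setting concrete, the following minimal Python sketch selects the highest-scoring sentences of a text and returns them in their original order. It is only an illustration under stated assumptions, not the method proposed in this paper: the function names `split_sentences` and `summarize`, the `keyword_weights` dictionary, and the naive regular-expression sentence splitter are all hypothetical, and the sketch does not build lexical chains, consult WordNet, or use the transition probability distribution generator model. Keyword weights are simply assumed to be given.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, keyword_weights, max_sentences=2):
    """Return up to max_sentences sentences with the highest summed keyword weight.

    keyword_weights is a hypothetical mapping from lowercase keyword to weight;
    how such weights are obtained is outside the scope of this sketch.
    """
    sentences = split_sentences(text)
    scored = []
    for idx, sentence in enumerate(sentences):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        score = sum(keyword_weights.get(token, 0.0) for token in tokens)
        scored.append((score, idx))
    # Highest score first; ties go to the sentence that appears earlier.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:max_sentences]
    # Restore document order so the extract reads naturally.
    keep = sorted(idx for _, idx in top)
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    sample = (
        "Lexical chains group related words. "
        "The weather was nice. "
        "Summaries keep chains of keywords."
    )
    weights = {"lexical": 1.0, "chains": 0.8, "keywords": 0.6, "summaries": 0.5}
    for sentence in summarize(sample, weights):
        print(sentence)
```

Emitting the selected sentences in document order, rather than in score order, is a design choice that keeps the extract readable; the scoring function is the part a lexical chain approach would replace.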

Keywords

Automatic text summarization, Keyword extraction, Lexical chain, Markov chain, WordNet, Semantic-related terms, Web contents, Machine learning

Notes

Acknowledgements

This study was supported by a research fund from Chosun University, 2015.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  1. Department of Computer Engineering, Chosun University, Gwangju, South Korea
