Text Summarization with Automatic Keyword Extraction in Telugu e-Newspapers

  • Reddy Naidu
  • Santosh Kumar Bharti
  • Korra Sathya Babu
  • Ramesh Kumar Mohapatra
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 77)

Abstract

Summarization is the process of shortening a text document into a summary that preserves the main points of the original. Extractive summarizers select the sentences from the given text that best express its message. Most extractive summarization techniques revolve around finding keywords and extracting the sentences that contain more keywords than the rest. Keyword extraction is usually done by selecting relevant words that occur more frequently than others, with emphasis on the important ones. Manual extraction or annotation of keywords is a tedious, error-prone process that demands considerable effort and time. In this work, we propose an algorithm that automatically extracts keywords for text summarization in Telugu e-newspaper datasets. The proposed method is evaluated by comparing the experimental results for articles with similar titles in five different Telugu e-newspapers, to check the similarity and consistency of the summarized results.
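The frequency-based idea described above (pick the most frequent words as keywords, then keep the sentences that contain the most keywords) can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify; the function names, the tiny English stop-word list, and the parameters `top_k` and `num_sentences` are all assumptions made for illustration. Tokens are obtained by whitespace splitting rather than the regex class `\w`, because `\w` does not match the combining vowel signs used in Telugu and would split words apart.

```python
import re
import string
from collections import Counter

# Hypothetical stop-word list for the English demo below; a Telugu system
# would need its own list (an assumption, not taken from the paper).
STOP_WORDS = {"the", "a", "is", "of", "and", "in", "to"}


def tokenize(sentence):
    """Split on whitespace, strip surrounding ASCII punctuation, lowercase."""
    tokens = (tok.strip(string.punctuation).lower() for tok in sentence.split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def extract_keywords(sentences, top_k=5):
    """Return the top_k most frequent non-stop-word tokens as a set."""
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    return {word for word, _ in counts.most_common(top_k)}


def summarize(text, top_k=5, num_sentences=2):
    """Keep the num_sentences sentences with the most keyword occurrences."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    keywords = extract_keywords(sentences, top_k)
    # Score each sentence by how many of its tokens are keywords.
    scored = [
        (sum(tok in keywords for tok in tokenize(s)), i)
        for i, s in enumerate(sentences)
    ]
    # Highest score first; ties broken by earlier position in the text.
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:num_sentences]
    # Emit the chosen sentences in their original order.
    return [sentences[i] for i in sorted(i for _, i in best)]


if __name__ == "__main__":
    demo = (
        "Rain flooded the city. The city council met. "
        "Cricket was played. Rain and flood damage hit the city hard."
    )
    for line in summarize(demo, top_k=2, num_sentences=2):
        print(line)
```

Raw frequency is the simplest possible relevance signal. The abstract's phrase "with emphasis on the important ones" suggests the actual method weights words further, which this sketch does not attempt.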

Keywords

Automatic keyword extraction, e-newspapers, NLP, Summarization, Telugu

Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • Reddy Naidu (1)
  • Santosh Kumar Bharti (1)
  • Korra Sathya Babu (1)
  • Ramesh Kumar Mohapatra (1)
  1. National Institute of Technology, Rourkela, India
