Abstract
Summarization is the process of shortening a text document into a summary that retains the main points of the original. Extractive summarizers select the sentences from the given text that best express its message. Most extractive summarization techniques revolve around finding keywords and extracting the sentences that contain more keywords than the rest. Keyword extraction is usually done by selecting relevant words that occur more frequently than others, with emphasis on the important ones. Manual extraction or annotation of keywords is a tedious, error-prone process that demands considerable effort and time. In this work, we propose an algorithm that automatically extracts keywords for text summarization in Telugu e-newspaper datasets. The proposed method is evaluated by comparing its results on articles with similar titles from five different Telugu e-newspapers, to check the similarity and consistency of the summarized results.
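The general idea described above (frequency-based keywords, then sentences ranked by how many keywords they contain) can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper, whose details the abstract does not give. The function names, the stopword set, and the `top_k` and `num_sentences` parameters are all assumptions made for illustration, and plain whitespace tokenization stands in for whatever Telugu-specific preprocessing a real system would use.

```python
import re
import string
from collections import Counter


def tokenize(text):
    """Split on whitespace, strip ASCII punctuation, and lowercase.

    Whitespace splitting is used instead of a regex such as \\w+ because
    \\w does not match Telugu combining vowel signs and would break words.
    """
    tokens = (tok.strip(string.punctuation).lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def extract_keywords(text, stopwords=frozenset(), top_k=5):
    """Return the top_k most frequent tokens that are not stopwords."""
    counts = Counter(tok for tok in tokenize(text) if tok not in stopwords)
    return [word for word, _ in counts.most_common(top_k)]


def summarize(text, stopwords=frozenset(), top_k=5, num_sentences=2):
    """Pick the sentences containing the most keywords, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    keywords = set(extract_keywords(text, stopwords, top_k))
    # Score each sentence by the number of keyword occurrences it contains.
    scored = [
        (sum(tok in keywords for tok in tokenize(sentence)), index)
        for index, sentence in enumerate(sentences)
    ]
    # Highest score first; ties go to the earlier sentence.
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:num_sentences]
    return [sentences[index] for _, index in sorted(best, key=lambda pair: pair[1])]
```

Calling `summarize(article_text, stopwords=my_stopwords, top_k=10, num_sentences=3)` would return up to three sentences in their original order. Raw frequency is the simplest possible weighting; a practical system would likely add stopword lists and part-of-speech filtering suited to Telugu.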
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Naidu, R., Bharti, S.K., Babu, K.S., Mohapatra, R.K. (2018). Text Summarization with Automatic Keyword Extraction in Telugu e-Newspapers. In: Satapathy, S., Bhateja, V., Das, S. (eds) Smart Computing and Informatics. Smart Innovation, Systems and Technologies, vol 77. Springer, Singapore. https://doi.org/10.1007/978-981-10-5544-7_54
DOI: https://doi.org/10.1007/978-981-10-5544-7_54
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5543-0
Online ISBN: 978-981-10-5544-7
eBook Packages: Engineering, Engineering (R0)