Hierarchical Clustering for Sentence Extraction Using Cosine Similarity Measure

Kavyasrujana, D.; Rao, B. Chakradhara

doi:10.1007/978-3-319-13728-5_21


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 337)


Abstract

Clustering is an unsupervised learning technique that groups a set of objects into subsets, or clusters. It forms clusters whose data points are similar to one another but dissimilar to the data points in other clusters. Extracting data efficiently and effectively from datasets or data stores requires an enhanced mechanism. Extracting relevant sentences based on a user query plays an important role in data mining, web mining, and related fields. In this paper we propose an efficient and effective way to extract sentences by taking a query as input and forming hierarchical clusters with a cosine similarity measure. A threshold value is chosen initially, and clusters are divided according to it. Further clustering is then based on the previous threshold value.
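The abstract gives only an outline of the method, so the following is a minimal Python sketch of one possible reading, not the authors' implementation. It assumes plain term-frequency vectors, a top-down split of sentences by their cosine similarity to the query, and a threshold that is raised by a fixed step at each level. The names tf_vector, split_by_threshold, step, and max_depth, the example sentences, and the threshold values are all hypothetical.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector over lowercased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence or query shares nothing with anything
    return dot / (norm_a * norm_b)


def split_by_threshold(sentences, query, threshold, step=0.2, max_depth=3):
    """Divide sentences into those near the query and those far from it.

    Assumption (not taken from the paper): each deeper level re-splits the
    "near" cluster using the previous threshold plus a fixed step.
    """
    q = tf_vector(query)
    near, far = [], []
    for sentence in sentences:
        score = cosine_similarity(tf_vector(sentence), q)
        (near if score >= threshold else far).append(sentence)
    node = {"threshold": threshold, "near": near, "far": far, "child": None}
    # Recurse only while there is something left to divide.
    if max_depth > 1 and len(near) > 1 and threshold + step <= 1.0:
        node["child"] = split_by_threshold(
            near, query, threshold + step, step, max_depth - 1
        )
    return node


# Hypothetical example data, for illustration only.
sentences = [
    "Hierarchical clustering groups similar sentences",
    "Cosine similarity compares two sentence vectors",
    "The weather was sunny yesterday",
    "Clustering sentences with cosine similarity",
]
query = "cosine similarity clustering"
tree = split_by_threshold(sentences, query, threshold=0.2)

# Walk down the hierarchy; the deepest "near" cluster holds the best matches.
level = tree
while level is not None:
    print(round(level["threshold"], 2), level["near"])
    level = level["child"]
```

In this sketch the sentences extracted for the query are those in the "near" cluster at whichever depth is chosen; a deeper level applies a stricter threshold and so returns fewer sentences that match the query more closely.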

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Anil Neerukonda Institute of Technology and Sciences, Sangivalasa, Bheemunipatnam[M], Visakhapatnam, India
D. Kavyasrujana & B. Chakradhara Rao

Corresponding author

Correspondence to D. Kavyasrujana.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Anil Neerukonda Institute of Technology and Sciences, Vishakapatnam, India
Suresh Chandra Satapathy
School of Information Technology, Jawaharlal Nehru Technological University Hyderabad, Hyderabad, India
A. Govardhan
Department of CSE, CMR Technical Campus, Hyderabad, India
K. Srujan Raju
Department of Computer Science & Engineering, Faculty of Engg., Tech. & Management, University of Kalyani, Kalyani, West Bengal, India
J. K. Mandal

About this paper

Cite this paper

Kavyasrujana, D., Rao, B.C. (2015). Hierarchical Clustering for Sentence Extraction Using Cosine Similarity Measure. In: Satapathy, S., Govardhan, A., Raju, K., Mandal, J. (eds) Emerging ICT for Bridging the Future - Proceedings of the 49th Annual Convention of the Computer Society of India (CSI) Volume 1. Advances in Intelligent Systems and Computing, vol 337. Springer, Cham. https://doi.org/10.1007/978-3-319-13728-5_21

DOI: https://doi.org/10.1007/978-3-319-13728-5_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13727-8
Online ISBN: 978-3-319-13728-5
eBook Packages: Engineering, Engineering (R0)
