Text Similarity and Clustering

  • Dipanjan Sarkar


Previous chapters covered several techniques for analyzing text and extracting insights from it. We looked at supervised machine learning (ML) techniques that classify text documents into predefined categories. Unsupervised techniques such as topic models and document summarization were also covered; these extract key themes and information from large text documents and corpora. In this chapter, we look at several other techniques and use cases that leverage unsupervised learning and information retrieval concepts.


Keywords: Edit Distance, Cosine Similarity, Distance Metrics, Affinity Propagation, Document Cluster

Copyright information

© Dipanjan Sarkar 2016

Authors and Affiliations

  • Dipanjan Sarkar, Bangalore, India
