Text Similarity and Clustering

Previous chapters covered several techniques for analyzing text and extracting interesting insights. We looked at supervised machine learning (ML) techniques that classify or categorize text documents into predefined categories. Unsupervised techniques like topic models and document summarization have also been covered; these involve extracting and retrieving key themes and information from large text documents and corpora. In this chapter, we will look at several other techniques and use cases that leverage unsupervised learning and information retrieval concepts.


