Document and Term Clustering
Clustering plays many important roles in Information Retrieval. It can support both search functions and the display of results. On very large data sets, clustering can quickly become computationally infeasible, so many systems limit the number of items that can be clustered. The choice of clustering technique also affects the resources required, and techniques that need less computation generally produce less accurate clusters. Clustering can be applied to items, creating document clusters that can be used to suggest additional items or to visualize search results. To a lesser extent, clustering can also be applied to the words in items, where it can be used to automatically generate a statistical thesaurus. The major clustering techniques are described, along with a discussion of how to create hierarchical clusters.