Document and Term Clustering

Kowalski, Gerald

doi:10.1007/978-1-4419-7716-8_6

Abstract

Clustering plays many important roles in Information Retrieval. It can support both search functions and the display of results. On very large data sets, clustering can quickly become computationally infeasible, so many systems limit the number of items that can be clustered. The choice of clustering technique also affects the resources needed. As the computations required for a technique are reduced, the accuracy of the clustering also decreases. Clustering can be applied to items, creating document clusters that can be used to suggest additional items or to visualize search results. To a lesser extent, clustering can be applied to the words in items to generate a statistical thesaurus automatically. The major clustering techniques are described, along with a discussion of how to create hierarchical clusters.

Author information

Authors and Affiliations

Ashburn, VA, USA
Gerald Kowalski

About this chapter

Cite this chapter

Kowalski, G. (2011). Document and Term Clustering. In: Information Retrieval Architecture and Algorithms. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-7716-8_6

DOI: https://doi.org/10.1007/978-1-4419-7716-8_6
Published: 01 December 2010
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-7715-1
Online ISBN: 978-1-4419-7716-8
eBook Packages: Computer Science, Computer Science (R0)
