Synonyms
Document clustering
Definition
Clustering is a technique for grouping similar objects together based on attributes they have in common. In information retrieval (IR) it has been applied to different tasks in the retrieval process and to different objects of interest (e.g., documents, authors, index terms). Attributes used for clustering may include the terms assigned to documents and their co-occurrences, the documents themselves if the focus is on index terms, or linkages (e.g., hypertext links between Web documents, citations or co-citations within documents, documents accessed). Clustering in IR facilitates browsing retrieved documents and assessing them for relevance, and it may reveal unexpected relationships among the clustered objects.
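As a minimal sketch of the idea, the following Python groups a handful of retrieved documents by the terms they share. It is an illustration under stated assumptions, not a method described in this entry: the function names, the sample documents, the similarity threshold of 0.3, and the choice of a greedy single-pass strategy with cosine similarity over raw term frequencies are all hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term frequencies for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_results(docs, threshold=0.3):
    """Greedy single-pass clustering (illustrative assumption).

    Each document joins the first cluster whose seed document is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    vectors = [term_vector(d) for d in docs]
    clusters = []  # lists of document indices; the first index is the seed
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical result list for the ambiguous query "jaguar".
retrieved = [
    "jaguar car dealer price",
    "jaguar cat habitat rainforest",
    "used jaguar car price list",
    "rainforest habitat of the jaguar cat",
]
print(cluster_results(retrieved))  # [[0, 2], [1, 3]]
```

The two groups separate the automotive results from the animal results, which is the kind of grouping that can make a retrieved list easier to browse. The outcome of a single-pass approach depends on document order and on the threshold, so a production system would more likely use an established clustering algorithm with weighted terms.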
Historical Background
A fundamental and continuing challenge of information retrieval (IR) is how best to match user queries with documents in a queried collection. Many mathematical models have been developed over the years to facilitate the matching process. The...
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
Cite this entry
Wolfram, D. (2018). Clustering for Post Hoc Information Retrieval. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_950
DOI: https://doi.org/10.1007/978-1-4614-8265-9_950
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering