Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Clustering for Post Hoc Information Retrieval

  • Dietmar Wolfram
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_950


Synonyms

Document clustering

Definition

Clustering is a technique that groups similar objects together based on common attributes. In information retrieval (IR), it has been applied to different tasks in the retrieval process and to different objects of interest (e.g., documents, authors, index terms). Attributes used for clustering may include assigned terms within documents and their co-occurrences, the documents themselves if the focus is on index terms, or linkages (e.g., hypertext links of Web documents, citations or co-citations within documents, documents accessed). Clustering in IR facilitates browsing and assessment of retrieved documents for relevance and may reveal unexpected relationships among the clustered objects.
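As a rough illustration of grouping objects by common attributes, the following minimal Python sketch clusters a handful of documents by the index terms they share. It is not taken from the entry: the Jaccard similarity measure, the single-link grouping rule, the 0.25 threshold, the function names, and the sample documents are all assumptions chosen for brevity, and practical systems typically use weighted term vectors and more refined algorithms.

```python
from itertools import combinations


def jaccard(a, b):
    """Fraction of distinct terms that two term lists share (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_documents(docs, threshold=0.25):
    """Single-link grouping (an assumed, illustrative rule).

    Two documents land in the same cluster when a chain of pairwise
    similarities, each >= threshold, connects them.
    """
    # Union-find structure: every document starts as its own cluster.
    parent = {name: name for name in docs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # Merge the clusters of every sufficiently similar pair.
    for a, b in combinations(docs, 2):
        if jaccard(docs[a], docs[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in docs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical documents, each represented by its assigned index terms.
docs = {
    "d1": ["cluster", "retrieval", "document"],
    "d2": ["cluster", "document", "browsing"],
    "d3": ["citation", "author", "link"],
    "d4": ["citation", "link", "web"],
}

clusters = cluster_documents(docs)
print(clusters)
```

With these sample documents, d1 and d2 share two of four distinct terms, as do d3 and d4, so the sketch yields two clusters. Swapping the roles of documents and terms (clustering terms by the documents they occur in) would follow the same pattern.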

Historical Background

A fundamental challenge of information retrieval (IR) that continues today is how to best match user queries with documents in a queried collection. Many mathematical models have been developed over the years to facilitate the matching process. The...



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. University of Wisconsin-Milwaukee, Milwaukee, USA

Section editors and affiliations

  • Edie Rasmussen
  1. Library, Archival & Information Studies, The University of British Columbia, Vancouver, Canada