Clustering for Post Hoc Information Retrieval

  • Reference work entry
Encyclopedia of Database Systems

Synonyms

Document clustering

Definition

Clustering is a technique for grouping similar objects together on the basis of shared attributes. In information retrieval (IR) it has been used for a variety of retrieval tasks and with different objects of interest (e.g., documents, authors, index terms). Attributes used for clustering may include the terms assigned to documents and their co-occurrences; the documents themselves, when the objects being clustered are index terms; or linkages (e.g., hypertext links between Web documents, citations or co-citations among documents, or documents accessed by users). Clustering in IR facilitates browsing of retrieved documents and assessment of their relevance, and may reveal unexpected relationships among the clustered objects.
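To make the definition concrete, the following is a minimal Python sketch of post hoc clustering of a retrieved result set. It assumes a naive lowercase whitespace tokenizer, raw term-frequency vectors, cosine similarity, and single-link grouping at a fixed similarity threshold; these are illustrative choices, not the method prescribed by this entry. The function names (`term_vector`, `cosine`, `single_link_clusters`, `cluster_terms`) and the sample result set are hypothetical. Documents are clustered by the terms they contain, and index terms are clustered by the documents they occur in, mirroring the two kinds of attributes described above.

```python
import math
from collections import Counter


def term_vector(text):
    """Term-frequency vector for one document (assumed naive lowercase whitespace tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(weight * b[key] for key, weight in a.items())  # missing keys count as 0
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_link_clusters(vectors, threshold):
    """Single-link clustering at a similarity threshold.

    Items i and j end up in the same cluster if they are connected by a chain
    of pairs whose cosine similarity is >= threshold. Returns a sorted list of
    clusters, each a list of item indices in increasing order.
    """
    parent = list(range(len(vectors)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def cluster_terms(docs, threshold):
    """Cluster index terms, using the documents each term occurs in as its attributes."""
    term_docs = {}
    for doc_id, text in enumerate(docs):
        for term, tf in term_vector(text).items():
            term_docs.setdefault(term, Counter())[doc_id] = tf
    terms = sorted(term_docs)
    clusters = single_link_clusters([term_docs[t] for t in terms], threshold)
    return [[terms[i] for i in group] for group in clusters]


# Hypothetical result set returned for the ambiguous query "jaguar".
retrieved = [
    "jaguar speed cat habitat",
    "jaguar cat prey habitat",
    "jaguar car engine model",
    "car engine dealer price",
]

doc_clusters = single_link_clusters([term_vector(d) for d in retrieved], threshold=0.4)
print(doc_clusters)  # [[0, 1], [2, 3]]
print(cluster_terms(retrieved, threshold=0.9))
# [['car', 'engine'], ['cat', 'habitat'], ['dealer', 'price'], ['jaguar'], ['model'], ['prey'], ['speed']]
```

Single-link grouping at a threshold amounts to taking the connected components of a graph whose edges join pairs with similarity at or above the threshold, which is why a union-find structure suffices here. Other schemes (e.g., complete-link, group-average, or k-means) would generally group the same vectors differently.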

Historical Background

A fundamental challenge of information retrieval (IR) that continues today is how to best match user queries with documents in a queried collection. Many mathematical models have been developed over the years to facilitate the matching process. The...

Recommended Reading

  1. Carpineto C, Mizzaro S, Romano G, Snidero M. Mobile information retrieval with search results clustering: prototypes and evaluations. J Am Soc Inf Sci Technol. 2009;60(5):877–95.

  2. Chen HM, Cooper MD. Using clustering techniques to detect usage patterns in a web-based information system. J Am Soc Inf Sci Technol. 2001;52(11):888–904.

  3. Crestani F, Wu S. Testing the cluster hypothesis in distributed information retrieval. Inf Process Manage. 2006;42(5):1137–50.

  4. Crouch CJ. A cluster-based approach to thesaurus construction. In: Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1988. p. 309–20.

  5. Cutting DR, Karger DR, Pedersen JO, Tukey JW. Scatter/Gather: a cluster-based approach to browsing large document collections. In: Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1992. p. 318–29.

  6. Forsati R, Mahdavi M, Shamsfard M, Meybodi MR. Efficient stochastic algorithms for document clustering. Inf Sci. 2013;220:269–91.

  7. Hearst MA, Pedersen JO. Reexamining the cluster hypothesis: scatter/gather on retrieval results. In: Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1996. p. 76–84.

  8. Jardine N, van Rijsbergen CJ. The use of hierarchic clustering in information retrieval. Inf Storage Retr. 1971;7(5):217–40.

  9. Kalogeratos A, Likas A. Document clustering using synthetic cluster prototypes. Data Knowl Eng. 2011;70(3):284–306.

  10. Liu X, Croft WB. Cluster-based retrieval using language models. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2004. p. 186–93.

  11. Rasmussen E. Clustering algorithms. In: Frakes WB, Baeza-Yates R, editors. Information retrieval data structures & algorithms. Englewood Cliffs: Prentice Hall; 1992. p. 419–42.

  12. Wen JR, Nie JY, Zhang HJ. Query clustering using user logs. ACM Trans Inf Syst. 2002;20(1):59–81.

  13. Wu W, Xiong H, Shekhar S, editors. Clustering and information retrieval. Norwell: Kluwer; 2004.

  14. Zhu WZ, Allen RB. Document clustering using the LSI subspace signature model. J Am Soc Inf Sci Technol. 2013;64(4):844–60.

Author information

Correspondence to Dietmar Wolfram.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this entry

Cite this entry

Wolfram, D. (2018). Clustering for Post Hoc Information Retrieval. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_950
