From “Identical” to “Similar”: Fusing Retrieved Lists Based on Inter-document Similarities

  • Conference paper
Advances in Information Retrieval Theory (ICTIR 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5766)

Abstract

We present a novel approach to fusing document lists that are retrieved in response to a query. Our approach utilizes information induced from inter-document similarities. Specifically, the key insight guiding the derivation of our methods is that similar documents from different lists can provide relevance-status support to each other. We use a graph-based method to model relevance-status propagation between documents. The propagation is governed by inter-document similarities and by the retrieval scores of documents in the lists. Empirical evaluation shows the effectiveness of our methods in fusing TREC runs.
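The propagation idea in the abstract can be illustrated with a small sketch. This is not the authors' formulation, which the abstract does not specify; it is a generic PageRank-style random walk assumed here for illustration. The function name `propagate_relevance`, the `damping` and `iterations` parameters, the sum-normalization of each list's scores, and the user-supplied `similarity` function are all hypothetical choices. Retrieval scores are assumed to be non-negative.

```python
from typing import Callable, Dict, List


def propagate_relevance(
    lists: List[Dict[str, float]],
    similarity: Callable[[str, str], float],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Illustrative sketch: fuse retrieved lists via a similarity-graph walk.

    Each element of `lists` maps document id -> non-negative retrieval score.
    Returns a score per document; higher means more propagated support.
    """
    # Prior: sum-normalize each list, then add across lists (an assumption).
    prior: Dict[str, float] = {}
    for scores in lists:
        total = sum(scores.values())
        if total <= 0:
            continue
        for doc, s in scores.items():
            prior[doc] = prior.get(doc, 0.0) + s / total
    mass = sum(prior.values())
    if mass == 0:
        return {}
    docs = sorted(prior)
    prior = {d: prior[d] / mass for d in docs}

    # Row-normalized edge weights: the share of a's support passed to b.
    out: Dict[str, Dict[str, float]] = {}
    for a in docs:
        weights = {b: similarity(a, b) for b in docs if b != a}
        total = sum(weights.values())
        out[a] = (
            {b: w / total for b, w in weights.items() if w > 0}
            if total > 0
            else {}
        )

    score = dict(prior)
    for _ in range(iterations):
        nxt = {d: (1.0 - damping) * prior[d] for d in docs}
        for a in docs:
            # Documents with no similar neighbors fall back to the prior.
            targets = out[a] if out[a] else prior
            for b, w in targets.items():
                nxt[b] += damping * score[a] * w
        score = nxt
    return score
```

Under these assumptions, a document gains score when documents similar to it, possibly from other lists, hold high scores themselves, so two near-duplicate documents retrieved by different systems reinforce each other even though neither appears in both lists. A ranking is obtained by sorting the returned dictionary by descending score.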



Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kozorovitzky, A.K., Kurland, O. (2009). From “Identical” to “Similar”: Fusing Retrieved Lists Based on Inter-document Similarities. In: Azzopardi, L., et al. Advances in Information Retrieval Theory. ICTIR 2009. Lecture Notes in Computer Science, vol 5766. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04417-5_19

  • DOI: https://doi.org/10.1007/978-3-642-04417-5_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04416-8

  • Online ISBN: 978-3-642-04417-5

  • eBook Packages: Computer Science, Computer Science (R0)
