Abstract
Retrievability is an important measure in information retrieval (IR) that can be used to analyse retrieval models and document collections. Rather than focusing only on the small set of documents covered by relevance judgements, retrievability examines what is retrieved, how frequently it is retrieved and how much effort is needed to retrieve it. Such a measure is of particular interest in recall-oriented retrieval (e.g. patent or legal retrieval), because in this context a document needs to be retrieved before it can be judged for relevance. If a retrieval model makes some patents hard to find, patent searchers could miss relevant documents simply because of the bias of the retrieval model. In this chapter we explain the concept of retrievability, how it can be estimated and how it can be used to analyse the retrieval bias of retrieval models. We also analyse the relationship between retrievability and effectiveness measures and show how the retrievability measure can be used to improve effectiveness.
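As a rough illustration of how retrievability might be estimated, the sketch below counts how often each document appears within a rank cutoff over a set of queries, then summarises the inequality of those counts with a Gini coefficient. This is a minimal sketch under stated assumptions, not the chapter's own procedure: the abstract gives no formula, so the cumulative counting scheme, the uniform query weighting, the use of the Gini coefficient as the bias summary, and the names `retrievability`, `gini`, `rankings` and `cutoff` are all assumptions made for illustration.

```python
from collections import Counter


def retrievability(rankings, cutoff):
    """Count, per document, how many queries retrieve it within the cutoff.

    rankings: dict mapping a query string to its ranked list of document ids
              (hypothetical input format, best match first).
    cutoff:   rank depth a searcher is assumed to be willing to examine.
    """
    scores = Counter()
    for ranked_docs in rankings.values():
        # Only documents ranked inside the cutoff count as retrieved.
        for doc_id in ranked_docs[:cutoff]:
            scores[doc_id] += 1
    return scores


def gini(values):
    """Gini coefficient of non-negative values: 0 means perfectly equal."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Closed form over the ascending-sorted values, with 1-based index i.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Toy data, invented purely for illustration.
collection = ["d1", "d2", "d3", "d4"]
rankings = {
    "q1": ["d1", "d2", "d3"],
    "q2": ["d1", "d3", "d2"],
    "q3": ["d1", "d2", "d4"],
}

scores = retrievability(rankings, cutoff=2)
# Documents that are never retrieved must be included with a score of zero,
# otherwise the inequality estimate ignores the hardest-to-find documents.
bias = gini([scores[doc_id] for doc_id in collection])
```

Under this reading, a higher Gini value for one retrieval model than for another, computed on the same collection and query set, would suggest that the first model concentrates access on fewer documents.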
Copyright information
© 2017 Springer-Verlag GmbH Germany
About this chapter
Cite this chapter
Bashir, S., Rauber, A. (2017). Retrieval Models Versus Retrievability. In: Lupu, M., Mayer, K., Kando, N., Trippe, A. (eds) Current Challenges in Patent Information Retrieval. The Information Retrieval Series, vol 37. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-53817-3_7
DOI: https://doi.org/10.1007/978-3-662-53817-3_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-53816-6
Online ISBN: 978-3-662-53817-3
eBook Packages: Computer Science (R0)