Retrieval Models Versus Retrievability

  • Chapter
Current Challenges in Patent Information Retrieval

Part of the book series: The Information Retrieval Series (INRE, volume 37)

Abstract

Retrievability is an important measure in information retrieval (IR) that can be used to analyse retrieval models and document collections. Rather than focusing only on the small set of documents covered by relevance judgements, retrievability examines what is retrieved, how frequently it is retrieved and how much effort is needed to retrieve it. Such a measure is of particular interest in recall-oriented retrieval settings (e.g. patent or legal retrieval), because in this context a document needs to be retrieved before it can be judged for relevance. If a retrieval model makes some patents hard to find, patent searchers could miss relevant documents purely because of the bias of the retrieval model. In this chapter we explain the concept of retrievability in information retrieval, how it can be estimated and how it can be used to analyse the retrieval bias of retrieval models. We also show how retrievability relates to effectiveness by analysing the relationship between retrievability and effectiveness measures, and how the retrievability measure can be used to improve effectiveness.
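
As a rough illustration of the estimation idea described in the abstract, the sketch below uses the cumulative form of retrievability that is common in the literature: a document's score r(d) is the number of queries for which it appears within the top c results. Retrieval bias is then summarised with a Gini coefficient over all r(d) values, where 0 means every document is equally retrievable and values near 1 mean a few documents dominate the results. This is a minimal sketch under stated assumptions, not the chapter's implementation: the function names, the toy query set and the cutoff value are all hypothetical.

```python
from collections import defaultdict


def retrievability(collection, rankings, cutoff):
    """Cumulative retrievability: count, for each document, the queries
    that return it within the top `cutoff` ranks.

    collection: iterable of all document ids (assumed known up front)
    rankings:   dict mapping a query string to its ranked list of doc ids
    """
    counts = defaultdict(int)
    for ranked_docs in rankings.values():
        for doc_id in ranked_docs[:cutoff]:
            counts[doc_id] += 1
    # Documents that were never retrieved must be kept with a score of 0,
    # otherwise the bias estimate ignores exactly the hard-to-find ones.
    return {doc_id: counts[doc_id] for doc_id in collection}


def gini(values):
    """Gini coefficient of non-negative scores (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Hypothetical toy data: four documents, three queries, cutoff c = 2.
collection = ["d1", "d2", "d3", "d4"]
rankings = {
    "q1": ["d1", "d2", "d3"],
    "q2": ["d1", "d3", "d4"],
    "q3": ["d1", "d2"],
}
scores = retrievability(collection, rankings, cutoff=2)
bias = gini(scores.values())
```

In this toy run `d4` is never returned within the cutoff, so it keeps a score of 0 and raises the Gini value. In practice the query set would be large and typically generated from the collection itself, and the same analysis would be repeated per retrieval model to compare their bias.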

Author information

Corresponding author

Correspondence to Andreas Rauber.

Copyright information

© 2017 Springer-Verlag GmbH Germany

About this chapter

Cite this chapter

Bashir, S., Rauber, A. (2017). Retrieval Models Versus Retrievability. In: Lupu, M., Mayer, K., Kando, N., Trippe, A. (eds) Current Challenges in Patent Information Retrieval. The Information Retrieval Series, vol 37. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-53817-3_7

  • DOI: https://doi.org/10.1007/978-3-662-53817-3_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-53816-6

  • Online ISBN: 978-3-662-53817-3

  • eBook Packages: Computer Science, Computer Science (R0)