
On Effectiveness Measures and Relevance Functions in Ranking INEX Systems

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3689)

Abstract

This paper investigates how the choice of performance measure and relevance function affects the comparison of retrieval systems in INEX, an evaluation forum dedicated to XML retrieval. We focus on two interdependent challenges that arise when evaluating XML retrieval systems: the weak ordering of retrieved lists (ties) and multivalued relevance scales. Our analysis provides empirical evidence on how reasonable two popular assumptions in information retrieval (IR) evaluation are, namely that ties can be ignored and that binary relevance is sufficient. We also shed light on how a parameter of Q-measure [18] affects the sensitivity of the metric.
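
As a rough illustration of the kind of metric discussed, the following is a minimal sketch of Q-measure over graded relevance. The abstract does not name the parameter or give a formula, so the sketch assumes the commonly cited blended-ratio form, BR(r) = (beta * cg(r) + count(r)) / (beta * cig(r) + r), with a hypothetical parameter `beta`. The function name, argument names, and gain values are illustrative and not taken from the paper.

```python
from typing import Sequence


def q_measure(ranked_gains: Sequence[float],
              relevant_gains: Sequence[float],
              beta: float = 1.0) -> float:
    """Q-measure of one ranked list (assumed blended-ratio form).

    ranked_gains   -- gain of the item at each rank (0 for non-relevant)
    relevant_gains -- gains of all relevant items for the topic
    beta           -- assumed weight of graded gain; beta = 0 gives
                      binary average precision
    """
    ideal = sorted(relevant_gains, reverse=True)  # ideal ranking
    num_relevant = len(ideal)
    if num_relevant == 0:
        return 0.0

    cg = 0.0      # cumulative gain of the system ranking
    cig = 0.0     # cumulative gain of the ideal ranking
    count = 0     # number of relevant items seen so far
    total = 0.0
    for rank, gain in enumerate(ranked_gains, start=1):
        cg += gain
        if rank <= num_relevant:
            cig += ideal[rank - 1]
        if gain > 0:
            count += 1
            total += (beta * cg + count) / (beta * cig + rank)
    return total / num_relevant


# Two rankings that retrieve the same relevant items in a different order.
best_first = q_measure([3, 1], [3, 1], beta=1.0)
worst_first = q_measure([1, 3], [3, 1], beta=1.0)
binary_view = q_measure([1, 3], [3, 1], beta=0.0)
```

Under these assumptions, `beta = 0` ignores the grades entirely, so both orderings score the same, whereas a positive `beta` rewards placing the more relevant item first.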


References

  1. Buckley, C., Voorhees, E.M.: Evaluating evaluation measure stability. In: ACM SIGIR 2000, Athens, Greece, pp. 33–40. ACM Press, New York (2000)

  2. Buckley, C., Voorhees, E.M.: Retrieval evaluation with incomplete information. In: Sanderson, et al. (eds.) [19], pp. 25–32

  3. Cooper, W.S.: Expected Search Length: A Single Measure of Retrieval Effectiveness Based on the Weak Ordering Action of Retrieval Systems. American Documentation 19(1), 30–41 (1968)

  4. Davison, A.C., Hinkley, D.V.: Bootstrap Methods and Their Application. Cambridge University Press, Cambridge (1997)

  5. de Vries, A.P., Kazai, G., Lalmas, M.: Evaluation metrics 2004. In: INEX 2004 Workshop Pre-Proceedings, pp. 249–250 (2004). Available at http://inex.is.informatik.uni-duisburg.de:2004/pdf/INEX2004PreProceedings.pdf

  6. de Vries, A.P., Kazai, G., Lalmas, M.: Tolerance to Irrelevance: A User-effort Oriented Evaluation of Retrieval Systems without Predefined Retrieval Unit. In: RIAO 2004, Avignon, France, pp. 463–473 (April 2004)

  7. Hawking, D., Robertson, S.: On collection size and retrieval effectiveness. Information Retrieval 6(1), 99–105 (2003)

  8. Hull, D.A., Kantor, P., Ng, K.: Advanced approaches to the statistical analysis of TREC information retrieval experiments. Technical report (1997), Unpublished, contact the first author for a copy: hull@clairvoyancecorp.com

  9. Kazai, G., Lalmas, M., de Vries, A.P.: The overlap problem in content-oriented XML retrieval evaluation. In: Sanderson, et al. (eds.) [19], pp. 72–79

  10. Kazai, G., Lalmas, M., de Vries, A.P.: Reliability Tests for the XCG and inex-2002 Metric. In: Fuhr, N., Lalmas, M., Malik, S., Szlávik, Z. (eds.) INEX 2004. LNCS, vol. 3493, pp. 60–72. Springer, Heidelberg (2005)

  11. Kazai, G., Lalmas, M., Fuhr, N., Gövert, N.: A report on the first year of the INitiative for the evaluation of XML retrieval (INEX 2002). Journal of the American Society for Information Science and Technology (JASIST) 55(6), 551–556 (2004)

  12. Kekäläinen, J., Järvelin, K.: Using graded relevance assessments in IR evaluation. Journal of the American Society for Information Science and Technology (JASIST) 53(13), 1120–1129 (2002)

  13. Kraaij, W.: Variations on Language Modeling for Information Retrieval. PhD thesis, University of Twente (2004)

  14. Mea, V.D., Mizzaro, S.: Measuring retrieval effectiveness: a new proposal and a first experimental validation. Journal of the American Society for Information Science and Technology (JASIST) 55(6), 530–543 (2004)

  15. Myaeng, S.H., Jang, D.-H., Kim, M.-S., Zhoo, Z.-C.: A Flexible Model for Retrieval of SGML documents. In: SIGIR 1998, Melbourne, Australia, pp. 138–140 (August 1998)

  16. Raghavan, V.V., Jung, G.S., Bollmann, P.: A critical investigation of recall and precision as measures of retrieval system performance. ACM Transactions on Information Systems 7(3), 205–229 (1989)

  17. Sakai, T.: New Performance metrics based on Multigrade Relevance: Their Application to Question Answering. In: NTCIR-4 Proceedings (2004)

  18. Sakai, T.: Ranking the NTCIR Systems Based on Multigrade Relevance. In: Myaeng, S.-H., Zhou, M., Wong, K.-F., Zhang, H.-J. (eds.) AIRS 2004. LNCS, vol. 3411, pp. 251–262. Springer, Heidelberg (2005)

  19. Sanderson, M., Järvelin, K., Allan, J., Bruza, P. (eds.) SIGIR 2004: Proc. of the 27th Annual Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, Sheffield, UK, July 25-29 (2004)

  20. Sanderson, M., Zobel, J.: Information retrieval system evaluation: Effort, sensitivity, and reliability. In: ACM SIGIR 2005 (2005) (to appear)

  21. Savoy, J.: Statistical inference in retrieval effectiveness evaluation. Info. Process. Management 33(4), 495–512 (1997)

  22. Soboroff, I.: On evaluating web search with very few relevant documents. In: Sanderson, et al. (eds.) [19], pp. 530–531

  23. Tague-Sutcliffe, J., Blustein, J.: A statistical analysis of the TREC-3 data. In: Proceedings of TREC-3, NIST Special Publication 500-225, pp. 385–398 (April 1995)

  24. Van Rijsbergen, C.J.: Information Retrieval. Butterworths (1979)

  25. Voorhees, E.M.: The TREC robust retrieval track. SIGIR Forum 39(1), 11–20 (2005)

  26. Voorhees, E.M., Buckley, C.: The effect of topic set size on retrieval experiment error. In: ACM SIGIR 2002, pp. 316–323. ACM Press, New York (August 2002)

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vu, HT., Gallinari, P. (2005). On Effectiveness Measures and Relevance Functions in Ranking INEX Systems. In: Lee, G.G., Yamada, A., Meng, H., Myaeng, S.H. (eds) Information Retrieval Technology. AIRS 2005. Lecture Notes in Computer Science, vol 3689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562382_24

  • DOI: https://doi.org/10.1007/11562382_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29186-2

  • Online ISBN: 978-3-540-32001-2

  • eBook Packages: Computer Science, Computer Science (R0)
