On Effectiveness Measures and Relevance Functions in Ranking INEX Systems

Vu, Huyen-Trang; Gallinari, Patrick

doi:10.1007/11562382_24

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3689)

Abstract

This paper investigates how performance measures and relevance functions affect the comparison of retrieval systems in INEX, an evaluation forum dedicated to XML retrieval. We focus on two interdependent challenges that arise when evaluating XML retrieval systems: the weak ordering of retrieved lists and multivalued relevance scales. Our analysis provides empirical evidence on how reasonable two popular assumptions in information retrieval (IR) evaluation are, namely that ties can be ignored and that binary relevance is sufficient. We also shed light on how a parameter of the Q-measure [18] affects the sensitivity of the metric.
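
To make the weak-ordering issue concrete, the following is a minimal sketch, not the paper's method, of how ties in a retrieved list can change a rank-based score. It assumes binary relevance labels, uses plain average precision as the measure, and assumes every relevant item appears in the list; the function names and the toy tie groups are hypothetical.

```python
def average_precision(rels):
    """Binary average precision over a fully ordered list of 0/1 labels."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    # Assumption: all relevant items are retrieved, so we normalise by hits.
    return total / hits if hits else 0.0


def ap_range_under_ties(groups):
    """Return (worst, best) average precision over all tie-breakings.

    groups: list of tie groups ordered from highest to lowest score;
    each group is a list of 0/1 relevance labels whose items share a score.
    """
    # Best case: within each tie group, relevant items come first.
    best = [r for g in groups for r in sorted(g, reverse=True)]
    # Worst case: within each tie group, relevant items come last.
    worst = [r for g in groups for r in sorted(g)]
    return average_precision(worst), average_precision(best)


# Hypothetical run: two items tied at the top, three tied below them.
worst, best = ap_range_under_ties([[0, 1], [1, 0, 0]])
print(f"worst={worst:.3f} best={best:.3f}")
```

The gap between the two values is the amount of score that depends only on how ties are broken, which is the kind of effect the "ties can be ignored" assumption sets aside.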

References

[1] Buckley, C., Voorhees, E.M.: Evaluating evaluation measure stability. In: ACM SIGIR 2000, Athens, Greece, pp. 33–40. ACM Press, New York (2000)
[2] Buckley, C., Voorhees, E.M.: Retrieval evaluation with incomplete information. In: Sanderson, et al. (eds.) [19], pp. 25–32
[3] Cooper, W.S.: Expected Search Length: A Single Measure of Retrieval Effectiveness Based on the Weak Ordering Action of Retrieval Systems. American Documentation 19(1), 30–41 (1968)
[4] Davison, A.C., Hinkley, D.V.: Bootstrap Methods and Their Application. Cambridge University Press, Cambridge (1997)
[5] de Vries, A.P., Kazai, G., Lalmas, M.: Evaluation metrics 2004. In: INEX 2004 Workshop Pre-Proceedings, pp. 249–250 (2004). Available at http://inex.is.informatik.uni-duisburg.de:2004/pdf/INEX2004PreProceedings.pdf
[6] de Vries, A.P., Kazai, G., Lalmas, M.: Tolerance to Irrelevance: A User-effort Oriented Evaluation of Retrieval Systems without Predefined Retrieval Unit. In: RIAO 2004, Avignon, France, pp. 463–473 (April 2004)
[7] Hawking, D., Robertson, S.: On collection size and retrieval effectiveness. Information Retrieval 6(1), 99–105 (2003)
[8] Hull, D.A., Kantor, P., Ng, K.: Advanced approaches to the statistical analysis of TREC information retrieval experiments. Technical report (1997), Unpublished, contact the first author for a copy: hull@clairvoyancecorp.com
[9] Kazai, G., Lalmas, M., de Vries, A.P.: The overlap problem in content-oriented XML retrieval evaluation. In: Sanderson, et al. (eds.) [19], pp. 72–79
[10] Kazai, G., Lalmas, M., de Vries, A.P.: Reliability Tests for the XCG and inex-2002 Metric. In: Fuhr, N., Lalmas, M., Malik, S., Szlávik, Z. (eds.) INEX 2004. LNCS, vol. 3493, pp. 60–72. Springer, Heidelberg (2005)
[11] Kazai, G., Lalmas, M., Fuhr, N., Gövert, N.: A report on the first year of the INitiative for the evaluation of XML retrieval (INEX 2002). Journal of the American Society for Information Science and Technology (JASIST) 55(6), 551–556 (2004)
[12] Kekäläinen, J., Järvelin, K.: Using graded relevance assessments in IR evaluation. Journal of the American Society for Information Science and Technology (JASIST) 53(13), 1120–1129 (2002)
[13] Kraaij, W.: Variations on Language Modeling for Information Retrieval. PhD thesis, University of Twente (2004)
[14] Mea, V.D., Mizzaro, S.: Measuring retrieval effectiveness: a new proposal and a first experimental validation. Journal of the American Society for Information Science and Technology (JASIST) 55(6), 530–543 (2004)
[15] Myaeng, S.H., Jang, D.-H., Kim, M.-S., Zhoo, Z.-C.: A Flexible Model for Retrieval of SGML documents. In: SIGIR 1998, Melbourne, Australia, pp. 138–140 (August 1998)
[16] Raghavan, V.V., Jung, G.S., Bollmann, P.: A critical investigation of recall and precision as measures of retrieval system performance. ACM Transactions on Information Systems 7(3), 205–229 (1989)
[17] Sakai, T.: New Performance metrics based on Multigrade Relevance: Their Application to Question Answering. In: NTCIR-4 Proceedings (2004)
[18] Sakai, T.: Ranking the NTCIR Systems Based on Multigrade Relevance. In: Myaeng, S.-H., Zhou, M., Wong, K.-F., Zhang, H.-J. (eds.) AIRS 2004. LNCS, vol. 3411, pp. 251–262. Springer, Heidelberg (2005)
[19] Sanderson, M., Järvelin, K., Allan, J., Bruza, P. (eds.) SIGIR 2004: Proc. of the 27th Annual Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, Sheffield, UK, July 25-29 (2004)
[20] Sanderson, M., Zobel, J.: Information retrieval system evaluation: Effort, sensitivity, and reliability. In: ACM SIGIR 2005 (2005) (to appear)
[21] Savoy, J.: Statistical inference in retrieval effectiveness evaluation. Info. Process. Management 33(4), 495–512 (1997)
[22] Soboroff, I.: On evaluating web search with very few relevant documents. In: Sanderson, et al. (eds.) [19], pp. 530–531
[23] Tague-Sutcliffe, J., Blustein, J.: A statistical analysis of the TREC-3 data. In: Proceedings of TREC-3, NIST Special Publication 500-225, pp. 385–398 (April 1995)
[24] Van Rijsbergen, C.J.: Information Retrieval, Butterworths (1979)
[25] Voorhees, E.M.: The TREC robust retrieval track. SIGIR Forum 39(1), 11–20 (2005)
[26] Voorhees, E.M., Buckley, C.: The effect of topic set size on retrieval experiment error. In: ACM SIGIR 2002, pp. 316–323. ACM Press, New York (August 2002)

Author information

Authors and Affiliations

Laboratory of Computer Science (LIP6), University Pierre and Marie Curie, 8, rue du capitaine Scott, 75015, Paris, France
Huyen-Trang Vu & Patrick Gallinari

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Pohang University of Science and Technology, San 31, Hyoja-dong, Nam-gu, 790-784, Pohang, Korea
Gary Geunbae Lee
Computer and Communication Media Research, NEC Corp., Miyazaki 4-1-1, Miyamae-ku, 216-8555, Kawasaki, Japan
Akio Yamada
Human-Computer Communications Laboratory, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong
Helen Meng
School of Engineering, Information and Communications University, 119, Munjiro, Yuseong-gu, 305-732, Daejeon, Korea
Sung Hyon Myaeng

About this paper

Cite this paper

Vu, HT., Gallinari, P. (2005). On Effectiveness Measures and Relevance Functions in Ranking INEX Systems. In: Lee, G.G., Yamada, A., Meng, H., Myaeng, S.H. (eds) Information Retrieval Technology. AIRS 2005. Lecture Notes in Computer Science, vol 3689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562382_24

DOI: https://doi.org/10.1007/11562382_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29186-2
Online ISBN: 978-3-540-32001-2
eBook Packages: Computer Science; Computer Science (R0)
