Reproducibility and Validity in CLEF

  • Norbert Fuhr
Chapter
Part of The Information Retrieval Series book series (INRE, volume 41)

Abstract

In this chapter, we investigate CLEF’s contribution to the reproducibility of IR experiments. After discussing the concepts of reproducibility and validity, we show that CLEF has not only produced test collections that other researchers can re-use, but has also undertaken various efforts to enable reproducibility.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Duisburg-Essen, Duisburg, Germany
