
An Innovative Approach to Data Management and Curation of Experimental Data Generated Through IR Test Collections

Chapter in: Information Retrieval Evaluation in a Changing World

Part of the book series: The Information Retrieval Series (INRE, volume 41)

Abstract

This chapter describes the steps that led to the invention, design and development of the Distributed Information Retrieval Evaluation Campaign Tool (DIRECT), a system for managing and accessing the data used and produced in experimental evaluation in Information Retrieval (IR). We present the context in which DIRECT was conceived, its conceptual model, and its extension to make the data available on the Web as Linked Open Data (LOD), enabling and enhancing their enrichment, discoverability and re-use. Finally, we discuss possible further evolutions of the system.
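
As the abstract notes, making the DIRECT data available as Linked Open Data amounts to expressing evaluation records as RDF triples under resolvable URIs, so that they can be linked, enriched and re-used by third parties. The following is a minimal sketch using the Python rdflib library; the ir-eval vocabulary, the URIs and the score are hypothetical placeholders chosen for illustration, not the actual DIRECT schema.

    # A minimal sketch of publishing one IR evaluation record as Linked Open
    # Data with rdflib. The "ir-eval" vocabulary, URIs and score below are
    # hypothetical placeholders, not the actual DIRECT schema.
    from rdflib import Graph, Literal, Namespace, RDF, XSD

    EVAL = Namespace("http://example.org/ir-eval/")  # assumed vocabulary

    g = Graph()
    g.bind("ev", EVAL)

    # One experimental run, the collection it was evaluated on, the measure
    # used, and the resulting score.
    run = EVAL["run/adhoc-exp42"]
    g.add((run, RDF.type, EVAL.Experiment))
    g.add((run, EVAL.evaluatedOn, EVAL["collection/adhoc-2006"]))
    g.add((run, EVAL.measure, EVAL.MeanAveragePrecision))
    g.add((run, EVAL.score, Literal("0.3124", datatype=XSD.decimal)))

    # Serialise as Turtle, a common exchange format for Linked Open Data.
    print(g.serialize(format="turtle"))

Published this way, each experiment is identified by a stable URI that other datasets can point to, which is what enables the enrichment and discoverability the abstract refers to.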



Acknowledgements

The results we have presented originated mostly in the context of the research activities of the Information Management Systems (IMS) research group of the Department of Information Engineering of the University of Padua, Italy, but they have benefitted from the collaboration and support of many experts, in particular Carol Peters of ISTI, CNR, Pisa, Italy, and Donna Harman of NIST, USA, to whom we offer our sincere thanks. The research activities have been financially supported by several European projects, namely DELOS (FP6 NoE, 2004–2007, Contract n. G038-507618), TrebleCLEF (FP7 CA, 2008–2009, Contract n. 215231), and PROMISE (FP7 NoE, 2010–2013, Contract n. 258191).

We are most grateful to our referees for their very helpful comments.

Author information

Correspondence to Giorgio Maria Di Nunzio.


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Agosti, M., Di Nunzio, G.M., Ferro, N., Silvello, G. (2019). An Innovative Approach to Data Management and Curation of Experimental Data Generated Through IR Test Collections. In: Ferro, N., Peters, C. (eds) Information Retrieval Evaluation in a Changing World. The Information Retrieval Series, vol 41. Springer, Cham. https://doi.org/10.1007/978-3-030-22948-1_4


  • DOI: https://doi.org/10.1007/978-3-030-22948-1_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-22947-4

  • Online ISBN: 978-3-030-22948-1

  • eBook Packages: Computer Science, Computer Science (R0)
