The Road Towards Reproducibility in Science: The Case of Data Citation

  • Conference paper
Digital Libraries and Archives (IRCDL 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 733)

Abstract

Data citation has a profound impact on the reproducibility of science, a hot topic in many disciplines such as astronomy, biology, physics, computer science and more. Lately, several authoritative journals have been requesting the sharing of data and the provision of validation methodologies for experiments (e.g., Nature Scientific Data and Nature Physics); these publications, and the publishing industry in general, see data citation as a way to provide new, reliable and usable means for sharing and referring to scientific data. In this paper, we present the state of the art of data citation and we discuss open issues and research directions with a specific focus on reproducibility. Furthermore, we investigate reproducibility issues by using experimental evaluation in Information Retrieval (IR) as a test case. (This paper is a revised and extended version of [33, 35, 57].)

Notes

  1. https://www.acm.org/publications/policies/artifact-review-badging.

  2. http://db-reproducibility.seas.harvard.edu/.

  3. http://www.tira.io/.

  4. Actually, this would be difficult to achieve.

  5. http://direct.dei.unipd.it/.

  6. http://lod-direct.dei.unipd.it/.

References

  1. Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data, vol. 12. CODATA-ICSTI Task Group on Data Citation Standards and Practices, September 2013

  2. Reproducibility and reliability of biomedical research: improving research practice. Technical report, The Academy of Medical Science (2015)

  3. Freire, J., Fuhr, N., Rauber, A. (eds.): Report from Dagstuhl Seminar 16041: Reproducibility of Data-Oriented Experiments in e-Science. Dagstuhl Reports, vol. 6, no. 1. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Germany (2016)

  4. Agosti, M., Di Buccio, E., Ferro, N., Masiero, I., Peruzzo, S., Silvello, G.: DIRECTions: design and specification of an IR evaluation infrastructure. In: Catarci, T., Forner, P., Hiemstra, D., Peñas, A., Santucci, G. (eds.) CLEF 2012. LNCS, vol. 7488, pp. 88–99. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33247-0_11

  5. Agosti, M., Di Nunzio, G.M., Ferro, N.: The importance of scientific data curation for evaluation campaigns. In: Thanos, C., Borri, F., Candela, L. (eds.) DELOS 2007. LNCS, vol. 4877, pp. 157–166. Springer, Heidelberg (2007). doi:10.1007/978-3-540-77088-6_15

  6. Agosti, M., Ferro, N.: Towards an evaluation infrastructure for DL performance evaluation. In: Tsakonas, G., Papatheodorou, C. (eds.) Evaluation of Digital Libraries: An Insight into Useful Applications and Methods, pp. 93–120. Chandos Publishing, Oxford (2009)

  7. Alonso, O., Mizzaro, S.: Using crowdsourcing for TREC relevance assessment. Inf. Process. Manage. 48(6), 1053–1066 (2012)

  8. Altman, M., Crosas, M.: The evolution of data citation: from principles to implementation. IAssist Q. 37(1–4), 62–70 (2013)

  9. Altman, M., King, G.: A proposed standard for the scholarly citation of quantitative data. IASSIST (2006). http://www.iassistdata.org/conferences/archive/2006

  10. Amigó, E., Corujo, A., Gonzalo, J., Meij, E., de Rijke, M.: Overview of RepLab 2012: evaluating online reputation management systems. In: Forner, P., Karlgren, J., Womser-Hacker, C., Ferro, N. (eds.) CLEF 2012 Working Notes. CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613–0073 (2012). http://ceur-ws.org/Vol-1178/

  11. Angelini, M., Ferro, N., Larsen, B., Müller, H., Santucci, G., Silvello, G., Tsikrika, T.: Measuring and analyzing the scholarly impact of experimental evaluation initiatives. Procedia Comput. Sci. 38, 133–137 (2014)

  12. Arguello, J., Crane, M., Diaz, F., Lin, J., Trotman, A.: Report on the SIGIR 2015 workshop on reproducibility, inexplicability, and generalizability of results (RIGOR). SIGIR Forum 49(2), 107–116 (2015)

  13. Armstrong, T.G., Moffat, A., Webber, W., Zobel, J.: EvaluatIR: an online tool for evaluating and comparing IR systems. In: Allan, J., Aslam, J.A., Sanderson, M., Zhai, C., Zobel, J. (eds.) Proceedings of 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2009), USA, p. 833. ACM, New York (2009)

  14. Badan, A., Benvegnù, L., Biasetton, M., Bonato, G., Brighente, A., Cenzato, A., Ceron, P., Cogato, G., Marchesin, S., Minetto, A., Pellegrina, L., Purpura, A., Simionato, R., Soleti, N., Tessarotto, M., Tonon, A., Vendramin, F., Ferro, N.: Towards open-source shared implementations of keyword-based access systems to relational data. In: Ferro, N., Guerra, F., Ives, Z., Silvello, G., Theobald, M. (eds.) Proceedings of 1st International Workshop on Keyword-Based Access and Ranking at Scale (KARS 2017) - Proceedings of the Workshops of the EDBT/ICDT 2017 Joint Conference (EDBT/ICDT 2017). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613–0073 (2017). http://ceur-ws.org/Vol-1810/

  15. Badan, A., Benvegnù, L., Biasetton, M., Bonato, G., Brighente, A., Marchesin, S., Minetto, A., Pellegrina, L., Purpura, A., Simionato, R., Soleti, N., Tessarotto, M., Tonon, A., Ferro, N.: Keyword-based access to relational data: to reproduce, or to not reproduce? In: Greco et al. [39]

  16. Baggerly, K.: Disclose all data in publications. Nature 467, 401 (2010)

  17. Bardi, A., Manghi, P.: A framework supporting the shift from traditional digital publications to enhanced publications. D-Lib Magaz. 21(1/2) (2015). http://dx.doi.org/10.1045/january2015-bardi

  18. Bloom, T., Ganly, E., Winker, M.: Data access for the open access literature: PLOS’s data policy. PLoS Biol. 12(2), e1001797 (2014)

  19. Borgman, C.L.: The conundrum of sharing research data. JASIST 63(6), 1059–1078 (2012). http://dx.doi.org/10.1002/asi.22634

  20. Borgman, C.L.: Why are the attribution and citation of scientific data important? In: Board on Research Data and Information, Policy and Global Affairs Division, National Academy of Sciences (eds.) Report from Developing Data Attribution and Citation Practices and Standards: An International Symposium and Workshop, pp. 1–8. National Academies Press, Washington DC (2012)

  21. Borgman, C.L.: Big Data, Little Data, No Data. MIT Press, Cambridge (2015)

  22. Buneman, P., Davidson, S.B., Frew, J.: Why data citation is a computational problem. Commun. ACM (CACM) 59(9), 50–57 (2016)

  23. Buneman, P., Khanna, S., Tajima, K., Tan, W.C.: Archiving scientific data. ACM Trans. Database Syst. (TODS) 29(1), 2–42 (2004)

  24. Buneman, P., Silvello, G.: A rule-based citation system for structured and evolving datasets. IEEE Data Eng. Bull. 33(3), 33–41 (2010). http://sites.computer.org/debull/A10sept/buneman.pdf

  25. Burton, A., Koers, H., Manghi, P., La Bruzzo, S., Aryani, A., Diepenbroek, M., Schindler, U.: On bridging data centers and publishers: the data-literature interlinking service. In: Garoufallou, E., Hartley, R.J., Gaitanou, P. (eds.) MTSR 2015. CCIS, vol. 544, pp. 324–335. Springer, Cham (2015). doi:10.1007/978-3-319-24129-6_28

  26. Candela, L., Castelli, D., Manghi, P., Tani, A.: Data journals: a survey. J. Assoc. Inf. Sci. Technol. 66(9), 1747–1762 (2015). http://dx.doi.org/10.1002/asi.23358

  27. Carr, D., Littler, K.: Sharing research data to improve public health: a funder perspective. J. Empir. Res. Hum. Res. Ethics 10(3), 314–316 (2015)

  28. Davidson, S.B., Deutch, D., Milo, T., Silvello, G.: A model for fine-grained data citation. In: Greco et al. [39]

  29. Davidson, S.B., Deutch, D., Milo, T., Silvello, G.: A model for fine-grained data citation. In: 8th Biennial Conference on Innovative Data Systems Research (CIDR 2017) (2017)

  30. Davidson, S.B., Buneman, P., Deutch, D., Milo, T., Silvello, G.: Data citation: a computational challenge. In: Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS 2017), USA, pp. 1–4 (2017). http://doi.acm.org/10.1145/3034786.3056123

  31. De Roure, D.: The future of scholarly communications. Insights 27(3), 233–238 (2014)

  32. Dussin, M., Ferro, N.: Managing the knowledge creation process of large-scale evaluation campaigns. In: Agosti, M., Borbinha, J., Kapidakis, S., Papatheodorou, C., Tsakonas, G. (eds.) ECDL 2009. LNCS, vol. 5714, pp. 63–74. Springer, Heidelberg (2009). doi:10.1007/978-3-642-04346-8_8

  33. Ferro, N.: Reproducibility challenges in information retrieval evaluation. ACM J. Data Inf. Qual. (JDIQ) 8(2), 8:1–8:4 (2017)

  34. Ferro, N., et al. (eds.): ECIR 2016. LNCS, vol. 9626. Springer, Cham (2016)

  35. Ferro, N., Fuhr, N., Järvelin, K., Kando, N., Lippold, M., Zobel, J.: Increasing reproducibility in IR: findings from the Dagstuhl seminar on “reproducibility of data-oriented experiments in e-science”. SIGIR Forum 50(1), 68–82 (2016)

  36. Ferro, N., Silvello, G.: Rank-biased precision reloaded: reproducibility and generalization. In: Hanbury et al. [41], pp. 768–780

  37. FORCE-11: Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. FORCE11, San Diego, CA, USA (2014)

  38. Freire, J., Bonnet, P., Shasha, D.: Computational reproducibility: state-of-the-art, challenges, and database research opportunities. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2012, pp. 593–596 (2012). http://doi.acm.org/10.1145/2213836.2213908

  39. Greco, S., Saccà, D., Flesca, S., Masciari, E. (eds.): Proceedings of 25th Italian Symposium on Advanced Database Systems (SEBD 2017) (2017)

  40. Groth, P., Gibson, A., Velterop, J.: The anatomy of a nanopublication. Inf. Serv. Use 30(1–2), 51–56 (2010)

  41. Hanbury, A., Kazai, G., Rauber, A., Fuhr, N. (eds.): ECIR 2015. LNCS, vol. 9022. Springer, Cham (2015). doi:10.1007/978-3-319-16354-3

  42. Hanbury, A., Müller, H., Balog, K., Brodt, T., Cormack, G.V., Eggel, I., Gollub, T., Hopfgartner, F., Kalpathy-Cramer, J., Kando, N., Krithara, A., Lin, J., Mercer, S., Potthast, M.: Evaluation-as-a-service: overview and outlook. CoRR abs/1512.07454, December 2015

  43. Harman, D.K.: Information Retrieval Evaluation. Morgan & Claypool Publishers, San Rafael (2011)

  44. Hey, T., Tansley, S., Tolle, K. (eds.): The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research, Redmond (2009)

  45. Huang, Y.H., Rose, P.W., Hsu, C.N.: Citing a data repository: a case study of the protein data bank. PLoS ONE 10(8), e0136631 (2015)

  46. Kanoulas, E., Lupu, M., Clough, P., Sanderson, M., Hall, M., Hanbury, A., Toms, E. (eds.): CLEF 2014. LNCS, vol. 8685. Springer, Cham (2014). doi:10.1007/978-3-319-11382-1

  47. Klump, J., Huber, R., Diepenbroek, M.: DOI for geoscience data - how early practices shape present perceptions. Earth Sci. Inform. 1–14 (2015). http://dx.doi.org/10.1007/s12145-015-0231-5

  48. Lipani, A., Piroi, F., Andersson, L., Hanbury, A.: An Information Retrieval Ontology for Information Retrieval Nanopublications. In: Kanoulas et al. [46], pp. 44–49

  49. Papavasileiou, V., Flouris, G., Fundulaki, I., Kotzinos, D., Christophides, V.: High-level change detection in RDF(S) KBs. ACM Trans. Database Syst. 38(1), 1 (2013)

  50. Potthast, M., Gollub, T., Rangel Pardo, F., Rosso, P., Stamatatos, E., Stein, B.: Improving the reproducibility of PAN’s shared tasks: plagiarism detection, author identification, and author profiling. In: Kanoulas et al. [46], pp. 268–299

  51. Pröll, S., Rauber, A.: Scalable data citation in dynamic, large databases: model and reference implementation. In: Hu, X., Young, T.L., Raghavan, V., Wah, B.W., Baeza-Yates, R., Fox, G., Shahabi, C., Smith, M., Yang, Q., Ghani, R., Fan, W., Lempel, R., Nambiar, R. (eds.) Proceedings of the 2013 IEEE International Conference on Big Data, pp. 307–312. IEEE (2013)

  52. Pröll, S., Rauber, A.: Asking the right questions - query-based data citation to precisely identify subsets of data. ERCIM News 100 (2015)

  53. Robinson-Garcia, N., Jiménez-Contreras, E., Torres-Salinas, D.: Analyzing data citation practices according to the data citation index. J. Am. Soc. Inf. Sci. Technol. (JASIST) 67, 2964–2975 (2015)

  54. Silvello, G.: A methodology for citing linked open data subsets. D-Lib Magaz. 21(1/2) (2015). http://dx.doi.org/10.1045/january2015-silvello

  55. Silvello, G.: Learning to cite framework: how to automatically construct citations for hierarchical data. J. Am. Soc. Inf. Sci. Technol. (JASIST), 1–28 (2017)

  56. Silvello, G., Bordea, G., Ferro, N., Buitelaar, P., Bogers, T.: Semantic representation and enrichment of information retrieval experimental data. Int. J. Digit. Libr. (IJDL) 18(2), 145–172 (2017)

  57. Silvello, G., Ferro, N.: Data citation is coming. Introduction to the special issue on data citation. Bullet. IEEE Tech. Committee Digit. Libr. (IEEE-TCDL) 12(1), 1–5 (2016)

  58. Simons, N.: Implementing DOIs for research data. D-Lib Magaz. 18(5/6) (2012). http://dx.doi.org/10.1045/may2012-simons

  59. Vernooy-Gerritsen, M.: Enhanced Publications: Linking Publications and Research Data in Digital Repositories. Amsterdam University Press, Amsterdam (2009)

  60. Voorhees, E.M.: Variations in relevance judgments and the measurement of retrieval effectiveness. Inf. Process. Manage. 36(5), 697–716 (2000)

  61. Voorhees, E.M., Rajput, S., Soboroff, I.: Promoting repeatability through open runs. In: Yilmaz, E., Clarke, C.L.A. (eds.) Proceedings of 7th International Workshop on Evaluating Information Access (EVIA 2016), pp. 17–20. National Institute of Informatics, Tokyo, Japan (2016)

Author information

Corresponding author

Correspondence to Gianmaria Silvello.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Ferro, N., Silvello, G. (2017). The Road Towards Reproducibility in Science: The Case of Data Citation. In: Grana, C., Baraldi, L. (eds) Digital Libraries and Archives. IRCDL 2017. Communications in Computer and Information Science, vol 733. Springer, Cham. https://doi.org/10.1007/978-3-319-68130-6_2

  • DOI: https://doi.org/10.1007/978-3-319-68130-6_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68129-0

  • Online ISBN: 978-3-319-68130-6

  • eBook Packages: Computer Science (R0)
