Clouds and Reproducibility: A Way to Go to Scientific Experiments?

  • Chapter in: Cloud Computing

Abstract

Scientific research is supported by computing techniques and tools that allow scientists to gather, manage, analyze, visualize, share, and reproduce scientific data and experiments. Simulations performed in this type of research are called in silico experiments, and they are commonly composed of several applications that execute well-established algorithms and methods. Reproducibility plays a key role: it gives scientists the ability to change the data and the test environment of an experiment in order to evaluate the robustness of the proposed scientific method. Verifying and validating the results of these experiments increases the productivity and quality of scientific data analysis, thereby improving the development of science and the production of complex data across scientific domains. Enabling reproducibility of in silico experiments raises many challenges, several of which concern guaranteeing that simulation programs and data are still available when scientists need to reproduce an experiment. Clouds can play a key role here by offering infrastructure for the long-term preservation of programs and data. The goal of this chapter is to characterize the terms and requirements related to scientific reproducibility and to show how clouds can aid the development and selection of reproducibility approaches in science.
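To make the preservation argument concrete, the sketch below shows one way an experiment bundle (programs, input data, and an environment description) might be archived to cloud object storage so it remains available for later reproduction. This is a minimal illustration, not the chapter's method: it assumes Amazon S3 accessed through the boto3 library, and the bucket name, directory layout, and manifest fields are hypothetical examples.

```python
# Minimal sketch: preserving an in silico experiment in cloud object storage
# so that its programs and data stay available for later reproduction.
# Assumes Amazon S3 via boto3; bucket, paths, and IDs are hypothetical.
import hashlib
import json
import tarfile

import boto3  # AWS SDK for Python


def archive_experiment(bundle_dir: str, bucket: str, experiment_id: str) -> str:
    """Pack an experiment directory and upload it with an integrity checksum."""
    archive_path = f"{experiment_id}.tar.gz"

    # Pack programs, input data, and any environment description together,
    # so the experiment can later be retrieved and re-executed as one unit.
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(bundle_dir, arcname=experiment_id)

    # Record a checksum so a scientist reproducing the experiment can verify
    # that the preserved bundle was not corrupted or silently altered.
    sha256 = hashlib.sha256()
    with open(archive_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha256.update(chunk)

    manifest = {
        "experiment_id": experiment_id,
        "archive": archive_path,
        "sha256": sha256.hexdigest(),
    }

    # Upload the bundle and a small manifest describing it.
    s3 = boto3.client("s3")
    s3.upload_file(archive_path, bucket, f"{experiment_id}/{archive_path}")
    s3.put_object(
        Bucket=bucket,
        Key=f"{experiment_id}/manifest.json",
        Body=json.dumps(manifest, indent=2).encode(),
    )
    return manifest["sha256"]


# Example with hypothetical names: preserve a workflow's bundle in a bucket.
# archive_experiment("my_workflow/", "lab-experiments", "exp-2017-001")
```

The design choice worth noting is that the code archives the experiment as a single self-contained unit plus a manifest, rather than individual files: reproduction then only requires locating one object and checking one checksum, which is the kind of long-term availability guarantee the chapter argues clouds can provide.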

Acknowledgements

This work was partially funded by Brazilian agencies CAPES, FAPERJ, and CNPq.

Author information

Corresponding author

Correspondence to Ary H. M. de Oliveira.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

de Oliveira, A.H.M., de Oliveira, D., Mattoso, M. (2017). Clouds and Reproducibility: A Way to Go to Scientific Experiments? In: Antonopoulos, N., Gillam, L. (eds) Cloud Computing. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-54645-2_5

  • DOI: https://doi.org/10.1007/978-3-319-54645-2_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54644-5

  • Online ISBN: 978-3-319-54645-2

  • eBook Packages: Computer Science; Computer Science (R0)
