Reasoning About Discovery Clouds

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9698)

Abstract

A discovery cloud is a set of automated, cloud-hosted services to which individuals may outsource their routine and not-so-routine research tasks: finding relevant data, inferring links between data, running computational experiments, inferring new knowledge claims, evaluating the credibility of knowledge claims produced by others, designing experiments, and so on. If developed successfully, a discovery cloud can accelerate and democratize access to data and knowledge tools and the collaborative construction of new knowledge. Such systems are also fascinating to consider from a reasoning perspective because they integrate great complexity at multiple levels: the underlying cloud-based hardware and software, for which issues of reliability and responsiveness may be paramount; the knowledge bases and inference engines that sit on that cloud substrate, for which issues of correctness may be less well defined; and the human communities that form around the discovery clouds, and that arguably form as much a part of the cloud as the hardware, software, and data. I raise questions here about what it might mean to reason about such systems. I do not provide any answers.

Acknowledgements

I am grateful to the organizers of Petri Nets 2016 for the opportunity to contribute this article to the proceedings. This work is supported in part by the US Department of Energy contract DE-AC02-06CH11357.

Author information

Corresponding author

Correspondence to Ian Foster.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Foster, I. (2016). Reasoning About Discovery Clouds. In: Kordon, F., Moldt, D. (eds) Application and Theory of Petri Nets and Concurrency. PETRI NETS 2016. Lecture Notes in Computer Science, vol 9698. Springer, Cham. https://doi.org/10.1007/978-3-319-39086-4_1

  • DOI: https://doi.org/10.1007/978-3-319-39086-4_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-39085-7

  • Online ISBN: 978-3-319-39086-4

  • eBook Packages: Computer Science, Computer Science (R0)
