Challenges in Data Intensive Analysis at Scientific Experimental User Facilities

Handbook of Data Intensive Computing

Abstract

Today’s scientific challenges, such as routes to a sustainable energy future, materials by design, and biological and chemical environmental remediation methods, are complex problems that can be addressed successfully only by integrating a wide range of complementary expertise. Experimental and computational research methods can offer fundamental insights toward their solution.

Acknowledgements

S.D.M. acknowledges that the research at Oak Ridge National Laboratory’s Spallation Neutron Source was sponsored by the Scientific User Facilities Division, Office of Basic Energy Sciences, U.S. Department of Energy.

S.D.M. and J.W.C. acknowledge that the submitted manuscript has been co-authored by a contractor of the U.S. Government under Contract No. DE-AC05-00OR22725. Accordingly, the U.S. Government retains a non-exclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes.

J.W.C. acknowledges that this material is based upon work supported by the National Science Foundation under Grant No. 050474. This research was supported in part by the National Science Foundation through TeraGrid resources provided by the Neutron Science TeraGrid Gateway.

Author information

Corresponding author

Correspondence to Kerstin Kleese van Dam.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

van Dam, K.K., Li, D., Miller, S.D., Cobb, J.W., Green, M.L., Ruby, C.L. (2011). Challenges in Data Intensive Analysis at Scientific Experimental User Facilities. In: Furht, B., Escalante, A. (eds) Handbook of Data Intensive Computing. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1415-5_10

  • DOI: https://doi.org/10.1007/978-1-4614-1415-5_10

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-1414-8

  • Online ISBN: 978-1-4614-1415-5

  • eBook Packages: Computer Science (R0)