Scientific Data Collections and Distributed Collective Practice

  • Melissa H. Cragin
  • Kalpana Shankar


As the basic sciences become increasingly information-intensive, the management and use of research data present new challenges in the collective activities that constitute scholarly and scientific communication. They also present new opportunities for understanding the role of informatics in scientific work practices, and for designing the new kinds of tools and resources needed to support them. These issues of data management, scientific communication, and collective activity come together in scientific data collections (SDCs). What can the development and use of shared SDCs tell us about collective activity, dynamic infrastructures, and distributed scientific work? Using examples drawn from a nascent neuroscience data collection, we examine some unique features of SDCs to illustrate that they do more than act as infrastructures for scientific research. We argue that they are themselves instantiations of Distributed Collective Practice (DCP), and as such illustrate concepts of transition, emergence, and interdependency that may be less apparent in other kinds of DCPs. We propose that research into SDCs can yield new insights into institutional arrangements, policymaking, and authority structures in other very large-scale socio-technical networks.

Key words

distributed collective practice; e-Science; Long-Lived Digital Data Collections; scientific data collections; scientific information infrastructure; socio-technical systems





Copyright information

© Springer-Verlag 2006

Authors and Affiliations

  1. Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, Champaign, USA
  2. School of Informatics, Indiana University, Bloomington, USA
