Data Product Configuration Management and Versioning in Large-Scale Production of Satellite Scientific Data

  • Bruce R. Barkstrom
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2649)

Abstract

This paper describes a formal structure for keeping track of files, source code, scripts, and related material for large-scale Earth science data production. We first describe the environment and processes that govern this configuration management problem. Then, we show that a graph with typed nodes and arcs can describe the derivation of production design and of the produced files and their metadata. The graph provides three useful by-products:
  • a hierarchical data file inventory structure that can help system users find particular files,

  • methods for creating production graphs that govern job scheduling and provenance graphs that track all of the sources and transformations between raw data input and a particular output file,

  • a systematic relationship between different elements of the structure and development documentation.
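The graph idea summarized above can be illustrated with a minimal sketch. It is not the paper's formal structure: the class `TypedGraph`, the node types `file` and `process`, the arc types `input_to` and `produces`, and all file and process names are hypothetical, chosen only to show how a provenance graph for one output file might be recovered from typed nodes and arcs.

```python
from collections import defaultdict


class TypedGraph:
    """Directed graph whose nodes and arcs each carry a type label (illustrative only)."""

    def __init__(self):
        self.node_type = {}                # node name -> type label
        self.incoming = defaultdict(list)  # target -> [(source, arc type)]

    def add_node(self, name, type_label):
        self.node_type[name] = type_label

    def add_arc(self, source, target, arc_type):
        # Require both endpoints to be declared so every node has a type.
        if source not in self.node_type or target not in self.node_type:
            raise KeyError("both endpoints must be added as nodes first")
        self.incoming[target].append((source, arc_type))

    def provenance(self, target):
        """Return every node the target was derived from, directly or indirectly."""
        seen = set()
        stack = [target]
        while stack:
            node = stack.pop()
            for source, _arc_type in self.incoming.get(node, []):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return seen


# Hypothetical two-step production chain: raw file -> calibrated file -> gridded file.
g = TypedGraph()
for name, kind in [
    ("raw_swath.dat", "file"),
    ("calibrate_v2", "process"),
    ("calibrated.dat", "file"),
    ("grid_v1", "process"),
    ("gridded.dat", "file"),
]:
    g.add_node(name, kind)

g.add_arc("raw_swath.dat", "calibrate_v2", "input_to")
g.add_arc("calibrate_v2", "calibrated.dat", "produces")
g.add_arc("calibrated.dat", "grid_v1", "input_to")
g.add_arc("grid_v1", "gridded.dat", "produces")

ancestors = g.provenance("gridded.dat")
```

Walking the arcs backward from an output file collects its sources and transformations. Walking the same arcs forward would give the order in which jobs may be scheduled, which is one plausible reading of the production graph mentioned in the abstract.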

Keywords

Data Product; Tropical Rainfall Measure Mission; Production Graph; Algorithm Class; Science Team
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Bruce R. Barkstrom
  1. Atmospheric Sciences Data Center, NASA Langley Research Center, Hampton, USA