A Multidisciplinary, Model-Driven, Distributed Science Data System Architecture
The twenty-first century has transformed science by breaking down the physical boundaries between distributed organizations and interconnecting them into virtual science environments. In these environments, systems and systems of systems can seamlessly access and share information and resources across geographically dispersed sites. This e-science transformation is enabling new scientific discoveries by fostering greater collaboration and by allowing systems to combine and correlate disparate data sets. At the Jet Propulsion Laboratory in Pasadena, California, we have been developing science data systems for highly distributed communities in the physical and life sciences. These communities require extensive sharing of distributed services and of common information models built on a common architecture. The common architecture contributes a set of atomic functions, interfaces, and information models that support sharing and distributed processing. The architecture also serves as the blueprint for a software product line known as the Object Oriented Data Technology (OODT) framework. OODT has enabled the reuse of software for science data generation, capture, management, and delivery across highly distributed organizations in planetary science, earth science, and cancer research. Our experience to date shows that a well-defined architecture, together with an accompanying set of software, greatly improves our ability both to develop road maps for virtual science environments and to construct them.
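A common information model paired with a common interface is what lets one query reach many independently run archives. The sketch below is a minimal illustration under stated assumptions and is not the OODT API. The names `Metadata`, `DataNode`, `InMemoryNode`, and `federated_query` are hypothetical. The information model is assumed to be a map from attribute names to one or more values. Each node is assumed to expose a single `query` operation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class Metadata:
    """Hypothetical shared information model: each attribute name maps to one
    or more values, so every node describes its products the same way."""
    attributes: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, key: str, value: str) -> "Metadata":
        # Append a value to a (possibly multi-valued) attribute; return self for chaining.
        self.attributes.setdefault(key, []).append(value)
        return self

    def matches(self, criteria: Dict[str, str]) -> bool:
        # A record matches when every requested attribute holds the requested value.
        return all(value in self.attributes.get(key, []) for key, value in criteria.items())


class DataNode(Protocol):
    """Hypothetical common interface that every distributed node implements."""
    name: str

    def query(self, criteria: Dict[str, str]) -> List[Metadata]: ...


@dataclass
class InMemoryNode:
    """Stand-in for a remote archive; a real node would sit behind a network service."""
    name: str
    records: List[Metadata] = field(default_factory=list)

    def query(self, criteria: Dict[str, str]) -> List[Metadata]:
        # Return only the local records that satisfy every criterion.
        return [record for record in self.records if record.matches(criteria)]


def federated_query(nodes: List[DataNode], criteria: Dict[str, str]) -> Dict[str, List[Metadata]]:
    """Send the same query to every node and group the answers by node name."""
    return {node.name: node.query(criteria) for node in nodes}


if __name__ == "__main__":
    planetary = InMemoryNode("planetary", [Metadata().add("Discipline", "planetary").add("Title", "Mars imagery")])
    cancer = InMemoryNode("cancer", [Metadata().add("Discipline", "cancer")])
    results = federated_query([planetary, cancer], {"Discipline": "planetary"})
    print({name: len(hits) for name, hits in results.items()})
```

Because every node shares the same record shape and query signature, a portal can add a new discipline's archive without changing the query code. That is the kind of reuse the abstract attributes to a common architecture.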
Keywords: Resource Description Framework; Software Product Line; Information Architecture; Microwave Limb Sound; Planetary Science
The research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration.