Abstract
The growth of data volumes in science is reaching epidemic proportions. Consequently, the status of data-oriented science as a research methodology needs to be elevated to that of the more established scientific approaches of experimentation, theoretical modeling, and simulation. Data-oriented scientific discovery is sometimes referred to as the new science of X-Informatics, where X refers to any science (e.g., Bio-, Geo-, Astro-) and informatics refers to the discipline of organizing, describing, accessing, integrating, mining, and analyzing diverse data resources for scientific discovery. Many scientific disciplines are developing formal sub-disciplines that are information-rich and data-based, to such an extent that these are now stand-alone research and academic programs recognized on their own merits. These disciplines include bioinformatics and geoinformatics, and will soon include astroinformatics. We introduce Astroinformatics, the new data-oriented approach to 21st-century astronomy research and education. In astronomy, petascale sky surveys will soon challenge our traditional research approaches and will radically transform how we train the next generation of astronomers, whose experiences with data are now increasingly virtual (through online databases) rather than physical (through trips to mountaintop observatories). We describe Astroinformatics as a rigorous approach to these challenges. We also describe initiatives in science education (not only in astronomy) through which students are trained to access large distributed data repositories, to conduct meaningful scientific inquiries into the data, to mine and analyze the data, and to make data-driven scientific discoveries.
These are essential skills for all 21st-century scientists, particularly in astronomy, where major new multi-wavelength sky surveys (which produce petascale databases and image archives) and grand-scale simulations (which generate enormous outputs for model universes, such as the Millennium Simulation) are becoming core research components for a significant fraction of astronomical researchers.
Acknowledgments
The author thanks the National Science Foundation (NSF) for partial support of this work by the Division of Undergraduate Education (DUE) Course, Curriculum, and Laboratory Improvement (CCLI) program, through award #0737091. The author also thanks numerous colleagues for their significant and invaluable contributions to the ideas expressed in this paper: Jogesh Babu, Douglas Burke, Andrew Connolly, Timothy Eastman, Eric Feigelson, Matthew Graham, Alexander Gray, Norman Gray, Suzanne Jacoby, Thomas Loredo, Ashish Mahabal, Robert Mann, Bruce McCollum, Misha Pesenson, M. Jordan Raddick, Alex Szalay, Tony Tyson, and John Wallin. Finally, the author expresses deep gratitude to Keivan Stassun for his thorough and thoughtful review of an earlier version of this paper, and for his numerous helpful comments and suggestions, which considerably improved the final product.
Additional information
Communicated by Thomas Narock
Appendix A National Study Groups Face the Data Flood
Several national study groups have issued reports on the urgency of establishing scientific and educational programs to face the data flood challenges, including:
1. NAS (National Academies of Science) report: Bits of Power: Issues in Global Access to Scientific Data (1997), downloaded from http://www.nap.edu/catalog.php?record_id=5504
2. NSF report: Knowledge Lost in Information: Research Directions for Digital Libraries (2003), downloaded from http://www.sis.pitt.edu/~dlwkshop/report.pdf
3. NSF report: Cyberinfrastructure for Environmental Research and Education (2003), downloaded from http://www.ncar.ucar.edu/cyber/cyberreport.pdf
4. NSF Atkins Report: Revolutionizing Science & Engineering Through Cyberinfrastructure: Report of the NSF Blue-Ribbon Advisory Panel on Cyberinfrastructure (2003), downloaded from http://www.nsf.gov/od/oci/reports/atkins.pdf
5. NSB (National Science Board) report: Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century (2005), downloaded from http://www.nsf.gov/nsb/documents/2005/LLDDC_report.pdf
6. NSF report with the Computing Research Association: Cyberinfrastructure for Education and Learning for the Future: A Vision and Research Agenda (2005), downloaded from http://www.cra.org/reports/cyberinfrastructure.pdf
7. NSF report: The Role of Academic Libraries in the Digital Data Universe (2006), downloaded from http://www.arl.org/bm~doc/digdatarpt.pdf
8. National Research Council, National Academies Press report: Learning to Think Spatially (2006), downloaded from http://www.nap.edu/catalog.php?record_id=11019
9. NSF report: Cyberinfrastructure Vision for 21st Century Discovery (2007), downloaded from http://www.nsf.gov/od/oci/ci_v5.pdf
10. JISC/NSF Workshop report on Data-Driven Science & Repositories (2007), downloaded from http://www.sis.pitt.edu/~repwkshop/NSF-JISC-report.pdf
11. DOE (Department of Energy) report: Visualization and Knowledge Discovery: Report from the DOE/ASCR Workshop on Visual Analysis and Data Exploration at Extreme Scale (2007), downloaded from http://www.sc.doe.gov/ascr/ProgramDocuments/Docs/DOE-Visualization-Report-2007.pdf
12. DOE report: Mathematics for Analysis of Petascale Data Workshop Report (2008), downloaded from http://www.sc.doe.gov/ascr/ProgramDocuments/Docs/PetascaleDataWorkshopReport.pdf
13. NSTC Interagency Working Group on Digital Data report: Harnessing the Power of Digital Data for Science and Society (2009), downloaded from http://www.nitrd.gov/about/Harnessing_Power_Web.pdf
14. National Academies report: Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age (2009), downloaded from http://www.nap.edu/catalog.php?record_id=12615
Cite this article
Borne, K.D. Astroinformatics: data-oriented astronomy research and education. Earth Sci Inform 3, 5–17 (2010). https://doi.org/10.1007/s12145-010-0055-2
DOI: https://doi.org/10.1007/s12145-010-0055-2