Astroinformatics: data-oriented astronomy research and education

  • Research Article
  • Earth Science Informatics

Abstract

The growth of data volumes in science is reaching epidemic proportions. Consequently, the status of data-oriented science as a research methodology needs to be elevated to that of the more established scientific approaches of experimentation, theoretical modeling, and simulation. Data-oriented scientific discovery is sometimes referred to as the new science of X-Informatics, where X refers to any science (e.g., Bio-, Geo-, Astro-) and informatics refers to the discipline of organizing, describing, accessing, integrating, mining, and analyzing diverse data resources for scientific discovery. Many scientific disciplines are developing formal sub-disciplines that are information-rich and data-based, to such an extent that these are now stand-alone research and academic programs recognized on their own merits. These disciplines include bioinformatics and geoinformatics, and will soon include astroinformatics. We introduce Astroinformatics, the new data-oriented approach to 21st century astronomy research and education. In astronomy, petascale sky surveys will soon challenge our traditional research approaches and will radically transform how we train the next generation of astronomers, whose experiences with data are now increasingly virtual (through online databases) rather than physical (through trips to mountaintop observatories). We describe Astroinformatics as a rigorous approach to these challenges. We also describe initiatives in science education (not only in astronomy) through which students are trained to access large distributed data repositories, to conduct meaningful scientific inquiries into the data, to mine and analyze the data, and to make data-driven scientific discoveries.
These are essential skills for all 21st century scientists, particularly in astronomy, as major new multi-wavelength sky surveys (which produce petascale databases and image archives) and grand-scale simulations (which generate enormous outputs for model universes, such as the Millennium Simulation) become core research components for a significant fraction of astronomical researchers.

Fig. 1
Fig. 2

Notes

  1. http://serc.carleton.edu/usingdata/ and http://serc.carleton.edu/files/usingdata/UsingData.pdf

  2. http://www.vanderbilt.edu/gradschool/bridge

  3. http://grants.nih.gov/grants/guide/pa-files/PA-06-094.html

  4. http://esto.nasa.gov/info_technologies_aist.html and http://reason-projects.gsfc.nasa.gov/

  5. http://research.microsoft.com/en-us/um/cambridge/projects/towards2020science/

  6. http://universe.ucdavis.edu/docs/LSST_petascale_challenge.pdf

  7. http://www.nsf.gov/events/event_summ.jsp?cntn_id=116644&org=MPS

References

  • Agresti W (2003) Discovery Informatics. CACM 46:25

  • Atkins D et al (2003) Revolutionizing Science and Engineering through Cyberinfrastructure. Downloaded from http://www.nsf.gov/od/oci/reports/atkins.pdf

  • Baker DN (2008) Informatics and the electronic geophysical year. EOS 89:485

  • Ball NM, Brunner RJ (2009) Data mining and machine learning in astronomy. arXiv:0906.2173v1

  • Becla J, Hanushevsky A, Nikolaev S, Abdulla G, Szalay A, Nieto-Santisteban M, Thakar A, Gray J (2006) Designing a multi-petabyte database for LSST. arXiv:cs/0604112v1

  • Bell G, Gray J, Szalay A (2007) Petascale computational systems. arXiv:cs/0701165v1

  • Bloom J, Starr DL, Butler NR, Nugent P, Rischard M, Eads D, Poznanski D (2008) Towards a real-time transient classification engine. Astron Nachr 329:284

  • Borne K (2001a) Science user scenarios for a VO design reference mission: science requirements for data mining. In: Virtual observatories of the future, p 333

  • Borne K (2001b) Data mining in astronomical databases. In: Mining the sky, p 671

  • Borne KD (2006) Data-driven discovery through e-science technologies. 2nd IEEE Conference on Space Mission Challenges for Information Technology

  • Borne KD (2007) Astroinformatics: the new escience paradigm for astronomy research and education. Microsoft eScience Workshop at RENCI

  • Borne K (2008a) A machine learning classification broker for the LSST transient database. Astron Nachr 329:255

  • Borne K (2008b) Data science challenges from distributed petascale astronomical sky surveys, in the DOE Workshop on Mathematical Analysis of Petascale Data, downloaded from http://www.orau.gov/mathforpetascale/slides/Borne.pdf

  • Borne K (2009a) Scientific data mining in astronomy. In: Next generation data mining. Chapman & Hall, pp 91–114

  • Borne K (2009b) Astroinformatics: a 21st century approach to astronomy. arXiv:0909.3892v1

  • Borne K (2009c) The VO and large surveys: what more do we need? Downloaded from http://www.astro.caltech.edu/~george/AIworkshop/Borne.pdf

  • Borne K (2009d) The zooniverse: advancing science through user-guided learning in massive data streams. Downloaded from http://www.kd2u.org/NGDM09/schedule_NGDM/schedule.htm

  • Borne K, Eastman T (2006) A paradigm for space science informatics. AGU, IN51A-05

  • Borne K, Jacoby S, Carney K, Connolly A, Eastman T, Raddick MJ, Tyson JA, Wallin J (2009a) The revolution in astronomy education: data science for the masses. Downloaded from arXiv:0909.3895v1

  • Borne K, Wallin J, Weigel R (2009b) The new computational and data sciences undergraduate program at George Mason University, ICCS 2009, Part II, LNCS 5545, 74

  • Brunner R, Djorgovski SG, Prince TA, Szalay AS (2001) Massive datasets in astronomy. Downloaded from arXiv:astro-ph/0106481v1

  • Butler D (2007) Agencies join forces to share data. Nature 446:354

  • Cleveland W (2007) Data science: an action plan. Int Stat Rev 69:21

  • Djorgovski SG, Mahabal A, Brunner R, Williams R, Granat R, Curkendall D, Jacob J, Stolorz P (2001) Exploration of parameter spaces in a virtual observatory. arXiv:astro-ph/0108346v1

  • Dolensky M (2004) Applicability of emerging resource discovery standards to the VO. In: Toward an international virtual observatory. Berlin, Springer, p 265

  • Dunham M (2002) Data mining introductory and advanced topics. Prentice-Hall

  • Eastman T, Borne K, Green J, Grayzeck E, McGuire R, Sawyer D (2005) eScience and archiving for space science. Data Sci J 4:67–76

  • Graham M, Fitzpatrick M, McGlynn T (2007) The National Virtual Observatory: tools and techniques for astronomical research. ASP Conference Series, Vol. 382

  • Gray J (2003) Online Science. Downloaded from http://research.microsoft.com/en-us/um/people/gray/JimGrayTalks.htm

  • Gray J, Szalay A (2004) Where the rubber meets the sky: bridging the gap between databases and science. Microsoft technical report MSR-TR-2004-110

  • Gray J, Szalay A, Thakar A, Kunszt P, Stoughton C, Slutz D, vandenBerg J (2002) Data Mining in the SDSS SkyServer Database, arXiv:cs/0202014v1

  • Gray J, Liu D, Nieto-Santisteban M, Szalay A, DeWitt D, Heber G (2005) Scientific data management in the coming decade, arXiv:cs/0502008v1

  • Hey T, Trefethen A (2002) The UK e-Science core programme and the grid. Future Gener Comput Syst 18:1017–1031

  • Hey T, Tansley S, Tolle K (eds) (2009) The fourth paradigm: data-intensive scientific discovery. Downloaded from http://research.microsoft.com/en-us/collaboration/fourthparadigm/

  • Iwata S (2008) Scientific “Agenda” of data science. Data Sci J 7:54

  • Kegelmeyer P, Calderbank R, Critchlow T, Jameson L, Kamath C, Meza J, Samatova N, Wilson A (2008) Mathematics for Analysis of Petascale Data: Report on a DOE Workshop. Downloaded from http://www.sc.doe.gov/ascr/ProgramDocuments/Docs/PetascaleDataWorkshopReport.pdf

  • Mahootian F, Eastman T (2009) Complementary frameworks of scientific inquiry. World Futures 65:61

  • Millar AH (2004) Location, location, location: surveying the intracellular real estate through proteomics in plants. Funct Plant Biol 31(6):563

  • Mould J (2004) LSST Followup, http://www.lsst.org/Meetings/CommAccess/abstracts.shtml

  • National Academies of Science (NAS 1997) Bits of Power: Issues in Global Access to Scientific Data, downloaded from http://www.nap.edu/catalog.php?record_id=5504

  • NSF (National Science Foundation) report (2003) Knowledge lost in information: research directions for digital libraries, downloaded from http://www.sis.pitt.edu/~dlwkshop/report.pdf

  • NSF/JISC Repositories Workshop (2007) Downloaded from http://www.sis.pitt.edu/~repwkshop/

  • NSTC Interagency Working Group on Digital Data (2009) Harnessing the power of digital data for science and society, downloaded from http://www.nitrd.gov/about/Harnessing_Power_Web.pdf

  • Rutherford FJ, Ahlgren A (1991) Science for all Americans, Chapter 12, downloaded from http://www.project2061.org/publications/sfaa/online/chap12.htm

  • Schwartz MS, Sadler PM, Sonnert G, Tai RH (2008) Depth versus breadth: how content coverage in high school science courses relates to later success. Sci Educ. doi:10.1002/sce.20328

  • Seni G, Elder J (2010) Ensemble methods in data mining: improving accuracy through combining predictions. Morgan & Claypool Publishers

  • Smith F (2006) Data science as an academic discipline. Data Sci J 5:163

  • Springel V et al (2005) Simulations of the formation, evolution and clustering of galaxies and quasars. Nature 435:629

  • Strauss M (2004) Towards a design reference mission for the LSST. Downloaded from http://www.lsst.org/Meetings/CommAccess/abstracts.shtml

  • Szalay A (2008) Preserving digital data for the future of eScience. Science News, August 30, 2008

  • Szalay AS, Gray J, vandenBerg J (2002) Petabyte scale data mining: dream or reality? Downloaded from arXiv:cs/0208013v1

  • Tyson JA (2004) The large synoptic survey telescope: science & design, downloaded from http://www.lsst.org/Meetings/CommAccess/abstracts.shtml

  • Tyson JA, Pike R, Stein M, Szalay A, The LSST collaboration (2002) LSST Data Challenges. Downloaded from http://universe.ucdavis.edu/docs/data-challenge.pdf

  • Witten I, Frank E (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufmann, San Francisco

  • Yager RE (1982) What research says to the science teacher, Volume 4, p 117

Acknowledgments

We thank the National Science Foundation (NSF) for partial support of this work through the Division of Undergraduate Education (DUE) Course, Curriculum, and Laboratory Improvement (CCLI) program, award #0737091. The author thanks numerous colleagues for their significant and invaluable contributions to the ideas expressed in this paper: Jogesh Babu, Douglas Burke, Andrew Connolly, Timothy Eastman, Eric Feigelson, Matthew Graham, Alexander Gray, Norman Gray, Suzanne Jacoby, Thomas Loredo, Ashish Mahabal, Robert Mann, Bruce McCollum, Misha Pesenson, M. Jordan Raddick, Alex Szalay, Tony Tyson, and John Wallin. Finally, the author wishes to express deep gratitude and appreciation to Keivan Stassun for his thorough and thoughtful review of an earlier version of this paper, and for his numerous helpful comments and suggestions, which considerably improved the final product.

Author information

Correspondence to Kirk D. Borne.

Additional information

Communicated by Thomas Narock

Appendix A: National Study Groups Face the Data Flood

Several national study groups have issued reports on the urgency of establishing scientific and educational programs to face the data flood challenges, including:

  1. NAS (National Academies of Science) report: Bits of Power: Issues in Global Access to Scientific Data (1997), downloaded from http://www.nap.edu/catalog.php?record_id=5504

  2. NSF report: Knowledge Lost in Information: Research Directions for Digital Libraries (2003), downloaded from http://www.sis.pitt.edu/~dlwkshop/report.pdf

  3. NSF report: Cyberinfrastructure for Environmental Research and Education (2003), downloaded from http://www.ncar.ucar.edu/cyber/cyberreport.pdf

  4. NSF Atkins Report: Revolutionizing Science & Engineering Through Cyberinfrastructure: Report of the NSF Blue-Ribbon Advisory Panel on Cyberinfrastructure (2003), downloaded from http://www.nsf.gov/od/oci/reports/atkins.pdf

  5. NSB (National Science Board) report: Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century (2005), downloaded from http://www.nsf.gov/nsb/documents/2005/LLDDC_report.pdf

  6. NSF report with the Computing Research Association: Cyberinfrastructure for Education and Learning for the Future: A Vision and Research Agenda (2005), downloaded from http://www.cra.org/reports/cyberinfrastructure.pdf

  7. NSF report: The Role of Academic Libraries in the Digital Data Universe (2006), downloaded from http://www.arl.org/bm~doc/digdatarpt.pdf

  8. National Research Council, National Academies Press report: Learning to Think Spatially (2006), downloaded from http://www.nap.edu/catalog.php?record_id=11019

  9. NSF report: Cyberinfrastructure Vision for 21st Century Discovery (2007), downloaded from http://www.nsf.gov/od/oci/ci_v5.pdf

  10. JISC/NSF Workshop report on Data-Driven Science & Repositories (2007), downloaded from http://www.sis.pitt.edu/~repwkshop/NSF-JISC-report.pdf

  11. DOE (Department of Energy) report: Visualization and Knowledge Discovery: Report from the DOE/ASCR Workshop on Visual Analysis and Data Exploration at Extreme Scale (2007), downloaded from http://www.sc.doe.gov/ascr/ProgramDocuments/Docs/DOE-Visualization-Report-2007.pdf

  12. DOE report: Mathematics for Analysis of Petascale Data Workshop Report (2008), downloaded from http://www.sc.doe.gov/ascr/ProgramDocuments/Docs/PetascaleDataWorkshopReport.pdf

  13. NSTC Interagency Working Group on Digital Data report: Harnessing the Power of Digital Data for Science and Society (2009), downloaded from http://www.nitrd.gov/about/Harnessing_Power_Web.pdf

  14. National Academies report: Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age (2009), downloaded from http://www.nap.edu/catalog.php?record_id=12615

About this article

Cite this article

Borne, K.D. Astroinformatics: data-oriented astronomy research and education. Earth Sci Inform 3, 5–17 (2010). https://doi.org/10.1007/s12145-010-0055-2

  • DOI: https://doi.org/10.1007/s12145-010-0055-2
