Statistical Methods for Astronomy

  • Reference work entry
Planets, Stars and Stellar Systems

Abstract

Statistical methodology, with deep roots in probability theory, provides quantitative procedures for extracting scientific knowledge from astronomical data and for testing astrophysical theory. In recent decades, statistics has enormously increased in scope and sophistication. After a historical perspective, this review outlines concepts of mathematical statistics, elements of probability theory, hypothesis tests, and point estimation. Least squares, maximum likelihood, and Bayesian approaches to statistical inference are outlined. Resampling methods, particularly the bootstrap, provide valuable procedures when distribution functions of statistics are not known. Several approaches to model selection and goodness of fit are considered.

Applied statistics relevant to astronomical research are briefly discussed. Nonparametric methods are valuable when little is known about the behavior of the astronomical populations or processes. Data smoothing can be achieved with kernel density estimation and nonparametric regression. Samples measured in many variables can be divided into distinct groups using unsupervised clustering or supervised classification procedures. Many classification and data mining techniques are available. Astronomical surveys subject to nondetections can be treated with survival analysis for censored data, with a few related procedures for truncated data. Astronomical light curves can be investigated using time-domain methods involving autoregressive models, frequency-domain methods involving Fourier transforms, and state-space modeling. Methods for interpreting the spatial distributions of points in some space have been independently developed in astronomy and other fields.

Two types of resources for astronomers needing statistical information and tools are presented. First, about 40 recommended texts and monographs are listed covering various fields of statistics. Second, the public domain R statistical software system has recently emerged as a highly capable environment for statistical analysis. Together with its ∼3,000 (and growing) add-on CRAN packages, R implements a vast range of statistical procedures in a coherent high-level language with advanced graphics. Two illustrations of R’s capabilities for astronomical data analysis are given: an adaptive kernel estimator with bootstrap errors applied to a quasar dataset, and the second-order J function (related to the two-point correlation function) with three edge corrections applied to a galaxy redshift survey.

Copyright information

© 2013 Springer Science+Business Media Dordrecht

About this entry

Cite this entry

Feigelson, E.D., Babu, G.J. (2013). Statistical Methods for Astronomy. In: Oswalt, T.D., Bond, H.E. (eds) Planets, Stars and Stellar Systems. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-5618-2_10
