Abstract
Statistical methodology, with deep roots in probability theory, provides quantitative procedures for extracting scientific knowledge from astronomical data and for testing astrophysical theory. In recent decades, statistics has increased enormously in scope and sophistication. After a historical perspective, this review outlines concepts of mathematical statistics, elements of probability theory, hypothesis tests, and point estimation. Least squares, maximum likelihood, and Bayesian approaches to statistical inference are summarized. Resampling methods, particularly the bootstrap, provide valuable procedures when the distribution functions of statistics are not known. Several approaches to model selection and goodness of fit are considered.
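The bootstrap idea mentioned above can be illustrated with a minimal sketch. It estimates the standard error of a sample median, a statistic whose sampling distribution is awkward to derive analytically, by resampling the data with replacement. The function name `bootstrap_se`, its parameters, and the magnitude values are hypothetical and invented purely for illustration; they do not come from the chapter.

```python
import random
import statistics


def bootstrap_se(sample, statistic, n_resamples=2000, seed=0):
    """Estimate the standard error of `statistic` by nonparametric bootstrap.

    Hypothetical helper for illustration; not taken from the chapter.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    replicates = []
    for _ in range(n_resamples):
        # Draw n values from the observed sample, with replacement
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    # The spread of the replicates approximates the standard error
    return statistics.stdev(replicates)


# Invented magnitudes, for illustration only
mags = [18.2, 19.1, 17.8, 20.3, 18.9, 19.5, 18.4, 21.0, 19.8, 18.7]
se_median = bootstrap_se(mags, statistics.median)
print(f"median = {statistics.median(mags):.2f} +/- {se_median:.2f}")
```

Any function of a list can be passed as `statistic`, so the same sketch applies to a trimmed mean or a quantile; the number of resamples is a tunable assumption rather than a recommendation.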
Applied statistical methods relevant to astronomical research are briefly discussed. Nonparametric methods are valuable when little is known about the behavior of the astronomical populations or processes. Data smoothing can be achieved with kernel density estimation and nonparametric regression. Samples measured in many variables can be divided into distinct groups using unsupervised clustering or supervised classification procedures. Many classification and data mining techniques are available. Astronomical surveys subject to nondetections can be treated with survival analysis for censored data, with a few related procedures for truncated data. Astronomical light curves can be investigated using time-domain methods involving autoregressive models, frequency-domain methods involving Fourier transforms, and state-space modeling. Methods for interpreting the spatial distribution of points have been developed independently in astronomy and other fields.
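As a rough illustration of the kernel density estimation mentioned above, the following sketch smooths a small univariate sample with a fixed-width Gaussian kernel. It is not the adaptive estimator used in the chapter's quasar example. The bandwidth follows the common normal-reference rule of thumb, 1.06 × (sample standard deviation) × n^(−1/5), chosen here as an assumption. The function names and the redshift values are hypothetical.

```python
import math
import statistics


def normal_reference_bandwidth(data):
    """Rule-of-thumb bandwidth: 1.06 * sd * n^(-1/5) (an assumed choice)."""
    return 1.06 * statistics.stdev(data) * len(data) ** (-0.2)


def gaussian_kde(data, bandwidth):
    """Return a function evaluating a fixed-bandwidth Gaussian KDE at x."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump centered on each data point
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


# Invented redshifts, for illustration only
redshifts = [0.45, 0.62, 0.71, 0.88, 1.02, 1.10, 1.35, 1.48, 1.90, 2.25]
h = normal_reference_bandwidth(redshifts)
density = gaussian_kde(redshifts, h)

for i in range(11):
    z = 0.25 * i
    print(f"z = {z:.2f}  density = {density(z):.3f}")
```

A fixed bandwidth tends to oversmooth dense regions and undersmooth sparse tails, which is one motivation for adaptive estimators.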
Two types of resources for astronomers needing statistical information and tools are presented. First, about 40 recommended texts and monographs are listed covering various fields of statistics. Second, the public domain R statistical software system has recently emerged as a highly capable environment for statistical analysis. Together with its ∼3,000 (and growing) add-on CRAN packages, R implements a vast range of statistical procedures in a coherent high-level language with advanced graphics. Two illustrations of R’s capabilities for astronomical data analysis are given: an adaptive kernel estimator with bootstrap errors applied to a quasar dataset, and the second-order J function (related to the two-point correlation function) with three edge corrections applied to a galaxy redshift survey.
Copyright information
© 2013 Springer Science+Business Media Dordrecht
About this entry
Cite this entry
Feigelson, E.D., Babu, G.J. (2013). Statistical Methods for Astronomy. In: Oswalt, T.D., Bond, H.E. (eds) Planets, Stars and Stellar Systems. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-5618-2_10
DOI: https://doi.org/10.1007/978-94-007-5618-2_10
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-5617-5
Online ISBN: 978-94-007-5618-2
eBook Packages: Physics and Astronomy; Reference Module Physical and Materials Science; Reference Module Chemistry, Materials and Physics