Statistical Analysis and Modeling of Data

Computational Methods in Physics

Part of the book series: Graduate Texts in Physics (GTP)

Abstract

We record the outcomes of physical measurements as signals (sequences of values), where we are interested not in each individual value but in the characteristics of the signal as a whole. Signals can be analyzed in the statistical sense, where the time ordering of the data is irrelevant, or in the functional sense, where it becomes essential: we then imagine that the signal (the measurement of a quantity) originates in a source (the dynamical system), and we may be able to infer the properties of that system from the properties of the signal. In this chapter we introduce the basic methods of both approaches [1].
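
The difference between the two senses can be illustrated with a minimal Python sketch. It is an illustration only, not taken from the chapter: the signal is assumed to be a synthetic noisy sine wave, the lag-1 autocorrelation is used as one example of an order-dependent quantity, and the helper name lag1_autocorrelation is hypothetical.

```python
import math
import random
import statistics


def lag1_autocorrelation(x):
    """Normalized correlation between each value and its successor."""
    m = statistics.fmean(x)
    d = [v - m for v in x]
    num = sum(a * b for a, b in zip(d[:-1], d[1:]))
    den = sum(a * a for a in d)
    return num / den


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Assumed test signal: sine wave with period 50 samples plus Gaussian noise.
signal = [math.sin(2 * math.pi * t / 50) + 0.1 * rng.gauss(0.0, 1.0)
          for t in range(1000)]

# Same values, time ordering destroyed.
shuffled = signal[:]
rng.shuffle(shuffled)

# Statistical sense: order-independent summaries agree for both sequences.
print("mean:  ", statistics.fmean(signal), statistics.fmean(shuffled))
print("stdev: ", statistics.pstdev(signal), statistics.pstdev(shuffled))

# Functional sense: an order-dependent quantity tells them apart.
print("lag-1: ", lag1_autocorrelation(signal), lag1_autocorrelation(shuffled))
```

Shuffling leaves the mean and standard deviation unchanged, since they depend only on the set of values, while the lag-1 autocorrelation is close to one for the ordered signal and close to zero once the ordering is destroyed.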

References

  1. J.E. Gentle, W. Härdle, Y. Mori (eds.), Handbook of Computational Statistics. Concepts and Methods (Springer, Berlin, 2004)

  2. V. Barnett, T. Lewis, Outliers in Statistical Data, 3rd edn. (Wiley, New York, 1994)

  3. R. Kandel, Our Changing Climate (McGraw-Hill, New York, 1991), p. 110

  4. L. Davies, U. Gather, Robust Statistics, Chap. III.9 in [1], pp. 655–695

  5. Analytical Methods Committee, Robust statistics – how not to reject outliers, Part 1: basic concepts. Analyst 114, 1693 (1989); Part 2: Inter-laboratory trials. Analyst 114, 1699 (1989)

  6. V. Chandola, A. Banerjee, V. Kumar, Anomaly detection: a survey. ACM Comput. Surv. 41, art. 15 (2009)

  7. A. Patcha, J.-M. Park, An overview of anomaly detection techniques: existing solutions and latest technological trends. Comput. Netw. 51, 3448 (2007)

  8. M. Agyemang, K. Barker, R. Alhajj, A comprehensive survey of numeric and symbolic outlier mining techniques. Intell. Data Anal. 10, 521 (2006)

  9. V.J. Hodge, J. Austin, A survey of outlier detection methodologies. Artif. Intell. Rev. 22, 85 (2004)

  10. L. Davies, U. Gather, The identification of multiple outliers. J. Am. Stat. Assoc. 88, 782 (1993); See also B. Iglewicz, J. Martinez, Outlier detection using robust measures of scale. J. Stat. Comput. Simul. 15, 285 (1982)

  11. F.E. Grubbs, Procedures for detecting outlying observations in samples. Technometrics 11, 1 (1969)

  12. W.J. Dixon, Ratios involving extreme values. Ann. Math. Stat. 22, 68 (1951); W.J. Dixon, Analysis of extreme values. Ann. Math. Stat. 21, 488 (1950)

  13. R.J. Beckman, R.D. Cook, Outlier..........s. Technometrics 25, 119 (1983)

  14. R.A. Maronna, R.D. Martin, V.J. Yohai, Robust Statistics. Theory and Methods (John Wiley & Sons, Chichester, 2006)

  15. M.R. Spiegel, Schaum’s Outline of Theory and Problems of Probability and Statistics (McGraw-Hill, New York, 1975)

  16. S. Brandt, Data Analysis, 3rd edn. (Springer, New York, 1999)

  17. H.B. Mann, A. Wald, On the choice of the number of class intervals in the application of the chi square test. Ann. Math. Stat. 13, 306 (1942)

  18. W.C.M. Kallenberg, J. Oosterhoff, B.F. Schriever, The number of classes in chi-squared goodness-of-fit tests, J. Am. Stat. Assoc. 80, 959 (1985) and references therein. See also W.C. Kallenberg, On moderate and large deviations in multinomial distributions. Ann. Stat. 13, 1554 (1985)

  19. A. Kolmogorov, Sulla determinazione empirica di una legge di distribuzione, Giornale dell’Istituto Italiano degli Attuari 4, 461 (1933). Translated in A.N. Shiryayev (ed.), Selected Works of A.N. Kolmogorov, vol. II (Springer Science+Business Media, Dordrecht 1992), p. 139

  20. N. Smirnov, Sur les écarts de la courbe de distribution empirique. Recreat. Math. 6, 3 (1939)

  21. S. Facchinetti, A procedure to find exact critical values of Kolmogorov–Smirnov test. Stat. Appl. — Ital. J. Appl. Stat. 21, 337 (2009)

  22. M.A. Stephens, Use of the Kolmogorov–Smirnov, Cramer–Von Mises and related statistics without extensive tables. J. R. Stat. Soc. B 32, 115 (1970)

  23. S. Širca, Probability for Physicists (Springer International Publishing AG, Switzerland, 2016)

  24. A.F. Nikiforov, S.K. Suslov, V.B. Uvarov, Classical Orthogonal Polynomials of A Discrete Variable, Springer Series in Computational Physics (Springer, Berlin, 1991)

  25. W.H. Press, B.P. Flannery, S.A. Teukolsky, W.T. Vetterling, Numerical Recipes: The Art of Scientific Computing, 3rd edn. (Cambridge University Press, Cambridge, 2007); See also the equivalent handbooks in Fortran, Pascal and C, as well as http://www.nr.com

  26. C.A. Cantrell, Technical note: Review of methods for linear least-squares fitting of data and application to atmospheric chemistry problems. Atmos. Chem. Phys. 8, 5477 (2008)

  27. D. York et al., Unified equations for the slope, intercept, and standard errors of the best straight line. Am. J. Phys. 72, 367 (2004)

  28. K. Nakamura et al. (Particle Data Group), Review of particle physics. J. Phys. G 37, 075021 (2010). See Sect. 5 of the Introduction

  29. M.C. Ortiz, L.A. Sarabia, A. Herrero, Robust regression techniques. A useful alternative for the detection of outlier data in chemical analysis. Talanta 70, 499 (2006)

  30. J. Ferré, Regression diagnostics, Sect. 3.02 in the encyclopedia S.D. Brown, R. Tauler, B. Walczak (eds.), Comprehensive Chemometrics: Chemical and Biochemical Data Analysis, vol. 3 (2009), p. 33

  31. P.J. Rousseeuw, A.M. Leroy, Robust Regression and Outlier Detection (Wiley, Hoboken, 2003)

  32. I. Barrodale, F.D.K. Roberts, An improved algorithm for discrete \(l_1\) linear approximation. SIAM J. Numer. Anal. 10, 839 (1973)

  33. S. Portnoy, R. Koenker, The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. Stat. Sci. 12, 279 (1997)

  34. P.J. Rousseeuw, Least median of squares regression. J. Am. Stat. Assoc. 79, 871 (1984)

  35. T. Bernholt, Computing the least median of squares estimator in time \({\cal{O}}(n^d)\), in Lecture Notes in Computer Science, ed. by O. Gervasi, et al., vol. 3480, (Springer, Berlin, 2005), p. 697

  36. A. Stromberg, Computing the exact least median of squares estimate and stability diagnostics in multiple linear regression. SIAM J. Sci. Comp. 14, 1289 (1993)

  37. T.A. Boden, G. Marland, R.J. Andres, Global, regional, and national fossil-fuel \({\rm {CO}}_2\) emissions. Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory, https://doi.org/10.3334/CDIAC/00001_V2015

  38. B.W. Rust, Fitting nature’s basic functions. Part I: polynomials and linear least squares, Comput. Sci. Eng. (Sep/Oct 2001), p. 84; Part II: estimating uncertainties and testing hypotheses, Comput. Sci. Eng. (Nov/Dec 2001), p. 60; Part III: exponentials, sinusoids, and nonlinear least squares, Comput. Sci. Eng. (Jul/Aug 2002), p. 72; Part IV: the variable projection algorithm, Comput. Sci. Eng. (Mar/Apr 2003), p. 74

  39. A.J. Izenman, Modern Multivariate Statistical Techniques (Springer, Berlin, 2008)

  40. H. Swierenga, A.P. de Weijer, R.J. van Wijk, L.M.C. Buydens, Strategy for constructing robust multivariate calibration models. Chemom. Intell. Lab. Syst. 49, 1 (1999)

  41. I.T. Jolliffe, Principal Component Analysis, 2nd edn. (Springer, Berlin, 2002)

  42. S. Roweis, Z. Ghahramani, A unifying review of linear Gaussian models. Neural Comput. 11, 305 (1999)

  43. A. Azzalini, A.W. Bowman, A look at some data on the old faithful geyser. J. R. Stat. Soc. C 39, 357 (1990)

  44. A.K. Jain, M.N. Murty, Data clustering: a review. ACM Comput. Surv. 31, 264 (1999)

  45. W. Härdle, L. Simar, Applied Multivariate Statistical Analysis (Springer, Berlin, 2007)

  46. R. Xu, D.C. Wunsch II, Clustering (Wiley, Hoboken, 2009)

  47. G. Gan, C. Ma, J. Wu, Data Clustering. Theory, Algorithms, and Applications (SIAM, Philadelphia, 2007)

  48. J. Kogan, Introduction to Clustering Large and High-Dimensional Data (Cambridge University Press, Cambridge, 2007)

  49. J. Valente de Oliveira, W. Pedrycz (eds.), Advances in Fuzzy Clustering and its Applications (Wiley, Chichester, 2007)

  50. The R project for statistical computing, see http://www.r-project.org/. Attention: the R reference manual has approximately 3000 pages! A good introductory text for R is J. Maindonald, J. Braun, Data Analysis and Graphics Using R, 2nd edn. (Cambridge University Press, Cambridge, 2006). R is an open-source alternative to the S/S+ systems (“R is to S what Octave is to Matlab”)

  51. U. von Luxburg, A tutorial on spectral clustering, Max-Planck-Institut für biologische Kybernetik, Technical Report No. Tr-149, 2006

  52. A.Y. Ng, M.I. Jordan, Y. Weiss, On spectral clustering: analysis and an algorithm. Adv. Neural Inf. Process. Syst. 14, 849 (2001); See also Ref. [11] in this paper

  53. O.L. Mangasarian, W.N. Street, W.H. Wolberg, Breast cancer diagnosis and prognosis via linear programming. Oper. Res. 43, 570 (1995)

  54. C. Wolf et al., A catalogue of the Chandra Deep Field South with multi-colour classification and photometric redshifts from COMBO-17. Astron. Astrophys. 421, 913 (2004); See also the update C. Wolf et al., Calibration update of the COMBO-17 CDFS catalogue. Astron. Astrophys. 492, 933 (2008)

  55. http://www.mpia.de/COMBO/combo_CDFSpublic.html. The data can be found at http://astrostatistics.psu.edu/datasets/COMBO17.html

  56. R.A. Reyment, K.G. Jöreskog, L.F. Marcus, Applied Factor Analysis in the Natural Sciences (Cambridge University Press, Cambridge, 1993)

  57. G. Pison, P.J. Rousseeuw, P. Filzmoser, C. Croux, Robust factor analysis. J. Multivar. Anal. 84, 145 (2003)

  58. P. Filzmoser, K. Hron, C. Reimann, R. Garrett, Robust factor analysis for compositional data. Comput. Geosci. 35, 1854 (2009)

  59. C. Reimann, P. Filzmoser, R.G. Garrett, Factor analysis applied to regional geochemical data: problems and possibilities. Appl. Geochem. 17, 185 (2002)

  60. See http://lib.stat.cmu.edu/datasets/bodyfat, where all data is collected and the corresponding original literature is cited

  61. http://astro.temple.edu/~alan/MMST/datasets.html

  62. http://www.ntwrks.com/~mikev/chart1.html

  63. V.G. Sigillito, S.P. Wing, L.V. Hutton, K.B. Baker, Classification of radar returns from the ionosphere using neural networks, Johns Hopkins APL Tech. Dig. 10, 262 (1989). The corresponding data file can be found at http://archive.ics.uci.edu/ml/datasets.html

Author information

Corresponding author

Correspondence to Simon Širca.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Širca, S., Horvat, M. (2018). Statistical Analysis and Modeling of Data. In: Computational Methods in Physics. Graduate Texts in Physics. Springer, Cham. https://doi.org/10.1007/978-3-319-78619-3_5
