Abstract
We record the outcomes of physical measurements as signals (sequences of values), where we are interested not in each individual value but in the characteristics of the signal as a whole. Signals can be analyzed in the statistical sense, where the time ordering of the data is irrelevant, or in the functional sense, where it becomes essential: we then imagine that the signal (the measurement of a quantity) originates in a source (the dynamical system), and we may be able to infer the properties of that system from the properties of the signal. In this chapter we introduce the basic methods of both approaches [1].
References
J.E. Gentle, W. Härdle, Y. Mori (eds.), Handbook of Computational Statistics. Concepts and Methods (Springer, Berlin, 2004)
V. Barnett, T. Lewis, Outliers in Statistical Data, 3rd edn. (Wiley, New York, 1994)
R. Kandel, Our Changing Climate (McGraw-Hill, New York, 1991), p. 110
L. Davies, U. Gather, Robust Statistics, Chap. III.9 in [1], pp. 655–695
Analytical Methods Committee, Robust statistics – how not to reject outliers, Part 1: basic concepts. Analyst 114, 1693 (1989); Part 2: Inter-laboratory trials. Analyst 114, 1699 (1989)
V. Chandola, A. Banerjee, V. Kumar, Anomaly detection: a survey. ACM Comput. Surv. 41, art. 15 (2009)
A. Patcha, J.-M. Park, An overview of anomaly detection techniques: existing solutions and latest technological trends. Comput. Netw. 51, 3448 (2007)
M. Agyemang, K. Barker, R. Alhajj, A comprehensive survey of numeric and symbolic outlier mining techniques. Intell. Data Anal. 10, 521 (2006)
V.J. Hodge, J. Austin, A survey of outlier detection methodologies. Artif. Intell. Rev. 22, 85 (2004)
L. Davies, U. Gather, The identification of multiple outliers. J. Am. Stat. Assoc. 88, 782 (1993); see also B. Iglewicz, J. Martinez, Outlier detection using robust measures of scale. J. Stat. Comput. Simul. 15, 285 (1982)
F.E. Grubbs, Procedures for detecting outlying observations in samples. Technometrics 11, 1 (1969)
W.J. Dixon, Ratios involving extreme values. Ann. Math. Stat. 22, 68 (1951); W.J. Dixon, Analysis of extreme values. Ann. Math. Stat. 21, 488 (1950)
R.J. Beckman, R.D. Cook, Outlier..........s. Technometrics 25, 119 (1983)
R.A. Maronna, R.D. Martin, V.J. Yohai, Robust Statistics. Theory and Methods (John Wiley & Sons, Chichester, 2006)
M.R. Spiegel, Schaum’s Outline of Theory and Problems of Probability and Statistics (McGraw-Hill, New York, 1975)
S. Brandt, Data Analysis, 3rd edn. (Springer, New York, 1999)
H.B. Mann, A. Wald, On the choice of the number of class intervals in the application of the chi square test. Ann. Math. Stat. 13, 306 (1942)
W.C.M. Kallenberg, J. Oosterhoff, B.F. Schriever, The number of classes in chi-squared goodness-of-fit tests. J. Am. Stat. Assoc. 80, 959 (1985) and references therein; see also W.C. Kallenberg, On moderate and large deviations in multinomial distributions. Ann. Stat. 13, 1554 (1985)
A. Kolmogorov, Sulla determinazione empirica di una legge di distribuzione. Giornale dell’Istituto Italiano degli Attuari 4, 461 (1933). Translated in A.N. Shiryayev (ed.), Selected Works of A.N. Kolmogorov, vol. II (Springer Science+Business Media, Dordrecht, 1992), p. 139
N. Smirnov, Sur les écarts de la courbe de distribution empirique. Recreat. Math. 6, 3 (1939)
S. Facchinetti, A procedure to find exact critical values of Kolmogorov–Smirnov test. Stat. Appl. — Ital. J. Appl. Stat. 21, 337 (2009)
M.A. Stephens, Use of the Kolmogorov–Smirnov, Cramer–Von Mises and related statistics without extensive tables. J. R. Stat. Soc. B 32, 115 (1970)
S. Širca, Probability for Physicists (Springer International Publishing AG, Switzerland, 2016)
A.F. Nikiforov, S.K. Suslov, V.B. Uvarov, Classical Orthogonal Polynomials of A Discrete Variable, Springer Series in Computational Physics (Springer, Berlin, 1991)
W.H. Press, B.P. Flannery, S.A. Teukolsky, W.T. Vetterling, Numerical Recipes: The Art of Scientific Computing, 3rd edn. (Cambridge University Press, Cambridge, 2007); see also the equivalent handbooks in Fortran, Pascal and C, as well as http://www.nr.com
C.A. Cantrell, Technical note: Review of methods for linear least-squares fitting of data and application to atmospheric chemistry problems. Atmos. Chem. Phys. 8, 5477 (2008)
D. York et al., Unified equations for the slope, intercept, and standard errors of the best straight line. Am. J. Phys. 72, 367 (2004)
K. Nakamura et al. (Particle Data Group), Review of particle physics. J. Phys. G 37, 075021 (2010). See Sect. 5 of the Introduction
M.C. Ortiz, L.A. Sarabia, A. Herrero, Robust regression techniques. A useful alternative for the detection of outlier data in chemical analysis. Talanta 70, 499 (2006)
J. Ferré, Regression diagnostics, Sect. 3.02 in the encyclopedia S.D. Brown, R. Tauler, B. Walczak (eds.), Comprehensive Chemometrics: Chemical and Biochemical Data Analysis, vol. 3 (2009), p. 33
P.J. Rousseeuw, A.M. Leroy, Robust Regression and Outlier Detection (Wiley, Hoboken, 2003)
I. Barrodale, F.D.K. Roberts, An improved algorithm for discrete \(l_1\) linear approximation. SIAM J. Numer. Anal. 10, 839 (1973)
S. Portnoy, R. Koenker, The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. Stat. Sci. 12, 279 (1997)
P.J. Rousseeuw, Least median of squares regression. J. Am. Stat. Assoc. 79, 871 (1984)
T. Bernholt, Computing the least median of squares estimator in time \({\cal{O}}(n^d)\), in Lecture Notes in Computer Science, ed. by O. Gervasi et al., vol. 3480 (Springer, Berlin, 2005), p. 697
A. Stromberg, Computing the exact least median of squares estimate and stability diagnostics in multiple linear regression. SIAM J. Sci. Comp. 14, 1289 (1993)
T.A. Boden, G. Marland, R.J. Andres, Global, regional, and national fossil-fuel \({\rm {CO}}_2\) emissions. Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory, https://doi.org/10.3334/CDIAC/00001_V2015
B.W. Rust, Fitting nature’s basic functions. Part I: polynomials and linear least squares, Comput. Sci. Eng. (Sep/Oct 2001), p. 84; Part II: estimating uncertainties and testing hypotheses, Comput. Sci. Eng. (Nov/Dec 2001), p. 60; Part III: exponentials, sinusoids, and nonlinear least squares, Comput. Sci. Eng. (Jul/Aug 2002), p. 72; Part IV: the variable projection algorithm, Comput. Sci. Eng. (Mar/Apr 2003), p. 74
A.J. Izenman, Modern Multivariate Statistical Techniques (Springer, Berlin, 2008)
H. Swierenga, A.P. de Weijer, R.J. van Wijk, L.M.C. Buydens, Strategy for constructing robust multivariate calibration models. Chemom. Intell. Lab. Syst. 49, 1 (1999)
I.T. Jolliffe, Principal Component Analysis, 2nd edn. (Springer, Berlin, 2002)
S. Roweis, Z. Ghahramani, A unifying review of linear Gaussian models. Neural Comput. 11, 305 (1999)
A. Azzalini, A.W. Bowman, A look at some data on the old faithful geyser. J. R. Stat. Soc. C 39, 357 (1990)
A.K. Jain, M.N. Murty, Data clustering: a review. ACM Comput. Surv. 31, 264 (1999)
W. Härdle, L. Simar, Applied Multivariate Statistical Analysis (Springer, Berlin, 2007)
R. Xu, D.C. Wunsch II, Clustering (Wiley, Hoboken, 2009)
G. Gan, C. Ma, J. Wu, Data Clustering. Theory, Algorithms, and Applications (SIAM, Philadelphia, 2007)
J. Kogan, Introduction to Clustering Large and High-Dimensional Data (Cambridge University Press, Cambridge, 2007)
J. Valente de Oliveira, W. Pedrycz (eds.), Advances in Fuzzy Clustering and its Applications (Wiley, Chichester, 2007)
The R project for statistical computing, see http://www.r-project.org/. Note that the R reference manual has approximately 3000 pages. A good introductory text for R is J. Maindonald, J. Braun, Data Analysis and Graphics Using R, 2nd edn. (Cambridge University Press, Cambridge, 2006). R is an open-source alternative to the S/S+ systems (“R is to S what Octave is to Matlab”)
U. von Luxburg, A tutorial on spectral clustering, Max-Planck-Institut für biologische Kybernetik, Technical Report No. Tr-149, 2006
A.Y. Ng, M.I. Jordan, Y. Weiss, On spectral clustering: analysis and an algorithm. Adv. Neural Inf. Process. Syst. 14, 849 (2001); see also Ref. [11] in this paper
O.L. Mangasarian, W.N. Street, W.H. Wolberg, Breast cancer diagnosis and prognosis via linear programming. Oper. Res. 43, 570 (1995)
C. Wolf et al., A catalogue of the Chandra Deep Field South with multi-colour classification and photometric redshifts from COMBO-17. Astron. Astrophys. 421, 913 (2004); see also the update C. Wolf et al., Calibration update of the COMBO-17 CDFS catalogue. Astron. Astrophys. 492, 933 (2008)
http://www.mpia.de/COMBO/combo_CDFSpublic.html. The data can be found at http://astrostatistics.psu.edu/datasets/COMBO17.html
R.A. Reyment, K.G. Jöreskog, L.F. Marcus, Applied Factor Analysis in the Natural Sciences (Cambridge University Press, Cambridge, 1993)
G. Pison, P.J. Rousseeuw, P. Filzmoser, C. Croux, Robust factor analysis. J. Multivar. Anal. 84, 145 (2003)
P. Filzmoser, K. Hron, C. Reimann, R. Garrett, Robust factor analysis for compositional data. Comput. Geosci. 35, 1854 (2009)
C. Reimann, P. Filzmoser, R.G. Garrett, Factor analysis applied to regional geochemical data: problems and possibilities. Appl. Geochem. 17, 185 (2002)
See http://lib.stat.cmu.edu/datasets/bodyfat, where all data is collected and the corresponding original literature is cited
V.G. Sigillito, S.P. Wing, L.V. Hutton, K.B. Baker, Classification of radar returns from the ionosphere using neural networks. Johns Hopkins APL Tech. Dig. 10, 262 (1989). The corresponding data file can be found at http://archive.ics.uci.edu/ml/datasets.html
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Širca, S., Horvat, M. (2018). Statistical Analysis and Modeling of Data. In: Computational Methods in Physics. Graduate Texts in Physics. Springer, Cham. https://doi.org/10.1007/978-3-319-78619-3_5
DOI: https://doi.org/10.1007/978-3-319-78619-3_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78618-6
Online ISBN: 978-3-319-78619-3
eBook Packages: Physics and Astronomy, Physics and Astronomy (R0)