How to correctly apply Gaussian statistics in a non-stationary climate?

Steinacker, Reinhold

doi:10.1007/s00704-021-03601-4

Original Paper
Open access
Published: 05 April 2021

Volume 144, pages 1363–1374, (2021)

Theoretical and Applied Climatology

Reinhold Steinacker, ORCID: orcid.org/0000-0002-6868-9268

Abstract

Time series with a significant trend, as is now the case for temperature under climate change, need a careful approach to statistical evaluation. Climatological means and moments are usually taken from past data, which means that these statistics no longer fit current data. We therefore need to determine the long-term trend before comparing current data with the current climate. This is not an easy task, because the determination of the signal (a climatic trend) is influenced by the random scatter of the observed data. Different filter methods are tested for their ability to produce realistic smoothed trends from observed time series. A new method is proposed, which is based on a variational principle. It outperforms conventional smoothing methods, especially when periodic time series are processed. This new methodology is used to test how extreme the temperature of 2018 in Vienna actually was. It is shown that the new annual temperature record of 2018 is not very extreme if we consider the positive trend of the last decades. The daily mean temperatures of 2018 are likewise not found to be really extreme relative to the present climate. The real extreme in the temperature record of Vienna, and of many other places around the world, is the strongly increased positive temperature trend over recent years.

1 Introduction

The average temperature in Vienna in 2018 was 13.0 °C, an unprecedented value, nearly 1 degree higher than the warmest year in the whole 244-year time series since 1775 (Hiebl et al. 2019). Compared with the mean temperature of the climate normal period 1961–1990 ($ \hat{T}=9.7{}^{\circ}\mathrm{C} $), this is a positive anomaly of +3.3 °C, which corresponds to more than 4 times the standard deviation (SDEV) of that period. Even compared with the average of the last completed three decades 1981–2010 ($ \hat{T}=10.5{}^{\circ}\mathrm{C} $), the deviation is +2.5 °C, corresponding to more than three times the SDEV of that period. Taking longer periods, e.g., the twentieth century or the whole 244-year series (including the year 2018), the deviation was even larger, roughly 5 SDEV and 4 SDEV respectively. Hence, the year 2018 is commonly seen as extraordinary and as an extreme temperature outlier. But is this simple statistical evaluation adequate?

The World Meteorological Organization (WMO) long ago defined a climate normal period as a 30-year period (WMO 1989). Starting in 1900, the periods 1901–1930, 1931–1960, and 1961–1990 have been defined, and the next normal period 1991–2020 will be used from 2021 onwards. Because of the rapid global temperature increase, the use of overlapping 30-year periods (e.g., 1981–2010) is now also recommended (WMO 2007). A 30-year period was chosen for temperature because, on the one hand, it is long enough to give a statistically reasonable mean value and statistical moments and, on the other hand, it is short enough that climate trends do not have a strong impact on the statistical moments (Angel et al. 1993). This was certainly true until the 1980s, when the climatological global temperature trends were never larger than a few tenths of a degree between two consecutive 30-year periods. Hence, the climatological time series were close enough to stationarity. Since around 1980, however, temperature trends have increased significantly at local, regional, and global scales in the course of global warming. Hence, for the statistical treatment and interpretation of recent temperature time series, e.g., the determination of statistical moments, we have to consider trends.

In Section 2, different methods to smooth (filter) time series are discussed and a new method is introduced, which allows a reasonable determination of trends from the very beginning to the very end of a time series. In Section 3, the filter is applied to compute the trend of the temperature time series of Vienna and to determine time-dependent mean values and the corresponding statistical moments. The concluding Section 4 gives suggestions on how to treat time series and obtain adequate statistical measures in times of significant climate trends.

2 Filtering time series

There are many ways to filter time series (Hamming 1989; Schönwiese 2013; Shumway and Stoffer 2006). The simplest, and computationally cheapest, is MOVing Average Smoothing, denoted MOVAS in the following. This method, however, has some unwelcome properties. Part of the high-frequency variation may still be present after smoothing, and the spectral response may even invert the phase of some wavelike variations. A further problem is that centered moving averages cannot be determined at either end of a time series. If we continue the averaging to the very end of the series, we have to take one-sided averages, which usually lead to a significant underestimation of the magnitude of a trend. The problem of determining reasonable trends at the end of time series has been extensively investigated (e.g., Arguez et al. 2008; Mann 2004, 2008). Besides the determination of trends at the end of time series, trend extrapolation is also applied in many scientific disciplines (Box and Jenkins 1970; Livezey et al. 2007).
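The end effect of moving averages can be illustrated with a minimal sketch. It assumes that the window is simply truncated where it runs past the ends of the series (one possible reading of "one-sided averages"); the function name `movas` and the noise-free linear test series are hypothetical and chosen only for illustration.

```python
def movas(series, window):
    """Moving average with an odd window that is truncated at both ends."""
    half = window // 2
    n = len(series)
    out = []
    for t in range(n):
        # Near the ends only the available part of the window is averaged.
        chunk = series[max(0, t - half):min(n, t + half + 1)]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical test series: a linear trend of 0.05 units per time step.
trend = [0.05 * t for t in range(100)]
smooth = movas(trend, window=31)

interior_slope = smooth[50] - smooth[49]  # centered part keeps the slope
end_slope = smooth[-1] - smooth[-2]       # truncated part flattens the slope
end_offset = smooth[-1] - trend[-1]       # smoothed end value lags the data
```

For this linear series the slope of the smoothed curve in the last step is only half the true slope, and the smoothed end value lies below the last data point, which is the kind of underestimation described above.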

A better method than MOVAS is the Gaussian smoothing filter, denoted GAUSS in the following, which is frequently used to filter climatological time series (e.g., ZAMG 2019). Here, the averaging is performed by weighting the data within an averaging window according to the Gaussian frequency distribution. Such a Gaussian filter smooths out the high-frequency parts of a time series much better than MOVAS but still does not yield reasonable trends at the ends of a time series. Other time filters frequently used in many fields of science are the closely related Locally Estimated Scatterplot Smoothing (LOESS) and Locally Weighted Scatterplot Smoothing (LOWESS) (Cleveland 1981; Savitzky and Golay 1964). Within an arbitrary time window, a quadratic polynomial is usually fitted for LOESS and a linear polynomial is fitted to weighted data for LOWESS. In addition, a LOWESS2 variant has been investigated here, which uses a weighted quadratic polynomial fit. The suggested weighting function is given by

$$ w\left(\tau \right)=\left\{\begin{array}{c}{\left(1-\left|{\left(\frac{\tau }{\varDelta \tau}\right)}^3\right|\right)}^3\kern0.5em \mathrm{if}\ \left|\frac{\tau }{\varDelta \tau}\right|\le 1\\ {}0\kern5.25em \mathrm{otherwise}\kern0.5em \end{array}\right., $$

(1)

where τ is the time difference with respect to the center of the particular time window and Δτ represents the half-width of the smoothing time window. This function has some similarity to the Gaussian function. Another smoothing filter is based on a selective cutoff frequency of harmonic waves. Only the lowest b harmonics of a time series are considered, whereas the higher harmonics are completely disregarded. For a periodic temperature time series, this can be expressed as:

$$ \overset{\sim }{T}(t)=\frac{1}{n}\sum \limits_{t=1}^nT(t)+\sum \limits_{k=1}^b{T}_{k,\mathrm{s}}\sin \left(\frac{2\pi kt}{n}\right)+\sum \limits_{k=1}^b{T}_{k,\mathrm{c}}\cos \left(\frac{2\pi kt}{n}\right), $$

(2)

where $ \overset{\sim }{T}(t) $ denotes the smoothed value of the variable at time t and T_{k, s} and T_{k, c} denote the coefficients (amplitudes) of the kth sine and cosine harmonics, which are determined by a least-squares method. For a non-periodic time series, the first term on the right-hand side of Eq. 2 can be replaced by the intercept and slope of the linear regression of T(t) on time. This type of filter is denoted SPECtral Smoothing filter (SPECS) in the following.

A new filter method proposed here is the mean value smoothing spline, denoted MEVSS in the following. For this, we search for discrete values $ \tilde{T}(t) $ of the smoothest possible function whose mean within arbitrary, non-overlapping smoothing time windows equals the mean of the original time series T(t) within the same windows. For a time series with n data points, this can be expressed mathematically by the condition:

$$ {\displaystyle \begin{array}{l}\sum \limits_{t=2}^{n-1}{\left[\overset{\sim }{T}\left(t-1\right)-2\overset{\sim }{T}(t)+\overset{\sim }{T}\left(t+1\right)\right]}^2\to \mathit{\operatorname{Min}}\kern1em \mathrm{for}\ \mathrm{non}-\mathrm{periodic}\ \mathrm{time}\ \mathrm{series}\ \mathrm{and}\\ {}\sum \limits_{t=1}^n{\left[\overset{\sim }{T}\left(t-1\right)-2\overset{\sim }{T}(t)+\overset{\sim }{T}\left(t+1\right)\right]}^2\to \mathit{\operatorname{Min}}\kern1em \mathrm{for}\ \mathrm{periodic}\ \mathrm{time}\ \mathrm{series},\end{array}} $$

(3)

where we set for the latter $ \tilde{T}(0)=\tilde{T}(n) $ and $ \tilde{T}\left(n+1\right)=\tilde{T}(1) $.

The constraint for Eq. 3 for all i non-overlapping smoothing time intervals is

$$ {\lambda}_i\sum \limits_{t={m}_i-\varDelta {t}_i}^{m_i+\varDelta {t}_i}\overset{\sim }{T}(t)={\lambda}_i\sum \limits_{t={m}_i-\varDelta {t}_i}^{m_i+\varDelta {t}_i}T(t) $$

(4)

where m_i denotes the center data point and Δt_i the half-width of the ith smoothing interval of the time series, and λ_i are Lagrangian multipliers. For open time series, the first smoothing interval must start and the last must end with the time series, whereas for periodic series no such limitation exists. The smoothed values for MEVSS are obtained by differentiating Eqs. 3 and 4 with respect to all $ \tilde{T}(t) $ and λ_i and solving the resulting set of n + i linear equations. Even if the number of data points is large, the numerical solution is efficient and fast, because the coefficient matrix is extremely sparse. It should be noted that MEVSS is not the same as the conventional smoothing spline procedure, also known as variational interpolation (Sasaki 1955), denoted SPLIS in this paper. There, a function is sought that satisfies the condition

$$ \sum \limits_{t=2}^{n-1}{\left(\overset{\sim }{T}\left(t-1\right)-2\overset{\sim }{T}(t)+\overset{\sim }{T}\left(t+1\right)\right)}^2+\gamma \sum \limits_{t=1}^n{\left(\overset{\sim }{T}(t)-T(t)\right)}^2\to \mathit{\operatorname{Min}}. $$

(5)

The parameter γ determines the degree of smoothness of the filtered time series. The smaller γ is, the smoother the filtered time series.
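The two variational problems can be sketched in a few lines. This is not the author's implementation: it is a minimal illustration that assumes a small dense solver in place of a sparse one, 0-based window index lists, and hypothetical function names (`solve`, `roughness_matrix`, `mevss`, `splis`). Setting the derivatives of Eqs. 3 and 4 to zero gives a linear system with one extra row per Lagrangian multiplier, and Eq. 5 gives a system in the smoothed values alone.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (dense)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def roughness_matrix(n, periodic):
    """Return D^T D, where D holds the second differences used in Eq. 3."""
    g = [[0.0] * n for _ in range(n)]
    centers = range(n) if periodic else range(1, n - 1)
    for t in centers:
        stencil = (((t - 1) % n, 1.0), (t, -2.0), ((t + 1) % n, 1.0))
        for i, vi in stencil:
            for j, vj in stencil:
                g[i][j] += vi * vj
    return g


def mevss(series, windows, periodic=False):
    """Smoothest series whose window sums match those of `series` (Eqs. 3, 4).

    `windows` is a list of 0-based index lists, one per non-overlapping window.
    """
    n, k = len(series), len(windows)
    g = roughness_matrix(n, periodic)
    # Upper-left block 2 D^T D comes from differentiating the roughness term.
    kkt = [[2.0 * g[i][j] if i < n and j < n else 0.0 for j in range(n + k)]
           for i in range(n + k)]
    rhs = [0.0] * (n + k)
    for w, idx in enumerate(windows):  # one Lagrangian multiplier per window
        for t in idx:
            kkt[n + w][t] = kkt[t][n + w] = 1.0
        rhs[n + w] = sum(series[t] for t in idx)
    return solve(kkt, rhs)[:n]


def splis(series, gamma):
    """Conventional smoothing spline of Eq. 5 for an open series."""
    n = len(series)
    a = roughness_matrix(n, periodic=False)
    for t in range(n):
        a[t][t] += gamma
    return solve(a, [gamma * v for v in series])


# Hypothetical 12-point series with three windows of four points each.
data = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0, 7.0, 9.0, 8.0, 11.0]
windows = [list(range(0, 4)), list(range(4, 8)), list(range(8, 12))]
smooth = mevss(data, windows)
spline = splis(data, gamma=0.1)
```

The MEVSS result preserves every window sum exactly, whereas the SPLIS result only trades roughness against the squared misfit, which is the difference between the two methods noted above. For long series a sparse solver would be the natural replacement for the dense elimination used here.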

To study the effect of different smoothing procedures, we take two synthetic temperature time series with 360 discrete points of T(t). The first is based on an analytic sinusoidal function T_s(t) with superposed random Gaussian noise T_n(t). This is similar to an annual time series of a multi-year average of the daily mean, minimum, or maximum temperature with a time increment of 1 day (d). In contrast to a daily temperature record of a single year, two consecutive days in a multi-year average record are usually hardly correlated. Figure 1 shows such a synthetic periodic time series for 360 days with

$$ T(t)={T}_0+{T}_s(t)+{T}_n(t)=10{}^{\circ}C+10{}^{\circ}C\ast \sin \left(\frac{2\pi t}{360d}\right)+{T}_n(t). $$

(6)

The Gaussian distributed random noise has a standard deviation of 2 °C.

The second synthetic time series consists of an exponential function T_e(t) with superposed random Gaussian noise with a standard deviation of 1 °C. This is similar to a very long (360 years) time series of annual mean temperatures with an increasing positive temperature trend at the end and a time increment of 1 year (a) (see Fig. 2):

$$ T(t)={T}_0+{T}_e(t)+{T}_n(t)=8{}^{\circ}\mathrm{C}+2{}^{\circ}\mathrm{C}\ast \exp \left[{\left(\frac{t}{360a}\right)}^4\right]+{T}_n(t) $$

(7)

An optimal smoothing filter should keep the signal (sine or exponential) un-modified and remove or damp the noise as much as possible.
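The two synthetic series can be generated with a short sketch. The function names, the seed, and the use of Python's `random` module are assumptions made for illustration. The exponent in Eq. 7 is read here as (t/360 a)^4, an interpretation that reproduces the noise-to-signal ratio of roughly 30% quoted for the exponential series.

```python
import math
import random


def synthetic_sine(n=360, t0=10.0, amplitude=10.0, noise_sd=2.0, seed=0):
    """Eq. 6: sine signal plus Gaussian noise; returns (signal, noisy)."""
    rng = random.Random(seed)
    signal = [t0 + amplitude * math.sin(2 * math.pi * t / n)
              for t in range(1, n + 1)]
    noisy = [s + rng.gauss(0.0, noise_sd) for s in signal]
    return signal, noisy


def synthetic_exponential(n=360, t0=8.0, scale=2.0, noise_sd=1.0, seed=0):
    """Eq. 7 with the exponent read as (t/n)**4; returns (signal, noisy)."""
    rng = random.Random(seed)
    signal = [t0 + scale * math.exp((t / n) ** 4) for t in range(1, n + 1)]
    noisy = [s + rng.gauss(0.0, noise_sd) for s in signal]
    return signal, noisy


sine_signal, sine_noisy = synthetic_sine()
exp_signal, exp_noisy = synthetic_exponential()

# Noise standard deviation relative to the total exponential increase.
noise_ratio = 1.0 / (max(exp_signal) - min(exp_signal))
```

Keeping the noise-free signal alongside the noisy series makes it possible to judge afterwards how well a filter recovers the signal.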

To compare the different smoothing methods, we have to select adequate time windows and parameter settings. Figure 3 shows the filter response (weights) for a time series in which all data points are set to zero, T(t ≠ 0) = 0, except at the center, where the value is set to unity, T(t = 0) = 1. The width of the response around the center point should be similar for all applied filters to allow a meaningful comparison. To fulfill this criterion, the smoothing window for MOVAS is set to 31 time increments, and the standard deviation for GAUSS is set to 10, with the whole window spanning 61 time increments. LOWESS, LOESS, and LOWESS2 use a half-width of 25 time increments, which according to Eq. 1 results in a total smoothing window of 51. For SPECS and SPLIS, the whole domain is considered; for SPECS, the first 6 harmonics are used, and for SPLIS the smoothing parameter γ = 6 × 10⁻⁴ is chosen. The averaging window for MEVSS has been set to 30 time increments.

In Table 1, some statistical results are shown to compare the quality of the different smoothing methods. Four different smoothing intervals were chosen: 0.5, 1, 2, and 3 times the window settings of Fig. 1. For the sine function, periodic boundaries are used, and for the exponential function, reduced windows are used; i.e., for MOVAS, GAUSS, LOWESS, LOESS, and LOWESS2, one-sided smoothed values are computed at both ends. For MEVSS, it is not necessary to compute the smoothed values for all possible window positions, because the results for time windows shifted by one time increment are quite similar. Only three different window settings, shifted by 1/3 of the averaging window, were used here for the full domain, and the resulting values were then averaged. For the 30 time increment window of the periodic sine (left) and the non-periodic exponential function (right), only the mean values for the following windows were computed for the constraint (Eq. 4) of MEVSS:

$$ {\displaystyle \begin{array}{l}\mathrm{Window}\ \mathrm{setting}\kern4em \mathrm{periodic}\ \mathrm{sine}\kern6em \mathrm{non}-\mathrm{periodic}\ \mathrm{exponential}\\ {}{1}^{\mathrm{st}}\kern6em \sum \limits_{t=1}^{30}{T}_t,\sum \limits_{t=31}^{60}{T}_t,\dots, \sum \limits_{t=331}^{360}{T}_t\kern4.75em \sum \limits_{t=1}^{30}{T}_t,\sum \limits_{t=31}^{60}{T}_t,\dots, \sum \limits_{t=331}^{360}{T}_t\\ {}{2}^{\mathrm{nd}}\kern4em \sum \limits_{t=1}^{20}{T}_t+\sum \limits_{t=351}^{360}{T}_t,\sum \limits_{t=21}^{50}{T}_t,\dots, \sum \limits_{t=321}^{350}{T}_t\kern1.70em \sum \limits_{t=1}^{20}{T}_t,\sum \limits_{t=21}^{50}{T}_t,\dots, \sum \limits_{t=321}^{360}{T}_t\\ {}\begin{array}{cc}{3}^{\mathrm{rd}}\kern4em \sum \limits_{t=11}^{40}{T}_t,\sum \limits_{t=41}^{70}{T}_t,\dots, \sum \limits_{t=341}^{360}{T}_t+\sum \limits_{t=1}^{10}{T}_t\kern1.7em & \sum \limits_{t=1}^{40}{T}_t,\sum \limits_{t=41}^{70}{T}_t,\dots, \sum \limits_{t=341}^{360}{T}_t\end{array}\end{array}} $$

(8)

Table 1 Comparison of the quality of different smoothing methods. All numbers are given in percent. Sine damping is defined as $ 100\ast \left[1-\frac{\left({\overset{\sim }{T}}_{s,\max }-{\overset{\sim }{T}}_{s,\min}\right)}{\left({T}_{s,\max }-{T}_{s,\min}\right)}\right] $; a value of < 1 means an amplitude error < 0.1 °C. Sine fit is defined as $ 100\ast \left[1-\frac{RMSE\left({\overset{\sim }{T}}_s(t)-{T}_s(t)\right)}{0.5\ast \left({T}_{s,\max }-{T}_{s,\min}\right)}\right] $; a value of > 99 means an RMSE < 0.1 °C. Noise damping is defined as $ 100\ast \left[1-\frac{RMSE\left(\overset{\sim }{T}(t)-{T}_s(t)\right)}{RMSE\left(T(t)-{T}_s(t)\right)}\right] $; a value > 90 means a reduction of the RMSE of the noise to < 0.2 °C. Sine noise fit is defined as $ 100\ast \left[1-\frac{RMSE\left(\overset{\sim }{T}(t)-{T}_s(t)\right)}{0.5\ast \left({T}_{s,\max }-{T}_{s,\min}\right)}\right] $; a value of > 98 means an RMSE of $ \overset{\sim }{T}(t)-{T}_s(t) $ < 0.2 °C. The interdiurnal RMSE is defined as $ 100\ast \left[1-\frac{RMSE\left(\overset{\sim }{T}\left(t+1\right)-\overset{\sim }{T}(t)\right)}{RMSE\left({T}_s\left(t+1\right)-{T}_s(t)\right)}\right] $; a value of ± 8.0 means an RMSE of ± 0.01 °C with respect to the interdiurnal RMSE of the pure sine function. Exp fit is defined as $ 100\ast \left[1-\frac{RMSE\left({\overset{\sim }{T}}_e(t)-{T}_e(t)\right)}{\left({T}_{e,\max }-{T}_{e,\min}\right)}\right] $; a value > 97 means an RMSE of < 0.1 °C. Exp + noise fit is defined as $ 100\ast \left[1-\frac{RMSE\left(\overset{\sim }{T}(t)-{T}_e(t)\right)}{\left({T}_{e,\max }-{T}_{e,\min}\right)}\right] $; a value > 94 corresponds to an RMSE of $ \overset{\sim }{T}(t)-{T}_e(t) $ < 0.2 °C. End offset is defined as $ 100\ast \frac{\overset{\sim }{T}\left(t=360\right)-{T}_e\left(t=360\right)}{\left({T}_{e,\max }-{T}_{e,\min}\right)} $; a value within ± 7 means an error of the last data point of less than ± 0.2 °C against the pure exponential function. End trend is defined as $ 100\ast \left[1-\frac{\overset{\sim }{T}\left(t=360\right)-\overset{\sim }{T}\left(t=359\right)}{T_e\left(t=360\right)-{T}_e\left(t=359\right)}\right] $; a value of ± 10 means an error of ± 0.2 °C/30 years. The interannual RMSE is defined as $ 100\ast \left[1-\frac{RMSE\left(\overset{\sim }{T}\left(t+1\right)-\overset{\sim }{T}(t)\right)}{RMSE\left({T}_e\left(t+1\right)-{T}_e(t)\right)}\right] $; a value of ± 25.0 means an RMSE of ± 0.01 °C with respect to the interannual RMSE of the pure exponential function.

This means that, for an open domain, time windows of different lengths are chosen at both ends.
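The three window settings of Eq. 8 can be generated programmatically. The sketch below is an illustration under stated assumptions: the function name `shifted_windows` is hypothetical, indices are 1-based as in Eq. 8, and the rule that an end fragment shorter than half a window is merged into its neighbour is inferred from the listed windows rather than stated in the text.

```python
def shifted_windows(n, width, offset, periodic, min_fraction=0.5):
    """Return non-overlapping windows as lists of 1-based time indices.

    Window boundaries lie after t = offset, offset + width, and so on.
    """
    # Inclusive end points of the windows before any merging or wrapping.
    ends = [e for e in range(offset, n, width) if e > 0] + [n]
    starts = [1] + [e + 1 for e in ends[:-1]]
    windows = [list(range(s, e + 1)) for s, e in zip(starts, ends)]
    if len(windows[0]) == width and len(windows[-1]) == width:
        return windows
    if periodic:
        # Wrap around: the tail and head fragments form one full window.
        return windows[1:-1] + [windows[-1] + windows[0]]
    # Open series: merge an end fragment that is too short (assumed rule).
    if len(windows[0]) < min_fraction * width:
        windows = [windows[0] + windows[1]] + windows[2:]
    if len(windows[-1]) < min_fraction * width:
        windows = windows[:-2] + [windows[-2] + windows[-1]]
    return windows


# The three settings for the open (non-periodic) exponential series.
open_settings = [shifted_windows(360, 30, off, periodic=False)
                 for off in (0, 20, 10)]
# The three settings for the periodic sine series.
periodic_settings = [shifted_windows(360, 30, off, periodic=True)
                     for off in (0, 20, 10)]
```

With an offset of 20 the open series ends with the 40-point window 321 to 360, and with an offset of 10 it starts with the 40-point window 1 to 40, matching the second and third rows of Eq. 8.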

Table 1 clearly shows that for MOVAS, GAUSS, and LOWESS, the smoothing damps the (sine) amplitude, especially for wide smoothing intervals. Hence, these methods are not recommended for smoothing diurnal or annual (periodic) time series. LOESS, LOWESS2, SPECS, and MEVSS do much better in this respect. The same is true for the sine fit, i.e., the whole sine function is not much modified by the latter four smoothing methods. With respect to the damping of the noise, however, the latter methods behave differently. Although the noise generated for this test is only one possible realization, a comparison between the different smoothing algorithms nevertheless seems possible. Especially for short smoothing periods, LOESS and LOWESS2 do not filter the noise very well. It might be surprising that SPECS, which conserves the sine function perfectly, does not filter the noise very well. This is due to random wave components contained in the noise. For short smoothing windows GAUSS performs best, and for long smoothing windows MEVSS does. This can be understood because GAUSS and MEVSS use mean values, which are determined by setting the positive and negative deviations equal. LOESS, LOWESS, LOWESS2, SPECS, and SPLIS use squared deviations (regression) and are hence more sensitive to extremes. The same is true for filtering the sine function plus noise. The interincremental RMSE, i.e., the day-to-day variation of the smoothed sine plus noise, is much higher than that of the pure sine, especially for MOVAS, LOESS, and LOWESS2 with short averaging windows. For long windows, GAUSS and LOWESS even show a smaller RMSE than the pure sine. This is due to the significant damping of the sine function.

For an open domain, it is clear that smoothing the exponential function with MOVAS and GAUSS does not give reasonable values at the end of the time series. The offset becomes increasingly large for long smoothing intervals. If we consider the performance of the smoothing methods on noisy data at the ends of an open domain, it is surprising how strongly the noise, with an RMS of only 30% of the total exponential increase, changes the offset. The impact of noise seems to be smallest for the MEVSS method, especially for medium smoothing windows. The trend at the end of an open domain, i.e., the difference between the last and the second-to-last data point, is reasonably captured only by LOESS, LOWESS, LOWESS2, and by MEVSS for short to medium averaging intervals. Noisy data do not allow reasonable estimates of the end trend of a time series, except with MEVSS for short to medium smoothing windows or with LOWESS2 for long windows. For wide smoothing intervals and periodic domains, MEVSS outperforms all other methods because it smooths the noise best and keeps the signal largely unchanged. Hence, we have chosen MEVSS for smoothing the long-term annual temperature time series and the mean daily temperature series.
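Three of the quality measures used in Table 1 can be written down directly from their definitions in the table caption. The function names and the idealized test series below are hypothetical; the sketch only shows how the percentages are formed, not the values reported in the table.

```python
import math


def rmse(a, b):
    """Root mean square error between two equally long series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def amplitude_damping(smoothed_signal, signal):
    """Sine damping: relative loss of the peak-to-peak range, in percent."""
    smoothed_range = max(smoothed_signal) - min(smoothed_signal)
    return 100.0 * (1.0 - smoothed_range / (max(signal) - min(signal)))


def fit_score(smoothed, signal, half_range):
    """Fit in percent; half_range=True for the sine, False for the exponential."""
    scale = max(signal) - min(signal)
    if half_range:
        scale *= 0.5
    return 100.0 * (1.0 - rmse(smoothed, signal) / scale)


def noise_damping(smoothed_noisy, noisy, signal):
    """Reduction of the RMSE against the pure signal, in percent."""
    return 100.0 * (1.0 - rmse(smoothed_noisy, signal) / rmse(noisy, signal))


# Idealized check: a sine whose amplitude is reduced from 10 to 9.9 °C.
n = 360
signal = [10.0 + 10.0 * math.sin(2 * math.pi * t / n) for t in range(1, n + 1)]
damped = [10.0 + 9.9 * math.sin(2 * math.pi * t / n) for t in range(1, n + 1)]
damping = amplitude_damping(damped, signal)

# A constant offset of 0.1 °C gives an RMSE of 0.1 °C against the signal.
shifted = [s + 0.1 for s in signal]
sine_fit = fit_score(shifted, signal, half_range=True)
```

The damped sine yields a damping of 1%, consistent with the caption's remark that a value below 1 corresponds to an amplitude error below 0.1 °C, and the shifted sine yields a fit of 99%.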

3 Application of MEVSS to long-term time series

The 244-year time series of annual mean temperatures of Vienna (see Fig. 4) has been smoothed with the 30-year MEVSS method (see Fig. 5). In addition, a 25-year (half-width) LOWESS smoothing has been carried out, which agrees with the MEVSS curve to within 0.1 °C except at the very beginning and end of the series. Because of the very high value in 2018, the LOWESS curve ends with a value 0.26 °C higher, and because of the two low values in 1776 and 1777, it begins with a value 0.17 °C lower. A more conservative value at the beginning and the end of an open time series, as produced by MEVSS, seems preferable to avoid an exaggerated trend.

If we compute the mean temperature of all climate normal periods (CLNPs) 1781–1810, 1811–1840, …, ending with the only 28-year period 1991–2018 (Table 2), we see that until 1931–1960 all values were close to 9 °C. The periods 1961–1990 and especially 1991–2018 show an increasing mean temperature due to global warming. Looking at the SDEV of the periods, we see somewhat larger values up to 1870 and in the recent period 1991–2018. Does that mean that the variability of annual mean temperatures was higher (more extreme) in the early and recent years of the series? Not necessarily, because to determine the true climatic SDEV, we need to de-trend the time series. If we do so, take the deviations of the annual mean temperature from the smooth MEVSS curve (lower part of Fig. 5), and compute the root mean square differences (RMSD (MEVSS), Table 2), we see that up to 1960 they lie very close to the SDEV values. This means that we do not need to de-trend time series when trends are small. The increasing difference between SDEV and RMSD (MEVSS) from 1960 onwards tells us, however, that during periods of significant trends we need to de-trend the series to get a realistic climatic measure of dispersion. RMSD (MEVSS) can be regarded as a de-trended SDEV and is hence the measure of choice, rather than the SDEV itself. After de-trending, only during the first three CLNPs does the variability of annual mean temperatures stay somewhat higher (around 0.9 °C) than after 1870 (roughly 0.6 °C). It is interesting to note that the station of the Zentralanstalt für Meteorologie und Geodynamik in Vienna was moved to its present location in 1872 (Hammerl 2001). Could there be a discontinuity in the interannual temperature variation before and after the change of station location, although the temperature series itself was carefully homogenized (Auer et al. 2001)? Climatic temperature time series and trends of the nineteenth century and earlier should be treated carefully in any case, because instrumental biases were of the order of 0.5 to 1 °C (Middleton 1966; Bernhard 2004; Winkler 2009).
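The difference between the three measures of dispersion can be made concrete with a small sketch. All numbers are hypothetical and are not the Vienna data: a known linear trend stands in for the MEVSS curve, and the mean of the previous period is simply assumed.

```python
import math
import statistics

# Hypothetical 30-year period: a linear trend of 0.06 °C per year plus
# alternating deviations of +/-0.6 °C (illustrative values only).
years = list(range(30))
trend = [10.0 + 0.06 * y for y in years]
temps = [m + (0.6 if y % 2 == 0 else -0.6) for y, m in zip(years, trend)]


def rmsd(values, reference):
    """Root mean square difference between two equally long series."""
    return math.sqrt(
        sum((v - r) ** 2 for v, r in zip(values, reference)) / len(values))


# SDEV about the period mean mixes the trend into the dispersion.
sdev = statistics.pstdev(temps)
# RMSD about the trend curve (stand-in for the MEVSS curve): de-trended SDEV.
rmsd_trend = rmsd(temps, trend)
# RMSD about an assumed mean of the previous period, here 9.7 °C.
rmsd_prev = rmsd(temps, [9.7] * len(temps))
```

In this example the de-trended measure recovers the 0.6 °C scatter that was put in, the plain SDEV is inflated by the trend, and the comparison with the previous period's mean is inflated further, which is the ordering discussed for the most recent period.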

Table 2 Mean temperatures and measures of dispersion for climate normal periods (CLNP) in Vienna. For the last period, only the 28 years between 1991 and 2018 have been used. Besides the SDEV of the annual mean temperatures within the CLNP, the root mean square differences between the observed annual temperatures and the respective 30-year MEVSS temperatures (RMSD (MEVSS)) and the root mean square differences between the observed annual temperatures and the mean temperature of the previous CLNP (RMSD (prev CLNP)) are shown

When we relate current observations to the statistics of past CLNPs, as is normally done, measures of dispersion may become meaningless. Table 2 shows that in the period 1991–2018 RMSD (prev CLNP) exceeds RMSD (MEVSS) by a factor of more than two (1.48 versus 0.66). It now becomes clear why the 2018 mean temperature of 13 °C, when related to the statistics of the previous CLNP 1961–1990, is an outlier at more than four times the SDEV above the mean, whereas it stays below two times the RMSD (MEVSS), which is still rather normal.

We can conclude that it was not the 2018 mean temperature that was so extreme relative to the present climate, but the increase of the trend of the climatological mean, which has reached almost 0.06 °C per year (or 1.8 °C per 30 years!), a value never observed before in the 244-year temperature series of Vienna. The 1.8 °C increase of the 1988–2018 trend was more than 3 times the SDEV of all 215 30-year trends between 1776 and 2018.

A problem with determining the current climatic mean temperature is that, for the 30-year period centered on the current year, we know only the observations of the last 15 years and not those of the next 15 years. How can we prove that determining current climate statistics from the endpoint of a smoothed temperature series is better than taking the mean from past periods? We can at least do a hindcast. For all years from 1875 to 2003, the end values of the MEVSS, LOESS, and LOWESS curves (T_MEVSS, T_LOESS, T_LOWESS) based on a 100-year period have been computed; i.e., for 1875, the years 1776 to 1875 have been used to estimate the climatic mean temperature for 1875 with the above smoothing methods. For 1876, the years 1777–1876 have been used, and so forth. The values have then been compared with the centered 30-year mean temperature, e.g.,

$$ {\overline{T}}_{1875}=\frac{1}{30}\left(\frac{T_{1860}+{T}_{1890}}{2}+\sum \limits_{t=1861}^{1889}{T}_t\right), $$

(9)

which would not have been known in 1875. Furthermore, a comparison with the last available CLNP in the respective year (for 1875 this would have been 1841–1870) has been carried out. Finally, a comparison with the corresponding MEVSS temperature $ \tilde{T} $ derived from the whole series from 1875 to 2018 was carried out. The results are shown in Table 3. On average, the endpoint temperatures of the smoothed MEVSS, LOESS, and LOWESS curves fit the centered 30-year mean temperatures very well. The maximum differences, however, all dating from the last few years with the strong positive temperature trend, are not really satisfactory at 0.48 °C for MEVSS and LOESS and 0.65 °C for LOWESS. If we follow the standard methodology and use the statistics of the last available CLNP, the result becomes much worse. Now even the mean difference is −0.2 °C, and the maximum, again dating from one of the last few years, is an intolerable 1.11 °C. It is important to see that the difference between the MEVSS endpoint value and the temperature of the complete MEVSS series gives the best result. Zero mean difference and an RMSD of 0.18 °C with a maximum difference of 0.4 °C are not perfect, but this is probably the best we can do to estimate the current climate statistics.
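The centered 30-year mean of Eq. 9 and the three summary statistics reported in Table 3 can be sketched as follows. The function names and the linear test series are hypothetical; the dictionary keyed by year is just one convenient data layout.

```python
import math


def centered_30yr_mean(temps, year):
    """Eq. 9: 29 inner years at full weight, the two end years at half weight."""
    inner = sum(temps[y] for y in range(year - 14, year + 15))
    ends = 0.5 * (temps[year - 15] + temps[year + 15])
    return (inner + ends) / 30.0


def difference_stats(estimates, references):
    """Mean difference, root mean square difference, maximum absolute difference."""
    diffs = [e - r for e, r in zip(estimates, references)]
    mean = sum(diffs) / len(diffs)
    rmsd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mean, rmsd, max(abs(d) for d in diffs)


# Hypothetical linear series (not observations): 0.01 °C per year from 1800.
temps = {y: 9.0 + 0.01 * (y - 1800) for y in range(1800, 1951)}
mean_1875 = centered_30yr_mean(temps, 1875)

# Hypothetical endpoint estimates compared with reference values.
stats = difference_stats([1.0, 2.0, 4.0], [1.0, 1.0, 2.0])
```

For a purely linear series the centered mean equals the value at the center year, so any endpoint estimate can be judged against it once the following 15 years are known, which is what makes the hindcast possible.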

Table 3 Basic statistics (mean difference, root mean square difference, and maximum absolute difference) of the differences between the endpoints of the MEVSS, LOESS, and LOWESS temperatures (T_MEVSS, T_LOESS, T_LOWESS) and the observed 30-year mean temperatures $ \overline{T} $(according to Eq. 9) centered at the endpoint for all 100-year periods between 1776–1875 and 1904–2003. T_CLNP is the last completed climate normal period (1781–1810, 1811–1840, …, 1961–1990) before the corresponding year, $ \tilde{T} $ is the MEVSS temperature for the corresponding year based on the whole temperature series from 1775 to 2018

The steep rise of the climatological mean temperature also has consequences for the assessment of monthly or daily temperature series. Deviations of these temperatures are often discussed in public using mean values of the last climatological normal period. In Fig. 6, besides the mean, maximum, and minimum daily mean temperatures and their MEVSS-smoothed curves, 1, 2, and 3 times the SDEV of the maximum daily temperature with respect to the smoothed values are also plotted above the maximum and below the minimum curves. It is important to note that the smoothed curves are not pure sine functions, although they are based on only 4 (three-monthly) mean values. The mean time from the annual minimum (on January 12) to the maximum (on July 19) is 188 days, whereas the cooling period lasts only 177 days. Hence, the maximum smoothed interdiurnal cooling rate in autumn (−0.187 °C/day) is also somewhat larger in magnitude than the maximum smoothed warming rate in spring (+0.169 °C/day).

If the distribution function is Gaussian, the mean equals the median, and the smooth maxima/minima curves can be used to indicate quantiles and to derive return periods. An extended Shapiro-Wilk test (Shapiro and Wilk 1965; Royston 1982) has been applied to the daily extreme temperature distribution, and the result was consistent with a Gaussian distribution. Hence, the smooth curve of the maximum temperature indicates the 60-year return period, because on average every second day of the 30-year series lies above/below this curve. This means that for each individual day the probability of being higher/lower than the mean smoothed maximum/minimum curve is 1/60. Statistically, there should be one day within a 2-month period and roughly 6 days within a single year above/below the curves. Because of the nearly Gaussian distribution, the maximum/minimum curves plus/minus 1, 2, and 3 SDEV can also be used to indicate other return periods. The 1/2/3 SDEV curves indicate the 189-, 1316-, and 23,095-year return periods. For the 1/2/3 SDEV curves, there should statistically be 2/0.3/0.02 days within a year above/below the respective curves. For the 30 years of a climate normal period, the statistical numbers are 60/8/0.5 days for the 1/2/3 SDEV lines. The observed values for the period 1961–1990 are 51/15/0 days for the maximum and 53/15/1 for the minimum, which are quite close to the expectation.
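The expected day counts follow from the return periods by simple arithmetic, which the sketch below reproduces. It assumes 365 days per year and that each calendar day exceeds the curve on average once per return period; the function name and dictionary keys are hypothetical, while the return periods are those quoted in the text.

```python
def expected_days(return_period_years, n_years=1, days_per_year=365):
    """Expected number of days beyond a curve with the given return period."""
    return n_years * days_per_year / return_period_years


return_periods = {
    "max curve": 60,
    "+1 SDEV": 189,
    "+2 SDEV": 1316,
    "+3 SDEV": 23095,
}
# Expected exceedances within a single year.
per_year = {k: expected_days(rp) for k, rp in return_periods.items()}
# Expected exceedances within a 30-year climate normal period.
per_clnp = {k: expected_days(rp, n_years=30)
            for k, rp in return_periods.items()}
```

The per-year values round to the 6/2/0.3/0.02 days given above. For a 30-year period the 1 SDEV value comes out at about 58 days, which the text rounds to 60.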

As the annual mean temperature time series has shown, with a 2 °C increase of the mean over the last 40 years, the statement “above” or “below average” for a specific current year should not be made with historic averages, unless we want to compare apples and oranges. It must be made with averages that reflect the current climate. Let us take the daily mean temperature series of Vienna for 2018. If we take the MEVSS-smoothed annual curve based on the last climate normal period (1961–1990, see Fig. 7), we come to the conclusion that the 284 out of 365 days with temperatures above the climatological mean 1961–1990 represent an exceptionally high number of “warmer than average” days. The number of “colder than average” days (81 out of 365) is exceptionally low. In Fig. 7, only the 1 in 100 years lines for the minimum and maximum are indicated, which equal the 1st and 99th percentiles. According to a Gaussian distribution, the 1 in 100 years line is equal to the smoothed maximum/minimum line ± 0.476 SDEV. In a whole year, the statistically expected number of days exceeding the 1 in 100 years maximum or minimum curve should be roughly 4 (0.01*365). The observed number of days above the maximum curve in Fig. 7, however, was 39, nearly 10 times as many as expected, an exceptional value. The observed number of days below the 1 in 100 years minimum line was 4. But this comparison is not adequate: we are comparing temperatures of the present climate with past climate data.

What can we do to seriously compare the daily temperature series of 2018 with the actual climate? To get the smooth mean climatological temperature for 2018, we can first compute the three-monthly (Jan–Mar, Feb–Apr, …, Dec–Feb) smooth temperature time series from 1863 to 2018, averaging the time series with MEVSS, e.g., for the periods 1863–1880, 1881–1910, …, 2001–2018, the periods 1863–1890, 1891–1920, …, 1981–2018, and the periods 1863–1900, 1901–1930, …, 1991–2018. Then, we can further average the MEVSS-smoothed daily temperature series of 2018 for the periods Jan–Mar, Apr–June, …, Oct–Dec, for the periods Feb–Apr, May–Jul, …, Nov–Jan, and for the periods Mar–May, June–Aug, …, Dec–Feb. The smooth daily average temperature generated in this way for the year 2018 (“climate of 2018”) is plotted in Fig. 8. Now, the number of days warmer than the 2018-climate daily average is reduced to 241, and the number of days colder than average is increased to 124. The still much larger number of positive anomalies is due to the fact that the year 2018 was 1.5 °C warmer than the climatological mean of 2018 (compare Fig. 4). The smooth 1 in 100 years curves for the 2018 climate cannot be computed in the same manner as the smooth mean daily temperatures, because the values of the maximum and minimum time series depend on the length of the time series. Therefore, in Fig. 7, the 1 in 100 years curves of the smooth daily maxima and daily minima have been placed at the same distance from the smooth mean curve as for the 30-year time series 1989–2018. This can be justified by the fact that the smoothed mean amplitude between the minimum and maximum curves and the SDEV of the minima and maxima did not change very much between the last climate normal period 1961–1990 and 1989–2018. 
Now, the number of days above the 1 in 100 years maximum curve (99th percentile) in 2018 is reduced to 3 and the number of days below the 1 in 100 years minimum curve (1st percentile) in 2018 is 5, both very close to the statistical expectation of 4 days. One interesting finding from the 2018 time series with the smooth curves adjusted to the “climate of 2018” is that the most extreme event of the daily mean temperatures did not occur in the summer half year with its many warmer than average periods, but in winter, when the temperature at the end of February fell far below the 1 in 100 years curve. Again, the single days of 2018 were not exceptional compared to the actual climate, but the increase of the mean values between the climate of 1961–1990 and 2018 was extreme, especially in summer, when it exceeded 2.5 °C. Whereas in the period 1961–1990 the climatological number of days with a mean temperature of 20 °C or greater was 35, in the present climate (2018) this number has increased to 87 days. With a further temperature increase, the number of days with high temperatures will continue to rise sharply, because the probability of exceeding a threshold will increase and the part of the year in which this can occur will also lengthen. With respect to future heat waves, this means an exponential rather than a linear increase.
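The two effects named above (a higher exceedance probability on each day and a longer season in which exceedance is possible) can be illustrated with a toy calculation. This is a sketch under loudly stated assumptions, not the method of this paper: the annual cycle is taken as a pure cosine, the daily scatter as Gaussian with a constant SDEV, and all parameter values (annual mean 10 °C, amplitude 10 °C, SDEV 3.5 °C, peak on day 200) are hypothetical and are not fitted to the Vienna series.

```python
import math
from statistics import NormalDist


def expected_days_at_or_above(threshold_c: float,
                              annual_mean_c: float = 10.0,
                              amplitude_c: float = 10.0,
                              sdev_c: float = 3.5,
                              peak_day: int = 200,
                              days: int = 365) -> float:
    """Expected number of days per year with a daily mean >= threshold_c.

    Hypothetical model: cosine annual cycle plus Gaussian scatter with
    constant SDEV. All default parameter values are illustrative only.
    """
    std_normal = NormalDist()
    total = 0.0
    for day in range(days):
        # Climatological mean of this calendar day under the cosine model.
        mean_c = annual_mean_c + amplitude_c * math.cos(
            2.0 * math.pi * (day - peak_day) / days)
        # Gaussian probability that this day reaches the threshold.
        total += 1.0 - std_normal.cdf((threshold_c - mean_c) / sdev_c)
    return total


if __name__ == "__main__":
    # Shift the whole annual cycle upward and tabulate the response.
    for shift in (0.0, 1.0, 2.0, 3.0):
        print(shift, expected_days_at_or_above(20.0, annual_mean_c=10.0 + shift))
```

Tabulating the count for a sequence of shifts shows how strongly the number of threshold days responds to a change of the mean in this simplified setting.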

4 Conclusion and outlook

In times of significant climate change, such as we are currently experiencing, comparative statistical climatology has to take trends of meteorological parameters into account. Considering trends means that events which would have been characterized as extreme under a past climate can be rather “normal” under present climatic conditions. Heatwaves, which have been experienced more often in recent summers, should no longer be called extreme events, because they are becoming more “normal” under present conditions. This would also improve communication with the public, because the term “extreme events” is intuitively related to rare events. We must communicate that some extreme events of the past are becoming “normal events” nowadays. It is more than questionable whether the traditional way of using 30-year average values as “climate normals” should be continued. If the trend within 30 years is of the order of the SDEV or even higher, which is the case for the 1991–2020 CLNP, the climate of a year at the beginning of this CLNP cannot be equated with that of a year at its end. There is nothing like the “climate of 1991 to 2020”. The 30-year smoothing allows “climate normals” to be defined for virtually each year. In this paper, it was shown that this makes sense under rapid global change.

The MEVSS smoothing methodology appears to be the best way to produce smooth climatological time series, because it smooths the short-term (random) variations to a high degree, whereas the long-term variations are kept nearly undisturbed and are preserved better than with other smoothing methods. This is especially true for periodic time series like annual or daily variations. The methodology also seems to allow wide application to other climate elements. Its application to variables that are not Gaussian distributed should be checked in the future.

Another concrete application of the MEVSS method, which no other existing method can achieve in the same way, is the refinement of the annual variation of an element when only monthly mean or extreme values are given. For many stations worldwide, such information is easily available online (e.g., Weather-Online 2019). Instead of plotting an 11-month pseudo-annual temperature “curve” by connecting the 12 monthly mean values with straight lines, the MEVSS smoothing gives a much better estimate of the daily time series of the whole annual curve, including “true” minima and maxima.

References

Angel JR, Easterling WR, Kirtsch SW (1993) Towards defining appropriate averaging periods for climate normals. Clim Bulletin 27:29–44
Arguez A, Yu P, O’Brien JJ (2008) A new method for time series filtering near endpoints. J Atmos Ocean Technol 25:534–546
Auer I, Böhm R, Schöner W (2001) Austrian long-term climate 1767–2000. In: Multiple instrumental climate time series from Central Europe. Österreichische Beiträge zu Meteorologie und Geophysik 25. Zentralanstalt für Meteorologie und Geodynamik, Wien. https://doi.org/10.1002/joc.754
Bernhard F (2004) Technische Temperaturmessung. Springer Vieweg. https://doi.org/10.1007/978-3-642-24506-0
Box GEP, Jenkins GM (1970) Time series analysis, forecasting and control. Cambridge University Press, Oakland
Cleveland WS (1981) LOWESS: A program for smoothing scatterplots by robust locally weighted regression. Am Stat 35(1):54. https://doi.org/10.2307/2683591. JSTOR 2683591
Hammerl C (2001) Die Zentralanstalt für Meteorologie und Geodynamik 1851-2001 : 150 Jahre Meteorologie und Geophysik in Österreich. Leykam, Graz
Hamming RW (1989) Digital Filters, 3rd edn. Prentice Hall, 284 pp
Hiebl J, Chimani B, Ganekind M, Orlik A (2019) Österreichisches Klimabulletin 2018. Zentralanstalt für Meteorologie und Geodynamik, Wien. https://www.zamg.ac.at/zamgWeb/klima/bulletin/2018/bulletin-2018.pdf, accessed August 1, 2019
Livezey RE, Vinnikov KY, Timofeyeva MM, Tinker R, van den Dool HM (2007) Estimation and extrapolation of climate normals and climatic trends. J Appl Meteorol Climatol 46:1759–1776
Mann ME (2004) On smoothing potentially nonstationary climate time series. Geophys Res Lett 31:L07214. https://doi.org/10.1029/2004GL019569
Mann ME (2008) Smoothing of climate time series revisited. Geophys Res Lett 35:L16708. https://doi.org/10.1029/2008GL034716
Middleton WEK (1966) A history of the thermometer and its use in meteorology. Johns Hopkins Press, Baltimore
Royston JP (1982) An extension of Shapiro and Wilk’s W test for normality to large samples. Journal of the Royal Statistical Society. Series C (Applied Statistics) 31(2):115–124
Sasaki Y (1955) A fundamental study of the numerical prediction based on the variational principle. Journal of the Meteorological Society of Japan, Ser. 2, 33(6)
Savitzky A, Golay MJE (1964) Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry 36(8):1627–1639. https://doi.org/10.1021/ac60214a047
Schönwiese CD (2013) Praktische Statistik für Meteorologen und Geowissenschaftler. 5. vollständig überarbeitete und erweiterte Auflage. Bornträger
Shapiro SS, Wilk MB (1965) An analysis of variance test for normality (complete samples). Biometrika 52(3–4):591–611. https://doi.org/10.1093/biomet/52.3-4.591
Shumway R, Stoffer D (2006) Time series analysis and its applications: with R examples. Springer, New York
Weather-Online (2019) Worldwide Climate Diagrams: https://www.weatheronline.co.uk/reports/climate/Diagrams.htm, accessed August 1, 2019
Winkler P (2009) Revision and necessary correction of the long-term temperature series of Hohenpeissenberg, 1781-2006. Theor Appl Climatol 98:259–268. https://doi.org/10.1007/s00704-009-0108-y
WMO (1989) Calculation of monthly and annual 30-year standard normals. WCDP-No. 10, WMO-TD/No. 341, World Meteorological Organization, Geneva
WMO (2007) The role of climatological normals in a changing climate. WCDMP-No. 61, WMO-TD/No. 1377, World Meteorological Organization, Geneva
ZAMG (2019) Histalp Jahresbericht 2018, Wien, https://www.zamg.ac.at/cms/de/dokumente/klima/dok_news/dok_histalp/jahresbericht-2018/bericht, accessed August 1, 2019

Acknowledgements

Thanks are due to the Zentralanstalt für Meteorologie und Geodynamik in Vienna for providing the daily and yearly temperature data for this paper.

Availability of data and material

All data published in this article are available upon request, except the raw daily and yearly data from Vienna, which have to be requested from the Zentralanstalt für Meteorologie und Geodynamik, Vienna.

Code availability

Most evaluations have been carried out with the author's own Excel programs and a few with Matlab programs, which are available upon request from the author.

Funding

Open access funding provided by University of Vienna.

Author information

Authors and Affiliations

Department of Meteorology and Geophysics, University of Vienna, Althanstrasse 14, 1090, Vienna, Austria
Reinhold Steinacker
Karl-Innerebnerstrasse 75 b, 6020, Innsbruck, Austria
Reinhold Steinacker

Corresponding author

Correspondence to Reinhold Steinacker.

Ethics declarations

Conflict of interest

The author declares no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Steinacker, R. How to correctly apply Gaussian statistics in a non-stationary climate?. Theor Appl Climatol 144, 1363–1374 (2021). https://doi.org/10.1007/s00704-021-03601-4

Received: 17 October 2019
Accepted: 17 March 2021
Published: 05 April 2021
Issue Date: May 2021
DOI: https://doi.org/10.1007/s00704-021-03601-4
