Analysis and estimation of the effects of missing values on the calculation of monthly temperature indices
Long and complete climatic data series are a fundamental resource for scientific research on climate change. Data quality is important, and the management of missing values or data gaps is a key process that must be handled carefully to produce reliable datasets. Although a large variety of gap-filling techniques are available, a widespread strategy is to consider a dataset reliable if the rate of missing data is below a given threshold. However, this strategy varies from study to study. The aim of this paper is to analyze the impact of missing daily values on the estimation of monthly average temperature indices. The relationship between the error of the estimate and the presence of random or consecutive missing values, as well as the autocorrelation of the data series, is also analyzed. Three models (theoretical, linear and nonlinear) for estimating the maximum error at the 95 % confidence level are tested on data series provided by national and worldwide networks of stations. Consecutive missing values have an important effect on error estimation because temperature data series are autocorrelated. On our dataset, the error (mean ± standard deviation) for five consecutive missing values, 0.27 ± 0.05 °C on a normalized daily series (σ = 1), was higher than for five random missing values (0.14 ± 0.006 °C). A nonlinear model that takes into account the number of consecutive missing values is able to estimate the error, and its performance is less affected by the presence of consecutive missing values than that of the other proposed models.
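The contrast between random and consecutive gaps can be illustrated with a small simulation. The sketch below is not the method used in the paper: it assumes a synthetic 30-day series with unit variance generated by an AR(1) process with lag-1 autocorrelation 0.7, and all function names and parameter values are hypothetical choices made for illustration. It removes five daily values, either at random or as one consecutive block, and compares the mean absolute error of the resulting monthly mean.

```python
import random
import statistics


def ar1_month(n_days, phi, rng):
    """Synthetic daily series with unit variance and lag-1 autocorrelation phi.

    The AR(1) form is an assumption of this sketch, not the paper's data model.
    """
    innovation_sd = (1.0 - phi * phi) ** 0.5
    values = [rng.gauss(0.0, 1.0)]
    for _ in range(n_days - 1):
        values.append(phi * values[-1] + rng.gauss(0.0, innovation_sd))
    return values


def monthly_mean_error(series, missing_idx):
    """Absolute difference between the complete and the gappy monthly mean."""
    full_mean = statistics.fmean(series)
    kept = [v for i, v in enumerate(series) if i not in missing_idx]
    return abs(statistics.fmean(kept) - full_mean)


def simulate(n_missing=5, n_days=30, phi=0.7, trials=2000, seed=0):
    """Return (mean error with random gaps, mean error with one consecutive gap)."""
    rng = random.Random(seed)
    random_errors, consecutive_errors = [], []
    for _ in range(trials):
        series = ar1_month(n_days, phi, rng)
        # Random gaps: n_missing distinct days chosen anywhere in the month.
        random_idx = set(rng.sample(range(n_days), n_missing))
        # Consecutive gap: one block of n_missing days at a random start day.
        start = rng.randrange(n_days - n_missing + 1)
        consecutive_idx = set(range(start, start + n_missing))
        random_errors.append(monthly_mean_error(series, random_idx))
        consecutive_errors.append(monthly_mean_error(series, consecutive_idx))
    return statistics.fmean(random_errors), statistics.fmean(consecutive_errors)


random_gap_error, consecutive_gap_error = simulate()
print(f"random gaps:      {random_gap_error:.3f}")
print(f"consecutive gaps: {consecutive_gap_error:.3f}")
```

With positive autocorrelation, the consecutive-gap error should come out larger than the random-gap error, which is the qualitative effect described above. The magnitudes depend on the assumed autocorrelation and are not expected to match the values reported for the station data.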
Keywords: Root Mean Square Error, Data Series, Maximum Error, Temperature Index, World Meteorological Organization
The author thanks Alison Garside for the revision of the text.