Data
Since January 1972, the Sri Lankan national malaria control programme, the Anti Malaria Campaign (AMC), has collected monthly confirmed malaria case data from health facilities, aggregated by medical officer of health (MOH) areas (which represent sub-district health administrative divisions). These data, up to December 2005, were cleaned and aggregated to district resolution. For each district and each month, the mean rainfall was extracted from monthly rainfall surfaces for the period January 1971 – December 2005. Both rainfall and malaria datasets are described in detail elsewhere [10].
Statistical analysis
The relationship between rainfall and malaria incidence was investigated using (i) cross-correlation analysis, (ii) cross-correlation analysis with pre-whitening, (iii) inter-annual analysis and (iv) seasonal inter-annual analysis allowing for temporal variability in the effect.
Cross-correlation analysis
Cross-correlations between detrended monthly malaria case count time series and monthly total rainfall [8] were analysed to detect the time lag(s) of rainfall preceding malaria at which the series show strongest correlation.
Malaria time series showed strong long term fluctuations for most districts in Sri Lanka (Figure 3). However, in rainfall time series these long term fluctuations were absent. Therefore, it was expected that rainfall could not explain the long term fluctuations in malaria, which were probably related to other factors, such as malaria control and population changes. The long term fluctuations masked the correlation between malaria and rainfall and since no information on the underlying factors was available in the data, the long term fluctuations needed to be removed prior to calculating cross-correlations. It was assumed that monthly malaria case count data $y_t$, after the transformation $y'_t = \log(y_t + 1)$, follow a seasonal model [15] of the form:
$$y'_t = m_t + S_t + \varepsilon_t$$
where $m_t$ is the mean level in month $t$; $S_t$ is the seasonal effect in month $t$; and $\varepsilon_t$ is the Gaussian random error.
As an example, Figure 4 shows the logarithmically transformed series for Gampaha district. The long term fluctuations $m_t$ in the logarithmically transformed monthly district malaria case count series were calculated using a 13-point centred smoothing filter with the months at the extremes given half weight:
$$m_t = \frac{1}{12}\left(0.5\,y'_{t-6} + y'_{t-5} + \dots + y'_t + \dots + y'_{t+5} + 0.5\,y'_{t+6}\right).$$
Smoothing was performed using the function "decompose" of the package "stats" in the software R [16]. In the detrended series $\zeta_t = y'_t - m_t$ (Figure 2 and Figure 5), long term trends caused by population growth were implicitly removed. Cross-correlation analysis was applied between the detrended log transformed malaria case time series and untransformed rainfall time series $x_t$. The cross-correlation was estimated for malaria with a lag $l$ of zero to twelve months behind rainfall as
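$$r_{x\zeta}(l) = \frac{1}{N\,s_x\,s_\zeta}\sum_{t=1}^{N-l}(x_t - \bar{x})(\zeta_{t+l} - \bar{\zeta})$$
where $N$ is the number of monthly observations, $\bar{x}$ and $\bar{\zeta}$ are the sample means, and $s_x$, $s_\zeta$ are the sample standard deviations of observations on $x_t$ and $\zeta_t$, respectively. The analysis was repeated with logarithmically transformed rainfall time series $x'_t = \log(x_t + 1)$.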
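As a hedged illustration of the lagged cross-correlation, the sketch below computes the correlation of malaria lagging rainfall by zero to twelve months. It assumes the 1/N normalisation used by R's "ccf"; the function name and the synthetic series (malaria set to follow rainfall by exactly two months) are hypothetical.

```python
import numpy as np

def lagged_cross_correlation(x, z, max_lag=12):
    """Correlation between x_t and z_{t+l} for l = 0..max_lag (z lags behind x)."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(x)
    xd, zd = x - x.mean(), z - z.mean()
    sx = np.sqrt(np.mean(xd ** 2))   # standard deviations with divisor N
    sz = np.sqrt(np.mean(zd ** 2))
    return np.array([np.sum(xd[:n - l] * zd[l:]) / (n * sx * sz)
                     for l in range(max_lag + 1)])

# Synthetic example (assumed data): "malaria" copies "rainfall" two months later.
rng = np.random.default_rng(0)
rain = rng.gamma(shape=2.0, scale=50.0, size=240)
malaria = np.roll(rain, 2)

r = lagged_cross_correlation(rain, malaria)
best_lag = int(np.argmax(r))     # lag with the strongest positive correlation
```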
The cross-correlation is calculated as the average over all calendar months, so any seasonal variation in the correlation is not accounted for: if rainfall has a strong positive effect on malaria in some months and a strong negative effect in others, the detected average cross-correlation could be weak.
Even though the above approach may find strong correlations, these may not be very useful for malaria prediction if aberrations from the long term seasonal mean of rainfall are weakly linked to aberrations from the long term seasonal mean of the malaria case series. In addition, the standard cross-correlation assumes observations are independent, whereas in reality the malaria data are temporally correlated.
Cross-correlation analysis with pre-whitening
Cross-correlation with the seasonality and autocorrelation removed by simple pre-whitening allows detection of the time lag(s) of rainfall preceding malaria at which divergences from the long term seasonal pattern in the rainfall time series show the strongest correlation with such divergences in the detrended malaria case count time series, while minimizing spurious correlations caused by autocorrelation in the time series. This method bears some similarity to anomaly analysis, in which aberrations from the long term seasonal mean of the explanatory variable are correlated with aberrations from the long term seasonal mean of the response variable. The effect of pre-whitening is to reduce unassociated autocorrelation and/or trends within the time series prior to computation of their cross-correlation function; it is well established that autocorrelation within time series can produce spurious cross-correlations [15]. Simple pre-whitening is used when there is a clear unidirectional influence, such as that of rainfall on malaria. First, an auto-regressive model is fitted to the explanatory variable. The pre-whitened explanatory variable consists of the residuals of this fitted model, whereas the pre-whitened outcome variable consists of the residuals of the same model (with the same parameters) applied to the outcome variable. With the inclusion of seasonality in the autoregressive model, the pre-whitening procedure removes seasonality (and autocorrelation) from the explanatory variable time series, and the same amount of seasonality (and autocorrelation) from the outcome variable time series. It is thus possible that additional seasonality (and autocorrelation) remains in the pre-whitened outcome variable time series.
For each district, multiplicative seasonal auto-regressive integrated moving average (SARIMA) models [17] with all possible combinations of parameters p, q, P, Q ∈ {0, 1, 2} and with d, D ∈ {0, 1} were evaluated using Akaike's information criterion (AIC) on untransformed and logarithmically transformed monthly rainfall totals in the period from January 1971 to December 2005.
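The size of this model search can be made concrete with a short sketch that enumerates the candidate orders and keeps the one with the lowest AIC. The seasonal period of 12 is an assumption based on the monthly data, and `fit_aic` is a hypothetical user-supplied callable standing in for whatever SARIMA fitting routine is used; it is not part of any library.

```python
from itertools import product

def sarima_candidates(period=12):
    """Yield every (p, d, q), (P, D, Q, period) with p, q, P, Q in {0,1,2} and d, D in {0,1}."""
    for p, q, P, Q in product(range(3), repeat=4):
        for d, D in product(range(2), repeat=2):
            yield (p, d, q), (P, D, Q, period)

def select_by_aic(series, fit_aic):
    """Return (aic, order, seasonal_order) with the lowest AIC.

    fit_aic(series, order, seasonal_order) is a hypothetical callable that
    returns the AIC of the fitted model, or None if the fit fails.
    """
    best = None
    for order, seasonal in sarima_candidates():
        aic = fit_aic(series, order, seasonal)
        if aic is not None and (best is None or aic < best[0]):
            best = (aic, order, seasonal)
    return best

n_candidates = sum(1 for _ in sarima_candidates())   # 3**4 * 2**2 combinations
```

With three values for each of p, q, P, Q and two for each of d, D, the grid contains 3^4 × 2^2 = 324 models per district and per rainfall transformation.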
The selected SARIMA model was then used to pre-whiten both the rainfall time series and detrended (smoothed) logarithmically transformed malaria case count time series $\zeta_t$. The residuals of both series were used as input for the cross-correlation analysis. The functions "arima" and "ccf" from the R package "stats" were used.
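The pre-whitening step can be sketched as follows. This is a deliberate simplification: a plain autoregressive filter with lags 1 and 12, fitted by least squares, stands in for the selected SARIMA model, and all function names and the synthetic series are assumptions rather than the procedure used in the study.

```python
import numpy as np

def lag_matrix(centred, lags):
    """Columns hold the centred series shifted by each lag, aligned from t = max(lags)."""
    m = max(lags)
    return np.column_stack([centred[m - j: len(centred) - j] for j in lags])

def fit_ar(x, lags):
    """Least-squares AR coefficients for the explanatory series."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    coef, *_ = np.linalg.lstsq(lag_matrix(xc, lags), xc[max(lags):], rcond=None)
    return coef

def apply_ar_filter(series, lags, coef):
    """Residuals of the SAME fitted filter applied to any series."""
    sc = np.asarray(series, dtype=float) - np.mean(series)
    return sc[max(lags):] - lag_matrix(sc, lags) @ coef

def cross_corr(x, z, max_lag=12):
    n = len(x)
    xd, zd = x - x.mean(), z - z.mean()
    denom = n * xd.std() * zd.std()
    return np.array([np.sum(xd[:n - l] * zd[l:]) / denom for l in range(max_lag + 1)])

# Synthetic data (assumed): autocorrelated rainfall, malaria follows it by one month.
rng = np.random.default_rng(1)
n = 400
shocks, noise = rng.normal(size=n), rng.normal(size=n)
rain = np.zeros(n)
for t in range(1, n):
    rain[t] = 0.6 * rain[t - 1] + shocks[t]
malaria = np.r_[noise[0], 0.8 * rain[:-1] + 0.3 * noise[1:]]

lags = (1, 12)                       # one short-term and one seasonal lag (assumed)
coef = fit_ar(rain, lags)
rain_resid = apply_ar_filter(rain, lags, coef)
malaria_resid = apply_ar_filter(malaria, lags, coef)
best_lag = int(np.argmax(cross_corr(rain_resid, malaria_resid)))
```

The key point is that the filter is estimated on rainfall only and then applied unchanged to the malaria series, so any autocorrelation specific to malaria can remain in its residuals.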
The cross-correlation analyses above have the drawback of masking inter-annual effects of rainfall on malaria time series because of the removal of the strong long term trend fluctuations.
Inter-annual analysis
In the inter-annual analysis, the series of differenced logarithmically transformed annual malaria cases was studied to determine whether it was correlated with differenced logarithmically transformed total annual rainfall. Unlike the first two approaches, it cannot account for within-year effects, but inter-annual effects are not masked.
The difference $\Omega_{t,k} = \log(Y_{t,k}) - \log(Y_{t-1,k}) = \log(Y_{t,k}/Y_{t-1,k})$ reflects the relative change in case numbers between consecutive years [3], where $Y_{t,k}$ is the annual malaria case total for year $t$, and the start month $k$ of the twelve-month period was either April ($k = 4$) or September ($k = 9$) because seasonally, malaria was lowest in either April or September, depending on the district [13]. Similarly, the relative change in rainfall over 12 month periods preceding the malaria periods with a lag $l$ of one to three months was represented by $\Xi_{t,k,l} = \log(X_{t,k,l}) - \log(X_{t-1,k,l}) = \log(X_{t,k,l}/X_{t-1,k,l})$. Malaria was regressed against rainfall in a first order auto-correlated (AR1) model:
$$\Omega_{t,k} = \varphi_k\,\Omega_{t-1,k} + \beta_{k,l}\,(\Xi_{t,k,l} - \varphi_k\,\Xi_{t-1,k,l}) + \varepsilon_{t,k}.$$
The Pearson correlation coefficient between $\Omega_{t,k} - \varphi_k\,\Omega_{t-1,k}$ and $\Xi_{t,k,l} - \varphi_k\,\Xi_{t-1,k,l}$ was then calculated. Figure 6 and Figure 7 provide an illustration for Gampaha District. The robustness of all significant (p ≤ 0.1) correlations detected was tested as follows: for each observation, it was calculated whether it was influential in terms of dfbeta, dffits, covariance ratios, Cook's distances and the diagonal elements of the hat matrix. Observations which were influential with respect to any of these measures were omitted (one at a time) from the data and the correlation coefficient was recalculated. The weakest correlation among these was recorded.
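The inter-annual procedure can be sketched in Python under stated assumptions. The text does not say how $\varphi_k$ was estimated, so the sketch uses a simple grid search that minimises the residual sum of squares; the leave-one-out helper drops every observation in turn rather than only those flagged by the influence diagnostics. All names and the synthetic data are hypothetical.

```python
import numpy as np

def log_ratio(annual_totals):
    """log(Y_t / Y_{t-1}) for consecutive twelve-month totals."""
    logs = np.log(np.asarray(annual_totals, dtype=float))
    return logs[1:] - logs[:-1]

def fit_ar1_regression(omega, xi, phis=np.linspace(-0.95, 0.95, 191)):
    """Grid search over phi; for each phi, beta by least squares without intercept."""
    omega, xi = np.asarray(omega, float), np.asarray(xi, float)
    best = None
    for phi in phis:
        u = omega[1:] - phi * omega[:-1]      # Omega_t - phi * Omega_{t-1}
        v = xi[1:] - phi * xi[:-1]            # Xi_t - phi * Xi_{t-1}
        beta = (u @ v) / (v @ v)
        sse = np.sum((u - beta * v) ** 2)
        if best is None or sse < best[0]:
            best = (sse, phi, beta, u, v)
    _, phi, beta, u, v = best
    return phi, beta, np.corrcoef(u, v)[0, 1], u, v

def weakest_leave_one_out(u, v):
    """Correlation of smallest magnitude after dropping each observation in turn."""
    rs = [np.corrcoef(np.delete(u, i), np.delete(v, i))[0, 1] for i in range(len(u))]
    return min(rs, key=abs)

# Synthetic annual totals (assumed): malaria change is half the rainfall change plus noise.
rng = np.random.default_rng(2)
rain_totals = rng.uniform(1000.0, 2500.0, size=31)
xi = log_ratio(rain_totals)
omega = 0.5 * xi + rng.normal(0.0, 0.05, size=xi.size)

phi, beta, r, u, v = fit_ar1_regression(omega, xi)
weakest = weakest_leave_one_out(u, v)
```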
Seasonal inter-annual analysis
The effect of rainfall on malaria may depend on the season; therefore, it was of interest to assess the inter-annual relationship between malaria and rainfall for each calendar month in the year. The inter-annual analysis above was modified by replacing $\Omega_{t,k}$ with $\omega_{t,k}$, and $\Xi_{t,k,l}$ with $\xi_{t,k,l}$. Here, $\omega_{t,k}$ represents the average logarithmically transformed malaria count over three months (e.g. January – March) differenced with the average logarithmically transformed malaria count in the previous twelve months, with $y_{t,k}$ the malaria count in month $k$ (varied between January and December) and in year $t$. Similarly, $\xi_{t,k,l}$ represents the corresponding difference for rainfall. The seasonally varying correlation coefficients between rainfall and malaria $r_{k,l}$ were transformed into $z_{k,l}$ values using the Fisher transformation and correlated with a three month centred moving average of logarithmically transformed geometric mean seasonal rainfall (similar to that depicted in Figure 1, but logarithmically transformed) and with its derivative (expressing the change in seasonal rainfall per month).
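The final step can be illustrated with a small sketch of the Fisher transformation, $z = \operatorname{arctanh}(r)$, and its correlation with smoothed seasonal rainfall. Treating the twelve calendar months as circular (December adjacent to January) and approximating the derivative by a centred difference are assumptions, as are the function names and the purely synthetic monthly values.

```python
import numpy as np

def fisher_z(r):
    """Fisher transformation: 0.5 * log((1 + r) / (1 - r))."""
    return np.arctanh(np.asarray(r, dtype=float))

def centred_three_month_mean(monthly_values):
    """Three-month centred moving average over 12 calendar months, wrapping Dec to Jan."""
    v = np.asarray(monthly_values, dtype=float)
    return (np.roll(v, 1) + v + np.roll(v, -1)) / 3.0

def monthly_change(monthly_values):
    """Centred-difference approximation of the change per month (assumed form)."""
    v = np.asarray(monthly_values, dtype=float)
    return (np.roll(v, -1) - np.roll(v, 1)) / 2.0

# Synthetic inputs (assumed): one value per calendar month.
months = np.arange(12)
seasonal_log_rain = 5.0 + 0.5 * np.cos(2 * np.pi * months / 12)
r_by_month = 0.3 * np.sin(2 * np.pi * months / 12)

z = fisher_z(r_by_month)
level = centred_three_month_mean(seasonal_log_rain)
slope = monthly_change(level)

corr_with_level = np.corrcoef(z, level)[0, 1]
corr_with_slope = np.corrcoef(z, slope)[0, 1]
```

The transformation stretches correlations near ±1 so that the $z_{k,l}$ values are closer to normally distributed, which makes them more suitable than raw $r_{k,l}$ for a further correlation analysis.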