1 Introduction

Frequent outbreaks have made dengue fever a significant health hazard in Pakistan. A recent study [17] focused on improving prevention measures through spatial mapping of temporal risk in the Lahore District. However, published studies on the influence of environmental parameters on the spread of dengue fever in the country are limited. Dengue hemorrhagic fever is a potentially deadly infectious disease spread by the bites of mosquitoes infected with the virus. Certain mosquito species can transmit viruses or parasites to humans [24], and Aedes aegypti is the main mosquito that spreads this disease.

Dengue fever outbreaks have grown dramatically during recent decades in many parts of the world [4, 32, 34]. There are more than 100 million new cases of dengue fever every year worldwide [11]. According to World Health Organization (WHO) statistics, about 100 tropical and subtropical countries are at risk from the dengue virus [5, 15, 34]. The same report states that dengue hemorrhagic fever (DHF) is the primary cause of childhood mortality in several Asian countries [8, 27]. Global dengue incidence increased from 148 per 100,000 in 1990 to 810 per 100,000 in 2013 [26]. There is a pressing need to control the spread of the dengue virus at an early stage to protect the millions of people living in hot and humid regions of the world [19].

Weather parameters play a significant role in mosquito breeding and disease outbreaks. Therefore, in countries where dengue outbreaks recur, variations in precipitation, temperature, and humidity can be related to dengue outbreaks [6, 12]. Alkhaldy [7] used a generalized linear model to establish an association between dengue incidence and both relative humidity and temperature in a city of Saudi Arabia. Xiang et al. [35] investigated the association between temperature and dengue in China and also found nonlinear relationships between dengue and both relative humidity and extreme wind velocity. Chien and Yu [14] studied the spatiotemporal patterns of dengue cases and identified elevated risk in densely populated areas; in their study, the meteorological parameters found to affect dengue incidence were the weekly maximum 24-h rainfall and the weekly minimum temperature. Atique et al. [9] studied the influence of meteorological parameters on dengue incidence in Lahore, Pakistan. They examined the association between reported dengue cases and climatic parameters and concluded that the changing climate is causing an increase in dengue cases.

Weather parameters are useful for identifying the specific environmental conditions that support mosquito breeding [31]. High temperatures (35–40 °C) can affect the flight range of the dengue mosquito [20]. An increase in inland water temperature to about 20 °C may accelerate the breeding rate of the dengue mosquito [13, 23]. According to the WHO, temperatures above 26 °C and heavy rainfall significantly influence the dengue transmission rate. Although weather parameters are considered the primary factors behind dengue outbreaks, other factors also affect the spread of the virus, including the demography, topography, land use practices, and behavioral and societal features of a region.

Pakistan, with its variable weather patterns and demography, has experienced many dengue episodes during the past two decades. Dengue fever was unknown in Pakistan until 1994, when the first case was identified in the southern city of Karachi. In Punjab Province, particularly in the Lahore District, dengue cases have been rising since 2007 [21, 28]. The 2011 dengue outbreak in Lahore, which lasted from March to December of that year, is considered the worst on record.

In our study, geospatial tools were utilized to evaluate the influence of various weather and demographic parameters on dengue incidences. Geospatial techniques, including remote sensing (RS) and geographical information system (GIS), are extensively utilized to analyze spatial and temporal data in monitoring and mapping epidemics and risk-prone areas [15, 21, 32].

2 Materials and methods

2.1 Study area

This study is based on the 2011 dengue outbreak in Lahore, a district of Pakistan’s Punjab Province. The study area is semiarid, with long, hot summers, heavy monsoon rains, and mild, dry winters. In summer, temperatures reach 40–48 °C in May–July. The wet season starts at the end of July with heavy rainfall. The average annual precipitation in Lahore is 628.8 mm, and the maximum rainfall within 24 h is 250–300 mm. December, January, and February are the coldest months of the year, when the temperature can drop to −1 °C.

The Lahore District comprises nine towns, which are further divided into union councils (UCs), the smallest administrative divisions. All analyses in this study were performed at the UC level.

2.2 Methodology

The primary purpose of this study was to examine the relationship between the 2011 dengue incidence in Lahore and its weather and demographic variables using regression analysis. First, the published literature was searched to identify potential explanatory variables. Based on previous studies, the parameters selected for testing their association with dengue cases at the union council (UC) level were average Normalized Difference Vegetation Index (NDVI), average Normalized Difference Water Index (NDWI), maximum land surface temperature (LST) (°C), average LST (°C), built-up area (m²), population density, and population (in thousands). Data for these parameters at the time of the outbreak were acquired, together with the number of dengue cases. Fig. 1 illustrates the methodological workflow and the different phases of the study.

Fig. 1 Methodological workflow

ArcGIS provides two linear regression tools: ordinary least squares (OLS) and geographically weighted regression (GWR). As its name suggests, OLS is a standard linear regression method that estimates the model by minimizing the sum of the squared errors between observed and predicted values. OLS is a global regression method: it fits a single equation, with one set of coefficients, to the entire study area. Equation 1 is a multiple regression equation in which Y is the dependent variable (dengue cases in our case) and X1, X2, …, Xn are the n explanatory variables. The y-intercept of the model is a, and b1, b2, …, bn are the regression coefficients.

$$Y = a + b_{1}X_{1} + b_{2}X_{2} + \cdots + b_{n}X_{n}$$
(1)
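The least-squares fit behind Eq. 1 can be sketched in a few lines. This is a minimal illustration with NumPy's `lstsq`, not the ArcGIS OLS tool itself. The function name `fit_ols` and the five-row dataset are hypothetical, and the data are noise-free, so the fit should recover the coefficients that generated them.

```python
import numpy as np

def fit_ols(X, y):
    """Fit Y = a + b1*X1 + ... + bn*Xn by ordinary least squares (hypothetical helper).

    X: (m, n) array of explanatory variables, one row per areal unit (e.g., UC).
    y: (m,) array of the dependent variable (e.g., dengue cases).
    Returns (a, b): the intercept and the n regression coefficients.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the intercept 'a' is estimated with the slopes.
    A = np.column_stack([np.ones(len(y)), X])
    # lstsq minimizes ||A @ coef - y||^2, i.e., the sum of squared errors.
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]

# Hypothetical data: two explanatory variables for five areal units.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 2.0 + 0.5 * X[:, 0] + 1.5 * X[:, 1]  # noise-free, generated with a=2, b=(0.5, 1.5)
a, b = fit_ols(X, y)
print(a, b)
```

Because a single coefficient vector is returned for all rows, this is the "global" model in the sense used above.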

Geographically weighted regression (GWR), on the other hand, is a local regression technique that accounts for how the relationships among variables change across the study area. GWR provides a means of exploring spatially varying relationships: GWR models (Eq. 2) describe the relationship separately for every location on a map. GWR can also be used to identify hot spots and to help understand why they are present.

$$Y_{i} = X_{i}^{T} \beta(u_{i}, v_{i}) + \varepsilon_{i} = \beta_{0}(u_{i}, v_{i}) + \sum_{k=1}^{p} X_{ik}\, \beta_{k}(u_{i}, v_{i}) + \varepsilon_{i}$$
(2)

In Eq. 2, Yi is the dependent variable at location i, Xik is the value of the kth of the p explanatory variables at location i, β(ui, vi) is the vector of location-specific parameter estimates, (ui, vi) are the geographic coordinates of location i, and εi is the error term with mean zero and common variance σ². If the coefficients do not depend on the coordinates (ui, vi), Eq. 2 reduces to the global multiple regression model of Eq. 1. GWR estimates these parameters point by point using kernel-based, geographically weighted least squares, in which observations closer to location i receive larger weights. In this study, regression analyses were used to build models that map the variation in the dependent variable in response to changes in the explanatory variables.
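The point-wise weighted least squares idea can be sketched as follows. This is a minimal sketch, assuming a Gaussian kernel with a fixed, user-chosen bandwidth; the ArcGIS GWR tool offers its own kernel and bandwidth-selection options (for example, AICc), which are not reproduced here. The function name `gwr_coefficients` and the 3 × 3 grid of locations are hypothetical.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Local coefficients beta(u_i, v_i) for every observation (sketch of Eq. 2).

    coords: (m, 2) array of (u, v) coordinates, e.g., hypothetical UC centroids.
    X: (m, p) explanatory variables; y: (m,) dependent variable.
    bandwidth: fixed Gaussian kernel bandwidth in coordinate units (assumed, not tuned).
    Returns an (m, p + 1) array: column 0 is beta_0, columns 1..p are beta_k.
    """
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    betas = np.empty((len(y), A.shape[1]))
    for i in range(len(y)):
        # Distance from location i to every observation.
        d = np.linalg.norm(coords - coords[i], axis=1)
        # Gaussian kernel: nearby observations get weights close to 1.
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        # Weighted least squares: scale rows by sqrt(w) and solve ordinary LS.
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        betas[i] = coef
    return betas

# Hypothetical 3 x 3 grid of locations with one explanatory variable.
coords = np.array([(u, v) for u in range(3) for v in range(3)], dtype=float)
x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0])
betas = gwr_coefficients(coords, x[:, None], 1.0 + 2.0 * x, bandwidth=1.0)
print(betas)
```

With a very large bandwidth all weights approach 1 and every local estimate collapses to the global OLS fit, which mirrors the reduction of Eq. 2 to Eq. 1.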

2.3 Data collection

The epidemic data, comprising 17,330 dengue cases registered in Lahore during March–December 2011, were acquired from the Ministry of Health (MoH) and the Punjab Disaster Management Authority (PDMA). Population statistics were derived from the Government of Punjab’s Pre-investment report [16]. GIS data, including the boundaries of the Lahore District, its towns, and its UCs, were acquired from The Urban Unit, Lahore. SPOT-5 and Landsat-5 (TM) satellite imagery was used to derive land use/land cover and land surface temperature for the study area. SPOT-5 provides high spatial resolution (5-m) multispectral data, whereas Landsat-5 (TM) provides medium-resolution (30-m) imagery. High-resolution Google Earth imagery was used to digitize built-up areas where the coarser-resolution images did not allow sufficient precision. Satellite images were selected with acquisition dates matching the temporal scale of the study. SPOT-5 images were obtained from the satellite ground station of the Space and Upper Atmosphere Research Commission (SUPARCO). Landsat TM images were downloaded from the Earth Explorer website (http://earthexplorer.usgs.gov/).

2.4 Geospatial analysis

Band ratios (spectral indices) can identify many types of land use/land cover (LULC) in satellite data. The Normalized Difference Vegetation Index (NDVI) for vegetative cover and the Normalized Difference Water Index (NDWI) for identifying wet areas are the most commonly used spectral indices, and both were calculated in this study. NDVI and NDWI were computed from SPOT-5 datasets, whereas land surface temperature (LST) was derived from the thermal band of Landsat-5 (TM). Maps of NDVI and LST with the spatial distribution of dengue cases are presented in Fig. 2. The rainfall distribution at the UC level could not be estimated because data were available from only one weather station in Lahore, and the coarser-resolution satellite-based precipitation data (0.25° × 0.25°) were not sufficient for this purpose. However, the precipitation data from the single station showed a lag between the peaks of rainfall and dengue incidence (see Fig. 3). It was therefore decided to relate dengue outbreaks to the wet surfaces present in the study area after rainfall events, and the NDWI was deemed sufficient for analyzing this relationship in the absence of rainfall data.
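The index and temperature calculations can be sketched as below. This is an illustration under stated assumptions, not the processing chain used in the study: NDVI is (NIR − Red)/(NIR + Red); the NDWI here assumes the McFeeters-style (Green − NIR)/(Green + NIR) formulation, because the text does not state which NDWI variant was used; and LST assumes a single-channel emissivity correction applied to the Landsat-5 TM band-6 brightness temperature, starting from spectral radiance (the DN-to-radiance calibration step is omitted). The function names are hypothetical.

```python
import numpy as np

def normalized_difference(a, b):
    """(a - b) / (a + b), with NaN where a + b == 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    total = a + b
    out = np.full(total.shape, np.nan)
    np.divide(a - b, total, out=out, where=total != 0)
    return out

def ndvi(nir, red):
    return normalized_difference(nir, red)

def ndwi(green, nir):
    # Assumption: McFeeters-style green/NIR NDWI; the study does not specify the variant.
    return normalized_difference(green, nir)

# Landsat-5 TM band 6 thermal calibration constants (K1 in W/(m^2 sr um), K2 in kelvin).
K1, K2 = 607.76, 1260.56

def brightness_temperature(radiance):
    """At-sensor brightness temperature (K) from band-6 spectral radiance."""
    return K2 / np.log(K1 / np.asarray(radiance, dtype=float) + 1.0)

def land_surface_temperature(bt, emissivity, wavelength=11.5e-6, rho=1.438e-2):
    """Emissivity-corrected LST in degrees Celsius (assumed single-channel correction).

    wavelength: effective band-6 wavelength in metres; rho = h*c/sigma in m*K.
    """
    lst_k = bt / (1.0 + (wavelength * bt / rho) * np.log(emissivity))
    return lst_k - 273.15

print(ndvi([0.5, 0.3], [0.1, 0.3]), ndwi([0.2], [0.1]))
```

The per-pixel rasters produced this way are the inputs that are later summarized per UC.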

Fig. 2 LST, NDVI, and distribution of dengue cases

Fig. 3 Meteorological data and dengue incidences

Population data were not available at the UC level; therefore, each town’s population was distributed among its UCs in proportion to the built-up area of each UC. The built-up area was digitized from high-resolution Google Earth imagery, on which houses and other structures could be identified precisely.
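The proportional allocation described above amounts to the following arithmetic. This is a minimal sketch; the function name `allocate_population` and the town and UC values are hypothetical.

```python
def allocate_population(town_population, uc_builtup):
    """Split each town's population among its UCs in proportion to built-up area.

    town_population: {town: population}
    uc_builtup: {town: {uc: built-up area in m^2}}
    Returns {uc: allocated population}.
    """
    result = {}
    for town, pop in town_population.items():
        ucs = uc_builtup[town]
        total_area = sum(ucs.values())
        for uc, area in ucs.items():
            # Share of the town's built-up area that falls in this UC.
            result[uc] = pop * area / total_area
    return result

# Hypothetical example: town A has two UCs, town B has one.
alloc = allocate_population(
    {"A": 1000, "B": 500},
    {"A": {"a1": 300.0, "a2": 100.0}, "B": {"b1": 50.0}},
)
print(alloc)
```

By construction, the allocated UC populations of each town sum back to the town total.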

The average values of NDVI, NDWI, and LST and the maximum LST were calculated for each UC using the Zonal Statistics tool of ArcGIS. The tool computes a statistic (here, the mean or the maximum) of an input raster (LST, NDVI, or NDWI) within each zone defined by the union council dataset, yielding a single value of each parameter for every UC.
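The per-zone summary can be illustrated with NumPy. This is a sketch of the idea, not the ArcGIS Zonal Statistics tool: it assumes the UC polygons have already been rasterized to an integer zone grid aligned with the value raster, and that NaN marks NoData cells. The function name `zonal_mean_max` and the small rasters are hypothetical.

```python
import numpy as np

def zonal_mean_max(zones, values):
    """Per-zone mean and maximum of a raster, in the spirit of a zonal statistics tool.

    zones: integer raster of UC ids (same shape as values), assumed pre-rasterized.
    values: float raster (e.g., LST, NDVI, or NDWI); NaN cells are ignored.
    Returns {zone_id: (mean, max)}.
    """
    zones = np.asarray(zones).ravel()
    values = np.asarray(values, dtype=float).ravel()
    valid = ~np.isnan(values)
    stats = {}
    for zone_id in np.unique(zones[valid]):
        cells = values[valid & (zones == zone_id)]
        stats[int(zone_id)] = (float(cells.mean()), float(cells.max()))
    return stats

# Hypothetical 2 x 3 rasters: two UCs and an LST grid with one NoData cell.
zones = np.array([[1, 1, 2], [1, 2, 2]])
lst = np.array([[20.0, 22.0, 30.0], [24.0, np.nan, 34.0]])
print(zonal_mean_max(zones, lst))
```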

Fig. 4 shows the histograms of the regression variables, and it is evident that many of the study variables are not normally distributed. A very high number of dengue cases (1438) in the Cantonment UC made the data highly skewed. A natural logarithm transformation was therefore applied to dengue cases, built-up area, maximum LST, and population, and trial regression runs were performed with the transformed variables. The log-transformed data did not produce any meaningful results: either the R² values of the models were low, or the regression analysis could not be completed with the ArcGIS tools. The regression tools were therefore applied to the untransformed data.
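The effect of the log transformation on skewness can be checked with the moment skewness g1 = m3 / m2^1.5. In this minimal sketch, only the outlier 1438 (Cantonment UC) comes from the text; the other case counts are hypothetical. Note that `math.log` fails for UCs with zero cases, where log(1 + x) is a common substitute.

```python
import math
from statistics import fmean

def skewness(xs):
    """Population (moment) skewness g1 = m3 / m2**1.5."""
    xs = list(xs)
    mean = fmean(xs)
    m2 = fmean((x - mean) ** 2 for x in xs)
    m3 = fmean((x - mean) ** 3 for x in xs)
    return m3 / m2 ** 1.5

# Hypothetical UC case counts; only 1438 is taken from the text.
cases = [1, 2, 3, 5, 8, 1438]
raw = skewness(cases)
logged = skewness(math.log(c) for c in cases)
print(round(raw, 2), round(logged, 2))
```

The transformation reduces, but does not remove, the skewness caused by a single extreme UC.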

Fig. 4 Histograms of regression variables

3 Results

In this study, a GIS-based analysis was performed to evaluate the spatial variation of dengue spread within the study area during the rainy season. Regression analysis was used to assess the statistical significance of the association between dengue cases and the explanatory variables (Table 1).

Table 1 Study parameters

As the scatterplots in Fig. 5 show, the study parameters (with or without log-transformation) did not have a significant linear relationship with dengue cases. OLS models were first run with different sets of explanatory variables, but, as expected, none produced satisfactory results in terms of statistics such as the adjusted R². Most of the models were biased, as indicated by a statistically significant Jarque–Bera (skewness–kurtosis) test. When this test is statistically significant (P < 0.05), the residuals are not normally distributed, which implies bias in the model predictions. Some runs also produced a statistically significant Koenker test, which indicates non-stationary relationships between the dependent variable and some or all of the explanatory variables. In other words, a variable may be a significant predictor of the dependent variable in some regions but a weak predictor in others. Based on these findings, it was concluded that the study parameters were not suitable for the OLS global model. Table 2 presents the results of these models. In the next phase, GWR was employed to develop a dengue predictive model.
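The Jarque–Bera statistic used as the bias check above combines residual skewness S and kurtosis K as JB = n/6 · (S² + (K − 3)²/4). Under normality it is asymptotically chi-square with 2 degrees of freedom, whose survival function is exactly exp(−x/2). This is a minimal sketch; the ArcGIS implementation may differ in detail (for example, small-sample adjustments), and the residual values are hypothetical.

```python
import math

def jarque_bera(residuals):
    """Jarque-Bera statistic and its asymptotic p-value (chi-square, 2 df)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # non-excess kurtosis: 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

# Hypothetical residuals: symmetric versus dominated by one large positive value.
print(jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0]))
print(jarque_bera([0.0] * 20 + [10.0]))
```

A p-value below 0.05, as in the second call, is the situation in which the OLS residuals are judged non-normal.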

Fig. 5 Relationship between study parameters and dengue cases (scatterplots)

Table 2 OLS models

The GWR tool of ArcGIS was executed to model dengue cases. Several trials were conducted with different combinations of original and log-transformed independent variables. No results could be computed with log-transformed variables in any model run. With the original data, the model failed with an error when all study variables were included. The error message indicated possible severe global or local multicollinearity (redundancy among the independent variables). Under multicollinearity, one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy. Table 3 (GWR models with a successful run) summarizes the successful model runs.
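A common way to quantify the redundancy described above is the variance inflation factor, VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing predictor j on all the other predictors. This is a minimal sketch of that diagnostic, not the check performed internally by the ArcGIS GWR tool. Large values (a frequently quoted rule of thumb is above 10) signal multicollinearity. The function name `vif` and the data are hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of an (m, p) predictor matrix."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        # Regress predictor j on an intercept plus all the other predictors.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out.append(np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return np.array(out)

# Hypothetical predictors: uncorrelated columns versus two nearly identical columns.
independent = np.array([[1, 1], [-1, 1], [1, -1], [-1, -1]], dtype=float)
redundant = np.array([[1, 1.1], [2, 1.9], [3, 3.05], [4, 4.0], [5, 4.95]])
print(vif(independent), vif(redundant))
```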

Table 3 GWR models with successful run

GWR model 3, with built-up area and population density as explanatory variables, gives an adjusted R² of 0.774 (R² = 0.805). This indicates that, with population density and built-up area as explanatory variables, the model explains 77.4% of the variance in dengue incidence. The map of GWR standardized residuals was then examined to assess model performance. The residuals represent the portion of the total variability in the observed data that the model leaves unexplained, that is, its under- and over-predictions. The standard residual map in Table 3 shows under-predicted areas with positive values (where actual dengue cases were higher than predicted) and over-predicted areas with negative values (where dengue cases were lower than predicted). A model is considered to perform well when over- and under-predictions are not spatially clustered but randomly distributed. Spatial clustering of over- or under-predictions indicates that one or more key explanatory variables are missing from the model. The standard residual map showed slightly high values in some places and slightly low values in others, with no clear spatial structure of over- or under-prediction. The spatial autocorrelation tool of ArcGIS (Moran’s I statistic, Eq. 3) can also be applied to the model residuals to check whether they follow a random spatial pattern.

$$I = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} W_{i,j} Z_{i} Z_{j}}{S_{0} \sum_{i=1}^{n} Z_{i}^{2}}$$
(3)

where Zi is the deviation of the attribute value of feature i from its mean (xi − x̄), Wi,j is the spatial weight between features i and j, n is the number of features, and S0 is the sum of all spatial weights:

$$S_{0} = \sum_{i=1}^{n} \sum_{j=1}^{n} W_{i,j}$$
(4)
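Eqs. 3 and 4 translate directly into matrix form, since the double sum equals zᵀWz. The sketch below computes Moran's I for a given weight matrix. For the z-score it uses a permutation approach, which is a swapped-in technique: the ArcGIS tool derives its z-score analytically. The function names, the four-feature chain, and the values are hypothetical.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I (Eq. 3) for attribute values and an n x n weight matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()   # Z_i: deviation of each feature from the mean
    s0 = w.sum()       # S_0: sum of all spatial weights (Eq. 4)
    # z @ w @ z equals sum_i sum_j W_ij * Z_i * Z_j.
    return len(x) * (z @ w @ z) / (s0 * (z @ z))

def permutation_z_score(values, weights, n_perm=999, seed=0):
    """z-score of the observed I against I computed for randomly permuted values."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, weights)
    simulated = np.array(
        [morans_i(rng.permutation(values), weights) for _ in range(n_perm)]
    )
    return (observed - simulated.mean()) / simulated.std()

# Hypothetical chain of four features; neighbours share a weight of 1.
W = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
print(morans_i([1, 2, 3, 4], W), morans_i([1, -1, 1, -1], W))
print(permutation_z_score([1, 2, 3, 4], W))
```

Positive I indicates clustering of similar values, negative I indicates alternation, and values near the expectation −1/(n − 1) are consistent with spatial randomness.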

Whenever under- and over-predictions are spatially structured or clustered, the model is not trustworthy, because this indicates that some key explanatory variables are missing from the model. The spatial autocorrelation analysis indicated that the regression residuals were randomly distributed: the z-score of 1.3 is not statistically significant (|z| < 1.96 at the 0.05 level), so the null hypothesis of complete spatial randomness could not be rejected. This is consistent with the random residual pattern required of a well-performing model.

Figure 6 presents the spatial distribution of the regression coefficients of the explanatory variables built-up area and population density. Mapping these coefficients shows how the relationship between each explanatory variable and the dependent variable varies across the study area. In Fig. 6, darker areas are locations where built-up area and population density are better predictors of dengue cases, whereas lightly shaded areas are places where they are weaker predictors.

Fig. 6 Spatial distribution of regression coefficients for UC built-up area and population density

3.1 Dengue prediction map

The GWR model can be used to predict the dependent variable from projected values of the explanatory variables. To prepare the prediction map, projected increases of 10% in built-up area and 20% in population density were assumed for all UCs. The predicted dengue cases are shown in Fig. 7. In a small portion of the study area, the predicted values were negative, which indicates model inaccuracy; these regions were shaded out and discarded. Over the majority of the study area, the model produced positive and reliable predictions that identified risk zones.
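Applying the local coefficients to projected inputs can be sketched as follows. This sketch assumes that the 10% and 20% projections are multiplicative increases of each UC's built-up area and population density, and it masks negative predictions with NaN to mirror how those regions were discarded. The function name and all coefficient and input values are hypothetical.

```python
import numpy as np

def predict_scenario(local_coefs, builtup, density, builtup_growth=0.10, density_growth=0.20):
    """Predict cases per UC from local GWR coefficients under projected increases.

    local_coefs: (m, 3) array of [intercept, built-up coef, density coef] per UC.
    Growth rates are assumed multiplicative; negative predictions become NaN.
    """
    b = np.asarray(local_coefs, dtype=float)
    x1 = np.asarray(builtup, dtype=float) * (1.0 + builtup_growth)
    x2 = np.asarray(density, dtype=float) * (1.0 + density_growth)
    pred = b[:, 0] + b[:, 1] * x1 + b[:, 2] * x2
    return np.where(pred < 0, np.nan, pred)

# Hypothetical local coefficients and current inputs for two UCs.
coefs = np.array([[1.0, 0.01, 0.5], [-50.0, 0.001, 0.1]])
pred = predict_scenario(coefs, builtup=[1000.0, 100.0], density=[10.0, 20.0])
print(pred)
```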

Fig. 7 Dengue prediction map

4 Discussion

4.1 Main findings

The OLS global regression models are not suitable for the current study. Even the models with statistically significant parameters were biased, and hence undesirable, as indicated by the significant (P < 0.05) Jarque–Bera (skewness–kurtosis) test. The GWR technique was used to develop a dengue predictive model with built-up area and population density, which gave an adjusted R² of 0.774 (R² = 0.805). This implies that, together, population density and built-up area explain 77% of the variance in dengue incidence. Another model, with built-up area and population, had an adjusted R² of 0.764 (R² = 0.794), but its residual map did not show randomness (Table 3). The other parameters failed to establish a statistically significant relationship with dengue cases.

The GWR predictive model establishes a spatial relationship between dengue cases and both population density and built-up area. A recent study by Hira et al. [18] reported similar results in another region of Pakistan, identifying built-up area as a more critical factor for the likelihood of the vector’s presence than NDVI and precipitation.

4.2 What is already known?

Many similar studies in other regions of the world broadly support this study but report different relationships among the study parameters, presumably because of the diverse geographical and environmental settings in which they were undertaken. One study conducted in Taiwan produced a GWR model between dengue data and population that explained 59% of the total variation [25, 33]. Another study in Indonesia used GWR and OLS regression models to examine the relationship between dengue data, population, and rainfall [10, 27]; its OLS and GWR models had R² values of 0.43 and 0.45, respectively. Both studies concluded that GWR models were more reliable than OLS global regression models.

4.3 New additions in knowledge

In developing countries such as Pakistan, where information is not always readily available, data collection is difficult and time-consuming. This research is an attempt to use satellite data to overcome this problem. The following gaps were identified in the literature.

  • Although various similar studies have mapped dengue risk using remote sensing data and GIS techniques in other parts of the world, few have been conducted in the study area. A model built for a particular region is based on its unique environmental settings and should not be applied to other areas without proper validation [32]. Therefore, a model should be developed from influencing parameters derived specifically for the study area under consideration.

  • In Pakistan, only a few attempts have been made to monitor dengue outbreaks and relate them to influencing factors using GIS or other analytical approaches [1,2,3, 17, 22, 28, 29]. Very few remote sensing analyses have so far been performed to derive influencing factors [18, 30].

These research gaps established the need for this study. The study could provide local, provincial, and federal officials with the information needed to plan and prepare for emergency response and recovery and to mitigate future threats. Efforts to reduce and control dengue outbreaks can be more successful with risk mapping that identifies disease-prone areas. Potential beneficiaries of this study include, but are not limited to, the health department, national and provincial disaster management authorities, and vulnerable communities. This knowledge can further enhance the capacity of decision-makers to devise strategies and future action plans to control the spread of dengue. The prediction model offers the following advantages.

  • Multiple prediction maps based on future values of the explanatory variables can help disaster management authorities prepare for all levels of dengue outbreak risk that are likely in the future [6].

  • Town planning and urban development schemes usually do not consider the health impacts of urban growth. A predictive model can be used to assess the effects of future expansion of built-up areas on dengue outbreaks [20].

  • The health insurance companies can utilize this model for adjusting plan payments by determining the relative risk of insured populations.

4.4 Limitations

The dengue incidence data collected from the Ministry of Health (MoH) and the Punjab Disaster Management Authority (PDMA) are based on house addresses. Almost 40% of the records had missing or incomplete addresses and were therefore removed from the database and excluded from the analysis. Thus, the data used in the study are incomplete and may be considered a sample representing only about 60% of the actual cases.

Moreover, investigating the distribution pattern of dengue in the study area required data at the union council level. Because the collected data are based on house addresses, the cases were assigned to UCs with great care, and several ground surveys were conducted to verify the addresses. Nevertheless, some cases may have been assigned to the wrong UC because of incomplete addresses in this comparatively large database (17,330 cases).

The scope of this study was to evaluate the spatial association between the 2011 dengue cases in Lahore and other parameters using ArcGIS; the study was therefore limited to the software’s built-in regression models. Poisson regression, which is generally considered better suited to count variables such as dengue cases, could be tested alongside OLS and GWR in future studies.
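Poisson regression with a log link is usually fitted by iteratively reweighted least squares (IRLS), in which each iteration solves a weighted least squares problem with working response z = η + (y − μ)/μ and weights μ. This is a minimal sketch of that standard procedure, which was not used in this study. The function name `poisson_irls` and the data are hypothetical; the demonstration uses noise-free means so that the fit should recover the generating coefficients.

```python
import numpy as np

def poisson_irls(X, y, max_iter=50, tol=1e-10):
    """Fit log(E[y]) = X @ beta by iteratively reweighted least squares.

    X must include a column of ones for the intercept; y holds counts.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = y + 0.1          # starting means; the offset keeps log() finite for zero counts
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu   # working response for the log link
        sw = np.sqrt(mu)          # IRLS weights are mu; scale rows by sqrt(mu)
        new_beta, *_ = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
        eta = X @ beta
        mu = np.exp(eta)
    return beta

# Hypothetical data generated with log(mu) = 0.5 + 0.8 * x.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
X = np.column_stack([np.ones_like(x), x])
beta = poisson_irls(X, np.exp(0.5 + 0.8 * x))
print(beta)
```

Unlike OLS, this model keeps predicted counts positive through the exponential link, which would also avoid negative predictions like those discarded from the prediction map.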

5 Conclusions

The objective of this study was to develop a dengue risk model for the Lahore District by identifying the factors that significantly affect dengue outbreaks. Ordinary least-squares (OLS) and geographically weighted regression (GWR) analyses were employed to develop regression models relating dengue cases to the study parameters. The results indicated that the study parameters are not suitable for the OLS global model. GWR was found to be useful in establishing a relationship between dengue incidence and two study parameters, population density and built-up area, with an adjusted R² of 0.774 (R² = 0.805). The other study parameters (LST, NDVI, and NDWI) could not establish any relationship with dengue cases under either regression technique (OLS or GWR).