Improving resolution of a spatial air pollution inventory with a statistical inference approach
Abstract
This paper presents a novel approach to allocation of spatially correlated data, such as emission inventories, to finer spatial scales, conditional on covariate information observable in a fine grid. Spatial dependence is modelled with the conditional autoregressive structure introduced into a linear model as a random effect. The maximum likelihood approach to inference is employed, and the optimal predictors are developed to assess missing values in a fine grid. An example of ammonia emission inventory is used to illustrate the potential usefulness of the proposed technique. The results indicate that inclusion of a spatial dependence structure can compensate for less adequate covariate information. For the considered ammonia inventory, the fourfold allocation benefited greatly from incorporation of the spatial component, while for the ninefold allocation this advantage was limited, but still evident. In addition, the proposed method allows correction of the prediction bias encountered for the upper range emissions in the linear regression models.
Keywords
Mean Square Error, Coarse Grid, Fine Grid, Emission Inventory, Ammonia Emission
1 Introduction
The development of high-resolution emission inventories is essential for designing suitable abatement measures. Spatial distributions of emissions can serve as an input for atmospheric dispersion models, which in turn may produce concentration maps of pollutants that contribute to adverse health effects, such as ammonia. For other air pollutants, such as greenhouse gases (GHG), spatial patterns help to improve identification of distributed emission sources.
Numerous issues underlying the preparation of a spatially resolved GHG inventory were discussed e.g. in Boychuk and Bun (this issue), Bun et al. 2010 or Thiruchittampalam et al. 2010. In general, the task crucially depends on the availability of spatially distributed activity data. For instance, at present in Poland the activity data relevant to GHG emissions can be obtained at the level of country regions (voivodships). Information of higher spatial resolution can often be obtained only for some proxy data related to GHG emissions, such as land use and linear emission sources. Recently, nighttime lights observed by satellite have also been used for more accurate estimation of the spatial distribution of CO_{2} emissions (Ghosh et al. 2010; Oda and Maksyutov 2011).
Typically, the regression models have been applied for spatial allocation of emission data (Dragosits et al. 1998; Oda and Maksyutov 2011). However, emissions in general tend to be spatially correlated, which provides opportunity for potential improvements. This idea motivated us to develop a more advanced approach for accurate disaggregation of air pollution data.
Making inference on variables at points or grid cells different from those of the data is referred to as the change of support problem (Gelfand 2010). Several approaches have been proposed to address this issue. The geostatistical solution for realignment from point to areal data is provided by block kriging (Gotway and Young 2002). Areal weighting offers a straightforward approach if the data are observed at areal units, and the inference is sought at a new level of spatial aggregation. Some improved approaches with better covariate modelling were also proposed e.g. in Mugglin and Carlin 1998 and Mugglin et al. 2000.
In this study we propose to apply methods of spatial statistics to produce higher resolution emission inventory data, taking advantage of more detailed land use information. The approach resembles to some extent the method of Chow and Lin (1971), originally proposed for disaggregation of time series based on related, higher frequency series. Here, a similar methodology is employed to disaggregate spatially correlated data.
Regarding an assumption on residual covariance, we apply the structure suitable for areal data, i.e. the conditional autoregressive (CAR) model. Although the CAR specification is typically used in epidemiology (Banerjee et al. 2004), it was also successfully applied for modelling air pollution over space (Kaiser et al. 2002; McMillan et al. 2010). Compare also Horabik and Nahorski (2010) for another application of the CAR structure to model spatial inventory of GHG emissions. The maximum likelihood approach to inference is employed, and the optimal predictors are developed to assess missing concentrations in a fine grid.
The application part of the study concerns an ammonia (NH_{3}) emission inventory in a region of Poland. Ammonia is emitted mainly by agricultural sources such as livestock production and fertilized fields. Its high concentrations can lead to acidification of soils, forest decline, and eutrophication of waterways. Ammonia emissions are also recognized for their contribution to fine particulate matter; hence their spatial distribution is of great importance. However, agricultural emission sources cannot be measured directly, and spatial emission patterns need to be assessed otherwise. This issue was addressed, among others, by Dragosits et al. 1998, where agricultural and land cover data were used to disaggregate the national NH_{3} emission totals across Great Britain. We demonstrate that the straightforward approaches based on linear dependences might be improved by introducing a spatial random effect.
Nevertheless, the proposed approach is of wider applicability, and can be used in numerous situations where higher resolution of spatial data is needed. In the context of greenhouse gases, the method might be particularly suitable for improving the resolution of those activity data which tend to be spatially correlated. The plausible sectors include agriculture, transportation and forestry. Improved resolution may in turn contribute to a reduction in the uncertainties underlying GHG inventories.
2 Disaggregation framework
This section presents the statistical approach to the issue of spatial disaggregation. We have available data on a spatially distributed variable (inventory of emissions) integrated in a coarse grid. The aim is to estimate the distribution of this variable in a fine grid, conditional on some explanatory variables observable in this grid. It is assumed that the variable of interest is spatially correlated. Its residual covariance structure is set and the conditional autoregressive model is applied. An additional important assumption of the method is that the covariance structure of the variable in a coarse grid is the same as that in a fine grid.
Below we specify the model and provide details on its estimation in the coarse grid as well as on prediction in the fine grid.
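The general idea of this section can be sketched in code. The example below is a minimal illustration of a Chow-Lin-style best linear unbiased predictor, not a reproduction of the formulas of this paper. It rests on assumptions that are ours: the fine-grid mean is X β plus a CAR random effect with covariance τ^2 (D − ρW)^{-1}, the coarse-grid data are z = C(Xβ + θ) + ε with iid errors of variance σ_Z^2, and the covariance parameters are treated as known. The helper names (rook_adjacency, aggregation_matrix, disaggregate) and the toy data are hypothetical.

```python
import numpy as np

def rook_adjacency(nrow, ncol):
    """Binary adjacency W: fine cells sharing an edge are neighbours (assumed)."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if r + 1 < nrow:                      # neighbour below
                W[i, i + ncol] = W[i + ncol, i] = 1.0
            if c + 1 < ncol:                      # neighbour to the right
                W[i, i + 1] = W[i + 1, i] = 1.0
    return W

def aggregation_matrix(nrow, ncol, k):
    """C[a, i] = 1 if fine cell i lies in coarse cell a (k x k fine cells each)."""
    C = np.zeros(((nrow // k) * (ncol // k), nrow * ncol))
    for r in range(nrow):
        for c in range(ncol):
            C[(r // k) * (ncol // k) + (c // k), r * ncol + c] = 1.0
    return C

def disaggregate(z, X, C, W, sigma2_z, tau2, rho):
    """GLS estimate of beta from coarse data, then BLUP of the fine-grid values."""
    D = np.diag(W.sum(axis=1))
    Sigma = tau2 * np.linalg.inv(D - rho * W)             # CAR covariance, fine grid
    M = C @ Sigma @ C.T + sigma2_z * np.eye(C.shape[0])   # covariance of coarse data
    CX = C @ X
    Minv_CX = np.linalg.solve(M, CX)
    beta = np.linalg.solve(CX.T @ Minv_CX, Minv_CX.T @ z)  # generalized least squares
    resid = z - CX @ beta
    y_hat = X @ beta + Sigma @ C.T @ np.linalg.solve(M, resid)
    return y_hat, beta

# Toy usage with made-up numbers: 4 x 4 fine grid, fourfold disaggregation.
rng = np.random.default_rng(0)
W = rook_adjacency(4, 4)
C = aggregation_matrix(4, 4, 2)
X = np.column_stack([np.ones(16), rng.uniform(0.0, 1.0, 16)])
z = rng.uniform(1.0, 5.0, C.shape[0])
y_hat, beta_hat = disaggregate(z, X, C, W, sigma2_z=0.3, tau2=0.5, rho=0.9)
```

Under these assumptions, setting sigma2_z to zero makes the predicted fine-grid values sum exactly to the coarse-grid data, which is a convenient sanity check of the implementation.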
2.1 Model
Fine grid
Coarse grid
2.2 Estimation and prediction
Further maximisation of L(σ_{Z}^{2}, τ^{2}, ρ) is performed numerically, including checks on ρ to ensure that the matrix D − ρW is nonsingular; see Banerjee et al. (2004).
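The check on ρ can be illustrated as follows. This is a sketch under assumptions: W is taken to be a binary edge-sharing adjacency matrix on a regular grid and D the diagonal matrix of neighbour counts. It uses the standard result that D − ρW is positive definite when ρ lies between the reciprocals of the smallest and largest eigenvalues of D^{-1/2} W D^{-1/2}. The function names are hypothetical.

```python
import numpy as np

def rook_adjacency(nrow, ncol):
    """Edge-sharing adjacency on an nrow x ncol grid (row-major cell order)."""
    def path(n):
        return np.eye(n, k=1) + np.eye(n, k=-1)   # adjacency of a path graph
    return np.kron(np.eye(nrow), path(ncol)) + np.kron(path(nrow), np.eye(ncol))

def car_rho_bounds(W):
    """Open interval of rho for which D - rho * W is positive definite."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))               # D^{-1/2} W D^{-1/2}
    lam = np.linalg.eigvalsh(S)                   # eigenvalues in ascending order
    return 1.0 / lam[0], 1.0 / lam[-1]

def smallest_eigenvalue(W, rho):
    """Smallest eigenvalue of D - rho * W; positive means nonsingular (and PD)."""
    D = np.diag(W.sum(axis=1))
    return np.linalg.eigvalsh(D - rho * W)[0]

W = rook_adjacency(3, 3)
lower, upper = car_rho_bounds(W)
ok = smallest_eigenvalue(W, 0.948) > 0            # rho value of the order reported in Table 1
```

In a numerical optimiser, such bounds can be passed as box constraints on ρ so that every likelihood evaluation uses a valid covariance matrix.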
To obtain the standard errors of the estimated parameters, one needs to derive the Fisher information matrix. The asymptotic variance-covariance matrix of the ML estimators is obtained by inverting the expectation of the negative of the second derivatives (the Hessian) of the log likelihood function, with the expectation evaluated at the ML estimates. In other words, the expected Fisher information matrix is used to obtain the standard errors of the parameters. Calculation of the Hessian with respect to the regression coefficients is relatively straightforward, but it becomes more burdensome for the covariance parameters. A detailed derivation of the explicit formulas for the expected Fisher information matrix will be provided elsewhere; here we report the standard errors of the parameter estimators obtained in the case study.
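As an illustration of how standard errors follow from the information matrix, the sketch below uses a deliberately simple stand-in model (iid normal data with unknown mean and variance), not the CAR model of this paper. It also swaps in a different technique: the observed information, approximated by finite differences of the negative log likelihood, instead of the expected Fisher information derived analytically. All names and data are hypothetical.

```python
import numpy as np

def neg_log_lik(theta, y):
    """Negative log likelihood of iid N(mu, s2) data; theta = (mu, s2)."""
    mu, s2 = theta
    return 0.5 * y.size * np.log(2.0 * np.pi * s2) + np.sum((y - mu) ** 2) / (2.0 * s2)

def numerical_hessian(f, theta, h=1e-4):
    """Central-difference approximation of the Hessian of f at theta."""
    theta = np.asarray(theta, dtype=float)
    p = theta.size
    H = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            ei = np.zeros(p)
            ej = np.zeros(p)
            ei[i] = h
            ej[j] = h
            H[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                       - f(theta - ei + ej) + f(theta - ei - ej)) / (4.0 * h * h)
    return H

rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.0, 200)                     # synthetic data
theta_hat = np.array([y.mean(), y.var()])         # closed-form ML estimates

# Observed information = Hessian of the negative log likelihood at the MLE.
info = numerical_hessian(lambda t: neg_log_lik(t, y), theta_hat)
std_err = np.sqrt(np.diag(np.linalg.inv(info)))   # asymptotic standard errors
```

For this toy model the result can be compared with the known closed forms, sqrt(s2/n) for the mean and s2·sqrt(2/n) for the variance, which is a useful check before applying the same recipe to a likelihood without closed-form information.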
3 Case study
3.1 Data

Non-irrigated arable land (211), denoted x_{1} = (x_{1,1}, …, x_{n,1})^{T};
Fruit tree and berry plantations (222), denoted x_{2} = (x_{1,2}, …, x_{n,2})^{T};
Pastures (231), denoted x_{3} = (x_{1,3}, …, x_{n,3})^{T};
Complex cultivation patterns (242), denoted x_{4} = (x_{1,4}, …, x_{n,4})^{T};
Principally agriculture, with natural vegetation (243), denoted x_{5} = (x_{1,5}, …, x_{n,5})^{T}.
Performance of the proposed disaggregation framework depends on a few factors. Perhaps the two most crucial ones are: (i) the explanatory power of the covariates available in the fine grid, and (ii) the extent of disaggregation, which is connected with preservation of the spatial correlation. The impact of both these features is evaluated in our case study.

Model CAR1: CAR errors, set 1 of covariates;
Model LM1: iid errors, set 1 of covariates;
Model CAR2: CAR errors, set 2 of covariates;
Model LM2: iid errors, set 2 of covariates.
This setting of four models is intended to enable an analysis of the extent to which a limited amount of explanatory information can be compensated for by spatial modelling.
Regarding the second factor, we test the disaggregation from 10 km × 10 km and 15 km × 15 km (coarse) grids into a 5 km × 5 km (fine) grid. To examine the performance of the disaggregation procedure, first the original fine grid emissions are aggregated into the respective coarse grid cells. Next, the proposed model is fitted and ammonia emissions are predicted for the 5 km × 5 km (fine) grid. Finally, the obtained results are checked against the original inventory emissions of the 5 km × 5 km (fine) grid. Thus, our simulation study tests the cases of a fourfold and a ninefold disaggregation. The aggregated values of the two coarse grids as well as the actual inventory data in the fine grid are shown in Fig. 1.
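The aggregation step of this test design can be sketched as follows. The example assumes that emissions are cell totals (so that coarse values are sums of fine values) and that the fine grid is a rectangular array whose dimensions are divisible by the aggregation factor; the function name and the toy array are hypothetical.

```python
import numpy as np

def aggregate(fine, k):
    """Sum k x k blocks of fine-grid cells into coarse-grid cells."""
    nrow, ncol = fine.shape
    if nrow % k or ncol % k:
        raise ValueError("grid dimensions must be divisible by k")
    # Split each axis into (coarse index, position within block), then sum blocks.
    return fine.reshape(nrow // k, k, ncol // k, k).sum(axis=(1, 3))

fine = np.arange(36, dtype=float).reshape(6, 6)   # stand-in for a 5 km grid
coarse_10km = aggregate(fine, 2)   # 2 x 2 blocks: fourfold aggregation
coarse_15km = aggregate(fine, 3)   # 3 x 3 blocks: ninefold aggregation
```

Both coarse grids preserve the total of the fine grid, which mirrors the requirement that disaggregated values should add up to the reported coarse-grid emissions.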
3.2 Results of disaggregation from the 10 km grid
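Table 1 Maximum likelihood estimates

| Parameter | CAR1 Est. | CAR1 Std.Err. | LM1 Est. | LM1 Std.Err. | CAR2 Est. | CAR2 Std.Err. | LM2 Est. | LM2 Std.Err. |
|---|---|---|---|---|---|---|---|---|
| 10 km grid | | | | | | | | |
| β_{0} | – | – | – | – | 0.386 | 9.29e-02 | 0.452 | 5.45e-02 |
| β_{1} | 1.13e-07 | 3.26e-09 | 1.09e-07 | 2.46e-09 | 1.06e-07 | 5.03e-09 | 9.58e-08 | 4.43e-09 |
| β_{2} | 2.56e-07 | 1.94e-07 | 4.48e-07 | 1.97e-07 | – | – | – | – |
| β_{3} | 9.77e-08 | 1.19e-08 | 1.08e-07 | 1.08e-08 | – | – | – | – |
| β_{4} | 1.18e-07 | 2.13e-08 | 1.21e-07 | 1.76e-08 | 1.27e-07 | 2.72e-08 | 1.60e-07 | 2.22e-08 |
| β_{5} | 1.27e-07 | 1.32e-08 | 1.35e-07 | 1.11e-08 | – | – | – | – |
| σ_{Z}^{2} | 0.334 | 0.073 | 1.165 | 0.109 | 0.522 | 0.111 | 1.95 | 0.184 |
| τ^{2} | 0.536 | 0.082 | – | – | 0.807 | 0.124 | – | – |
| ρ | 0.948 | 9.98e-04 | – | – | 0.972 | 9.98e-04 | – | – |
| 15 km grid | | | | | | | | |
| β_{0} | – | – | – | – | 0.424 | 1.04e-01 | 0.476 | 6.82e-02 |
| β_{1} | 1.12e-07 | 3.95e-09 | 1.09e-07 | 3.42e-09 | 1.00e-07 | 7.01e-09 | 9.35e-08 | 5.79e-09 |
| β_{2} | – | – | – | – | – | – | – | – |
| β_{3} | 1.07e-07 | 1.84e-08 | 1.16e-07 | 1.55e-08 | – | – | – | – |
| β_{4} | 1.24e-07 | 2.77e-08 | 1.29e-07 | 2.34e-08 | 1.56e-07 | 3.65e-08 | 1.75e-07 | 2.79e-08 |
| β_{5} | 1.27e-07 | 1.65e-08 | 1.33e-07 | 1.49e-08 | – | – | – | – |
| σ_{Z}^{2} | 2.339 | 0.424 | 3.50 | 0.474 | 2.681 | 0.548 | 5.55 | 0.753 |
| τ^{2} | 0.214 | 0.088 | – | – | 0.414 | 0.088 | – | – |
| ρ | 0.966 | 4.91e-04 | – | – | 0.982 | 5.55e-05 | – | – |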
Table 2 Model comparison and analysis of residuals (d_{i} = y_{i} − y_{i}^{*})

| Model | L | AIC | MSE | min(d_{i}) | max(d_{i}) | r |
|---|---|---|---|---|---|---|
| 10 km grid | | | | | | |
| CAR1 | 312.3 | 640.7 | 0.064 | −1.717 | 1.104 | 0.961 |
| LM1 | 336.5 | 685.1 | 0.186 | −2.544 | 0.268 | 0.882 |
| CAR2 | 365.4 | 742.8 | 0.158 | −1.917 | 1.362 | 0.901 |
| LM2 | 394.8 | 797.6 | 0.291 | −2.498 | 1.765 | 0.808 |
| 15 km grid | | | | | | |
| CAR1 | 220.6 | 455.3 | 0.136 | −2.428 | 0.646 | 0.915 |
| LM1 | 222.9 | 455.9 | 0.189 | −2.600 | 0.516 | 0.880 |
| CAR2 | 240.4 | 492.8 | 0.190 | −2.132 | 1.446 | 0.880 |
| LM2 | 248.1 | 504.4 | 0.295 | −2.511 | 1.746 | 0.807 |
At this point it must be stressed that the values predicted in the fine grid (y_{i}^{*}) are calculated with formula (11) based on the aggregated values of the 10 km grid; the calculations are made as if the true emissions were unknown. On the other hand, recall that these true emissions in the fine grid (y_{i}) are available; see the left-hand-side map in Fig. 1. From now on, our analysis is based on a comparison between the prediction results obtained with the proposed technique and the original fine grid ammonia emissions (observations).
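The comparison between predictions and observations can be summarised with the residual statistics used in Table 2. The sketch below assumes that r denotes the Pearson correlation between observed and predicted fine-grid values, which the text does not state explicitly; the function name and the small arrays are hypothetical and are not data from the case study.

```python
import numpy as np

def residual_summary(y, y_pred):
    """Summary of residuals d_i = y_i - y_i* between observations and predictions."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    d = y - y_pred
    return {
        "MSE": float(np.mean(d ** 2)),                 # mean squared error
        "min_d": float(d.min()),                       # largest over-prediction
        "max_d": float(d.max()),                       # largest under-prediction
        "r": float(np.corrcoef(y, y_pred)[0, 1]),      # Pearson correlation (assumed)
    }

y_obs = [1.0, 2.0, 3.0, 4.0]       # illustrative observed values
y_star = [1.5, 2.0, 2.5, 4.0]      # illustrative predicted values
stats = residual_summary(y_obs, y_star)
```

A strongly negative min(d_i) indicates cells where the prediction overshoots the observation, while a large positive max(d_i) indicates under-prediction, which is how the prediction bias for high emissions shows up in such a table.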
3.3 Results of disaggregation from the 15 km grid
Next, we present the results of disaggregation from the 15 km grid. The conducted analysis is similar to the one of the 10 km grid and, where appropriate, both settings are compared.
The lower part of Table 1 contains the maximum likelihood estimates for the 15 km grid data. In the models with set 1 of covariates, the regression coefficient β_{0} was again dropped. Moreover, in all the models at this level of aggregation the land use class “Fruit tree and berry plantations” (β_{2}) was statistically insignificant, and thus it was also dropped. The remaining land use classes were informative, with respective p-values lower than 0.05.
As regards the error part, all the comments reported for the 10 km disaggregation remain valid here, although the effects are considerably weaker. Both CAR models provide lower values of σ_{Z}^{2} than their linear regression counterparts. However, the reduction of unexplained variability between, for instance, the models LM1 and CAR1 is only a factor of 1.5 (3.5/2.339), while it was over 3 (1.165/0.334) for the respective models of the 10 km disaggregation. This suggests that the spatial correlation is weaker in the 15 km grid model than in the 10 km one. Thus, the CAR models offer a smaller advantage over the LM models here than they did for the 10 km grid.
The values of the AIC criterion and of the negative log likelihood (L) are reported in the lower part of Table 2. As for the disaggregation from the 10 km grid, the models based on set 1 of covariates provide better results. The CAR structure improves on the linear regression results for both covariate sets. Note, however, that for the 15 km disaggregation the impact of the spatial component is no longer as substantial as it was previously. Again, a bigger improvement is noted for the models with a limited number of covariates (504.4 − 492.8 = 11.6 in terms of the AIC criterion), and the gain from incorporation of the spatial component is only marginal for the models with set 1 of covariates (455.9 − 455.3 = 0.6).
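The reported AIC values can be cross-checked against the negative log likelihoods. The sketch below assumes the standard definition AIC = 2L + 2k, with L the negative log likelihood and k the number of estimated parameters; the formula is not stated in the text, and the parameter counts are read off the non-empty entries of Table 1.

```python
# (negative log likelihood L, number of parameters k, reported AIC) from Tables 1 and 2
models = {
    ("10 km", "CAR1"): (312.3, 8, 640.7),
    ("10 km", "LM1"):  (336.5, 6, 685.1),
    ("10 km", "CAR2"): (365.4, 6, 742.8),
    ("10 km", "LM2"):  (394.8, 4, 797.6),
    ("15 km", "CAR1"): (220.6, 7, 455.3),
    ("15 km", "LM1"):  (222.9, 5, 455.9),
    ("15 km", "CAR2"): (240.4, 6, 492.8),
    ("15 km", "LM2"):  (248.1, 4, 504.4),
}

def aic(neg_log_lik, n_params):
    """Akaike information criterion from a negative log likelihood (assumed form)."""
    return 2.0 * neg_log_lik + 2.0 * n_params

# Absolute gap between the recomputed and the reported AIC for each model.
gaps = {key: abs(aic(L, k) - reported) for key, (L, k, reported) in models.items()}

# AIC gain from adding the CAR component, per grid and covariate set.
gain_15km_set2 = models[("15 km", "LM2")][2] - models[("15 km", "CAR2")][2]
gain_15km_set1 = models[("15 km", "LM1")][2] - models[("15 km", "CAR1")][2]
```

Under this assumption the recomputed values agree with the reported ones up to rounding of L, and the two gains reproduce the differences quoted in the text.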
The lower part of Table 2 also provides a further analysis of residuals. The mean squared error MSE and the correlation coefficient r yield a consistent ranking of the models. The best model is clearly CAR1 with r = 0.915 and MSE = 0.136, while the poorest one is LM2 with r = 0.807 and MSE = 0.295. As for the remaining two models, LM1 slightly outperforms CAR2 in terms of the mean squared error (0.189 versus 0.190). Note that this order is reversed when compared with the results of the 10 km grid disaggregation (the upper part of the table). Therefore, when disaggregating from the 10 km grid, the spatial structure is more informative than some of the covariates, but this is no longer true when disaggregating from the 15 km grid. From this we conclude that in this particular case study, the proposed framework offers an efficient tool for a fourfold and a ninefold disaggregation, but it may become less adequate for higher order allocations.
4 Discussion and conclusions
The major objective of this study was to demonstrate how a variable of interest (here, emissions) available in a coarse grid plus information on covariates available in a finer grid can be combined together to provide the variable of interest in a finer grid, and therefore to improve its spatial resolution. We proposed a relevant disaggregation model and illustrated the approach using a real dataset of ammonia emission inventory. The idea is conceptually similar to the method of Chow and Lin (1971), originally designed for time series data; see also Polasek et al. (2010). It was applied to the spatially correlated data, and spatial dependence was modelled with the conditional autoregressive structure introduced into a linear model as a random effect.
The model accounts for that part of the spatial variation which has not been explained by the available covariates. Thus, if the covariate information does not correctly reflect the spatial distribution of the variable of interest, there is potential for improving the approach with a relevant model of spatial correlation. The underlying assumption of the method is that the covariance structures of the variable in the coarse grid and in the fine grid are the same. In the present study of ammonia emissions examined in 5 km, 10 km, and 15 km grids, this assumption proved to be reasonable.
Performance of the proposed framework was evaluated with respect to the following two factors: the explanatory power of the covariates available in the fine grid, and the extent of disaggregation. The results indicate that inclusion of a spatial dependence structure can compensate for less adequate covariate information. For the considered ammonia inventory, the fourfold allocation benefited greatly from the incorporation of the spatial component, while for the ninefold allocation this advantage was limited, but still evident. In addition, the proposed method allowed correction of the prediction bias encountered for upper range emissions in the linear regression models.
We note that in this case study we used the original data in the fine grid to assess the quality of the resulting predictions. For the purpose of potential applications, we also developed a relevant measure of prediction error (formula (12)). Although not entirely faultless, it is a first attempt to quantify the prediction error in situations where the original emissions in the fine grid are not known.
Other approaches, such as a geostatistical model, might potentially be used for spatial allocation. Application of the geostatistical approach brings us to the concept of block kriging (Gelfand 2010). However, it should be stressed that geostatistics is more appropriate for point referenced data, while our proposal is dedicated to the case of emission inventories, which involve areal data. Thus, the choice between these two options should be considered on a case by case basis.
Another possibility to deal with the issue of spatial disaggregation could be to use some expert knowledge and logical inference; compare Verstraete (this issue) for a fuzzy inference system to the map overlay problem.
The described method opens the way to uncertainty reduction of spatially explicit emission inventories; hence future work will also include testing the proposed disaggregation framework for inventories of greenhouse gases.
Acknowledgments
The study was conducted within the 7FP Marie Curie Actions IRSES project No. 247645. J. Horabik acknowledges support from the Polish Ministry of Science and Higher Education within the funds for statutory works of young scientists. This contribution is also supported by the Foundation for Polish Science under International PhD Projects in Intelligent Computing; project financed from The European Union within the Innovative Economy Operational Programme 2007–2013 and European Regional Development Fund. Z. Nahorski was financially supported by the statutory funds of the Systems Research Institute of Polish Academy of Sciences.
The authors gratefully acknowledge the provision of data for the case study from Ekometria – Biuro Studiów i Pomiarów Proekologicznych in Gdańsk, Poland.
References
Banerjee S, Carlin BP, Gelfand AE (2004) Hierarchical modeling and analysis for spatial data. Chapman & Hall/CRC, Boca Raton
Boychuk K, Bun R (this issue) Regional spatial cadastres of GHG emissions in energy sector: accounting for uncertainty
Bun R, Hamal K, Gusti M, Bun A (2010) Spatial GHG inventory at the regional level: accounting for uncertainty. Clim Chang 103(1–2):227–244
Chow GC, Lin A (1971) Best linear unbiased interpolation, distribution, and extrapolation of time series by related series. Rev Econ Stat 53(4):372–375
Cressie NAC (1993) Statistics for spatial data. John Wiley & Sons, New York
Dragosits U, Sutton MA, Place CJ, Bayley AA (1998) Modelling the spatial distribution of agricultural ammonia emissions in the UK. Environ Pollut 102(S1):195–203
European Environment Agency (2010) Corine land cover 2000. http://www.eea.europa.eu/dataandmaps/data. Cited August 2010
Gelfand AE (2010) Misaligned spatial data: the change of support problem. In: Gelfand AE, Diggle PJ, Fuentes M, Guttorp P (eds) Handbook of spatial statistics. Chapman & Hall/CRC, Boca Raton
Ghosh T, Elvidge CD, Sutton PC et al (2010) Creating a global grid of distributed fossil fuel CO_{2} emissions from nighttime satellite imagery. Energies 3:1895–1913
Gotway CA, Young LJ (2002) Combining incompatible spatial data. J Am Stat Assoc 97:632–648
Horabik J, Nahorski Z (2010) A statistical model for spatial inventory data: a case study of N_{2}O emissions in municipalities of southern Norway. Clim Chang 103(1–2):263–276
Kaiser MS, Daniels MJ, Furakawa K, Dixon P (2002) Analysis of particulate matter air pollution using Markov random field models of spatial dependence. Environmetrics 13:615–628
Lindley DV, Smith AFM (1972) Bayes estimates for the linear model. J Roy Stat Soc B 34:1–41
McMillan AS, Holland DM, Morara M, Fend J (2010) Combining numerical model output and particulate data using Bayesian space-time modeling. Environmetrics 21:48–65
Mugglin AS, Carlin BP (1998) Hierarchical modeling in geographical information systems: population interpolation over incompatible zones. J Agric Biol Environ Stat 3:111–130
Mugglin AS, Carlin BP, Gelfand AE (2000) Fully model-based approaches for spatially misaligned data. J Am Stat Assoc 95:877–887
Oda T, Maksyutov S (2011) A very high-resolution (1 km × 1 km) global fossil fuel CO_{2} emission inventory derived using a point source database and satellite observations of nighttime lights. Atmos Chem Phys 11:543–556
Polasek W, Llano C, Sellner R (2010) Bayesian methods for completing data in spatial models. Rev Econ Anal 2:194–214
Thiruchittampalam B, Theloke J, Uzbasich M et al (2010) Analysis and comparison of uncertainty assessment methodologies for high resolution Greenhouse Gas emission models. In: Proceedings of the 3rd International Workshop on Uncertainty in Greenhouse Gas Inventories, Lviv Polytechnic National University, Ukraine, 22–24 Sept 2010
Verstraete J (this issue) Solving the map overlay problem with a fuzzy approach
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.