Abstract
We have further developed a deterministic urban-scale dispersion modelling system by adding a road dust suspension model. The system includes both vehicular exhaust emissions and suspended road dust. The modelling system was combined with a regional-scale chemical transport model for the computation of concentrations in an urban area for the year 2008; for the year 2010, measured regional background concentrations were used instead. The time series were modelled for a more extensive spatial area than before using the FORE road dust suspension model. The predictions were compared against observed concentrations of PM2.5 and PM10. The use of the coefficient of determination (r2) is discussed. We criticize the use of r2 alone, as well as in combination with an index-of-agreement-type measure, and review the underlying data assumptions for the use of both measures. We then suggest a strategy for developing the statistical understanding, practice and nomenclature of model evaluation.
References
Aarnio MA, Kukkonen J, Kangas L, Kauhaniemi M, Kousa A, Hendriks C, Yli-Tuomi T, Hoek G, Brunekreef B, Elolähde T, Karppinen A (2016) Modelling of particulate matter concentrations in the Helsinki metropolitan area in 2008 and 2010. Boreal Env Res 21:445–460
INRO (1994) EMME/2 user's manual. INRO Consultants Inc., Montreal
Kauhaniemi M, Stojiljkovic A, Pirjola L, Karppinen A, Härkönen J, Kupiainen K, Kangas L, Aarnio MA, Omstedt G, Denby BR, Kukkonen J (2014) Comparison of the predictions of two road dust emission models with the measurements of a mobile van. Atmos Chem Phys Discuss 14(4):4263–4301
Robinson WS (1957) The statistical measurement of agreement. Am Sociol Rev 22(1):17–25. http://www.jstor.org/stable/2088760
Willmott CJ (1981) On the validation of models. Phys Geogr 2(2):184–194
Willmott CJ, Robeson SM, Matsuura K (2011) A refined index of model performance. Int J Climatol 32:2088–2094
Acknowledgements
This study has been a part of the research projects APTA (The Influence of Air Pollution, Pollen and Ambient Temperature on Asthma and Allergies in Changing Climate), and NordicWelfAir (Project #75007: Understanding the link between Air pollution and Distribution of related Health Impacts and Welfare in the Nordic countries). The funding from the Academy of Finland and the Nordforsk Nordic Programme on Health and Welfare is gratefully acknowledged.
Questions and Answers
Questioner name: Sebnem Aksoyoglu
Question: LOTOS-EUROS did not have SOA in the past. Did you have them in your application? Did you have biogenic emissions?
Answer: Secondary Organic Aerosols were included in the LOTOS-EUROS calculations of this work, but biogenic emissions were not included.
Questioner name: Antti Hellsten
Question/comment: Always use more than one or two metrics in model evaluation. Different metrics measure different kinds of disagreement. For example, FAC2 tells nothing about possible bias; use e.g. FB as well.
Answer: Yes, exactly. One should report several statistical metrics, but a consistent set of them. The coefficient of determination, r2, however, is not a good choice for data sets that are not normally distributed. Even more importantly, a danger of misuse arises when the r2 of a linear regression equation, fitted with some kind of least-residual-sum method to a set of data points, e.g. (Cobs, Cpred), is used as a measure of the goodness of the model that was used to calculate the Cpred data. This is what, for example, the Excel software produces when one makes a scatter plot of the (Cobs, Cpred) data with an added "trendline" and its r2 value.
A consistent set of statistical parameters could include the number of data points, the means and standard deviations of Cobs and Cpred, a measure of the bias (e.g. FB), a measure of the spread of the data (e.g. FAC2), a measure of the exactness of Cobs,i = Cpred,i over the whole data set (e.g. an index of agreement), and a statistic that incorporates estimates of the measurement uncertainty.
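As a minimal sketch of such a consistent set, the common conventions for these metrics could be coded as follows (the function name is our own, and sign conventions for FB vary between communities; the index of agreement follows Willmott 1981):

```python
import numpy as np

def evaluation_summary(obs, pred):
    """A consistent set of evaluation statistics for paired (C_obs, C_pred) data.

    FB and FAC2 follow common conventions (FB sign conventions differ between
    communities); the index of agreement IA follows Willmott (1981).
    """
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    o_mean, p_mean = obs.mean(), pred.mean()
    # Fractional bias: 0 for an unbiased model, bounded in [-2, 2]
    fb = 2.0 * (p_mean - o_mean) / (p_mean + o_mean)
    # FAC2: fraction of predictions within a factor of two of the observations
    fac2 = np.mean((pred >= 0.5 * obs) & (pred <= 2.0 * obs))
    # Willmott (1981) index of agreement, in [0, 1], 1 = exact agreement
    denom = np.sum((np.abs(pred - o_mean) + np.abs(obs - o_mean)) ** 2)
    ia = 1.0 - np.sum((pred - obs) ** 2) / denom
    return {"n": obs.size,
            "obs_mean": o_mean, "pred_mean": p_mean,
            "obs_std": obs.std(ddof=1), "pred_std": pred.std(ddof=1),
            "FB": fb, "FAC2": fac2, "IA": ia}
```

Reporting this whole dictionary together, rather than any single number, matches the consistency the answer argues for.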
Questioner name: Heinke Schluenzen
Question: Thanks for the elaboration on the quality measures. Why don't you aim at an index-of-agreement-type measure that considers measurement uncertainty (e.g. the hit rate)?
Answer: The Helsinki Metropolitan Area Council, the source of the observed data that we used in this work, has now stated that "the measurement uncertainty <25%". This information was not available before. The hit rate represents the fraction of (Pi, Oi) points in the whole evaluation data set that lie within an allowed range of the diagonal of the P–O space. That range can then be defined using the measurement uncertainty, if it is known. (Reference: Schatzmann M, Olesen H, Franke J (eds) (2010) COST 732 model evaluation case studies: approach and results. COST Office, Brussels, 121 pp.)
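The hit-rate definition above can be sketched as follows (the function name and the combination of relative and absolute tolerances are our own assumptions; COST 732 uses such a band around the diagonal):

```python
import numpy as np

def hit_rate(obs, pred, rel=0.25, abs_tol=0.0):
    """Fraction of (P_i, O_i) pairs within an allowed band around the
    diagonal of the P-O space.

    The band can reflect measurement uncertainty, e.g. rel=0.25 for the
    quoted "<25%" relative uncertainty; abs_tol gives an absolute floor
    so that near-zero observations are not penalized unfairly.
    """
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    diff = np.abs(pred - obs)
    # A pair counts as a hit if it satisfies either tolerance
    hits = (diff <= rel * np.abs(obs)) | (diff <= abs_tol)
    return hits.mean()
```

With rel=0.25, a prediction of 12 against an observation of 10 is a hit (20% off), while 13 against 10 is not (30% off).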
Questioner name: Pius Lee
Question: In your presentation of the index of agreement, both formulations attempt to quantify the variance (or uncertainty) in each of the observations. Also, the sample size n may soon grow considerably, since hand-held devices are attracting the attention of environmental agencies and may soon be deployed as viable observations. Would d2 and d1 still be good measures of model performance when n is ten or a hundred times larger than with the current conventional fixed regulatory monitors? The crux of the difficulty may also lie in the fact that the variances are large, as hand-held devices are much less well standardized and their performance is expected to vary over a much larger range.
Answer: The index-of-agreement statistic attempts to quantify the "exactness" of the predicted time series compared to the observed time series. It is always calculated for a pair of data sets that comprises the observed and predicted concentrations at a specific point in space, as a time series. So in the analysis of hourly data for e.g. a year, n would always be around 8760 (or 8784 for a leap year) per location, regardless of the number of measurement devices. If the measurement devices were mobile, the data set of predicted concentrations would have to be interpolated by combining the movement data with the calculation-point data, but each data set to be analysed would still have the standard length of the chosen study period. This statistic, however, in any of its reported incarnations, does not quantify the uncertainty of the measurements in any way.
Copyright information
© 2018 Springer International Publishing AG
Cite this paper
Aarnio, M.A. et al. (2018). A Model Evaluation Strategy Applied to Modelling of PM in the Helsinki Metropolitan Area. In: Mensink, C., Kallos, G. (eds) Air Pollution Modeling and its Application XXV. ITM 2016. Springer Proceedings in Complexity. Springer, Cham. https://doi.org/10.1007/978-3-319-57645-9_16
DOI: https://doi.org/10.1007/978-3-319-57645-9_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57644-2
Online ISBN: 978-3-319-57645-9