Neural Computing and Applications, Volume 31, Issue 12, pp 9023–9039

Regression trees modeling of time series for air pollution analysis and forecasting

  • Snezhana Georgieva Gocheva-Ilieva
  • Desislava Stoyanova Voynikova
  • Maya Plamenova Stoimenova
  • Atanas Valev Ivanov
  • Iliycho Petkov Iliev
Original Article


Abstract

Solving the problems related to air pollution is crucial for human health and ecosystems in many urban areas throughout the world. The accumulation of large sets of measurements of various air pollutants makes it possible to analyze them in order to predict and control pollution. This study presents a general approach for building high-quality nonlinear models of environmental time series using the data mining technique of classification and regression trees (CART). The predictors are time series of meteorological, atmospheric or other data, date-time variables, and lagged variables of the dependent variable and of the predictors, included as groups. The proposed approach is tested in empirical studies of the daily average concentrations of atmospheric PM10 (particulate matter up to 10 μm in diameter) in the cities of Ruse and Pernik, Bulgaria. One-day-ahead forecasts are obtained. All models are cross-validated to guard against overfitting. The best models are selected using goodness-of-fit measures such as root-mean-square error and coefficient of determination. The relative importance of the predictors and predictor groups is obtained and interpreted. The CART models are compared with the corresponding models built using the ARIMA transfer function methodology, and CART is shown to outperform ARIMA. The practical applicability of the models is assessed using 2 × 2 contingency tables. The results show that the CART models fit the data well and correctly predict about 90% of the measured PM10 values with respect to the average daily European threshold value of 50 µg/m³.
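The evaluation steps named in the abstract (root-mean-square error, coefficient of determination, and a 2 × 2 contingency table relative to the 50 µg/m³ threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the sample values are made up for illustration and are not data from the study, and treating "exceedance" as a value strictly greater than the threshold is an assumption, since the abstract does not state the convention used.

```python
import math

THRESHOLD = 50.0  # daily average PM10 threshold in µg/m³, as stated in the abstract


def rmse(observed, predicted):
    """Root-mean-square error between two equally long sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def contingency_2x2(observed, predicted, threshold=THRESHOLD):
    """Count hits, false alarms, misses and correct negatives.

    Assumption: a day counts as an exceedance when the value is
    strictly greater than the threshold.
    """
    table = {"hit": 0, "false_alarm": 0, "miss": 0, "correct_negative": 0}
    for o, p in zip(observed, predicted):
        if p > threshold and o > threshold:
            table["hit"] += 1
        elif p > threshold:
            table["false_alarm"] += 1
        elif o > threshold:
            table["miss"] += 1
        else:
            table["correct_negative"] += 1
    return table


def proportion_correct(table):
    """Share of days on which forecast and measurement agree about exceedance."""
    total = sum(table.values())
    return (table["hit"] + table["correct_negative"]) / total


# Made-up illustrative values in µg/m³ (NOT data from the study).
observed = [30.0, 55.0, 70.0, 45.0, 20.0, 52.0, 48.0, 90.0, 35.0, 60.0]
predicted = [28.0, 58.0, 65.0, 51.0, 25.0, 47.0, 44.0, 80.0, 33.0, 62.0]

table = contingency_2x2(observed, predicted)
print(table)
print("proportion correct:", proportion_correct(table))
print("RMSE:", rmse(observed, predicted))
print("R^2:", r_squared(observed, predicted))
```

On these made-up values the forecast and the measurement agree about exceedance on 8 of 10 days. The "about 90%" figure in the abstract presumably refers to an agreement rate of this kind, although the abstract does not define it precisely.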


Keywords

Air pollution modeling; Time series; Classification and regression trees (CART); Pollution forecast

Mathematics Subject Classification

62M10 62M20 62P12 



Acknowledgements

This work was supported by Grant No. BG05M2OP001-1.001-0003, financed by the Science and Education for Smart Growth Operational Program (2014–2020) and co-financed by the European Union through the European Structural and Investment Funds. We thank the independent reviewers for their valuable advice and feedback, which helped improve the scientific value of this study.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  • Snezhana Georgieva Gocheva-Ilieva (1)
  • Desislava Stoyanova Voynikova (1)
  • Maya Plamenova Stoimenova (1)
  • Atanas Valev Ivanov (1)
  • Iliycho Petkov Iliev (2)

  1. Department of Applied Mathematics and Modeling, Faculty of Mathematics and Informatics, University of Plovdiv Paisii Hilendarski, Plovdiv, Bulgaria
  2. Department of Physics, Technical University – Sofia, Plovdiv, Bulgaria
