Application of penalized linear regression and ensemble methods for drought forecasting in Northeast China

  • Zeng Li
  • Taotao Chen
  • Qi Wu
  • Guimin Xia
  • Daocai Chi (corresponding author)
Original Paper


Effective drought prediction can help mitigate some of the effects of drought. Machine learning algorithms are increasingly used to develop drought prediction models because of their high efficiency and accuracy. This study explored the ability of several machine learning models, based on penalized linear regression and decision tree (DT)-based ensemble methods, to predict drought conditions represented by the Standardized Precipitation–Evapotranspiration Index (SPEI) in Northeast China. We compared the forecasting performance of two penalized linear regression models, ridge regression (RR) and lasso regression (LR), with that of the ordinary least squares (OLS) regression model. In addition, the AdaBoost and Random Forests (RF) models were used to assess whether ensemble methods improve forecasting performance. The SPEI was forecast at timescales of 3, 6, 12, and 24 months using these models, and the resulting indices were used to predict short-term and long-term drought conditions. The results indicated that the penalized linear regression models provided better predictions than the OLS model and that the ensemble methods consistently outperformed the DT model. Overall, the LR models were the best models for forecasting the SPEI at the different timescales in Northeast China.
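As an illustration of how OLS, ridge, and lasso regression differ, the following minimal sketch fits all three to lagged values of a series using coordinate descent. It is not the authors' implementation: the synthetic AR(2) series stands in for an SPEI record, and the function names (`make_lagged`, `fit_penalized`), the six-lag input window, and the penalty values are assumptions chosen only for illustration. The intercept is omitted on the assumption that a standardized index is approximately zero-mean. The ensemble models (AdaBoost, RF) are not covered here.

```python
import random

def make_lagged(series, n_lags):
    """Build (X, y): each row of X holds the n_lags values preceding y."""
    X = [series[t - n_lags:t] for t in range(n_lags, len(series))]
    y = [series[t] for t in range(n_lags, len(series))]
    return X, y

def soft_threshold(rho, l1):
    """Shrink rho toward zero by l1; this is what sets lasso coefficients to 0."""
    if rho > l1:
        return rho - l1
    if rho < -l1:
        return rho + l1
    return 0.0

def fit_penalized(X, y, l1=0.0, l2=0.0, n_iter=200):
    """Coordinate descent on (1/2n)||y - Xb||^2 + l1*||b||_1 + (l2/2)*||b||_2^2.

    l1 = l2 = 0 gives OLS, l2 > 0 gives ridge, l1 > 0 gives lasso.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, with b = 0 at the start
    z = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, l1) / (z[j] + l2)
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta

def rmse(X, y, beta):
    """Root mean squared error of the linear prediction X @ beta."""
    errs = [(yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y)]
    return (sum(errs) / len(errs)) ** 0.5

# Synthetic zero-mean AR(2) series (a stand-in, not real SPEI data)
random.seed(0)
series = [0.0, 0.0]
for _ in range(300):
    series.append(0.6 * series[-1] + 0.2 * series[-2] + random.gauss(0, 0.5))

X, y = make_lagged(series, n_lags=6)
split = int(0.8 * len(X))  # chronological split: train on the past, test on the future
X_train, y_train, X_test, y_test = X[:split], y[:split], X[split:], y[split:]

models = {
    "OLS": fit_penalized(X_train, y_train),
    "ridge": fit_penalized(X_train, y_train, l2=0.1),
    "lasso": fit_penalized(X_train, y_train, l1=0.05),
}
for name, beta in models.items():
    print(name, [round(b, 3) for b in beta], round(rmse(X_test, y_test, beta), 3))
```

The split is chronological rather than random so that the test set always lies after the training set, which matches a forecasting setting. In practice the penalty strength would be tuned, for example by cross-validation, rather than fixed by hand as it is here.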


Keywords: Drought forecasting · Standardized precipitation evapotranspiration index (SPEI) · Penalized linear regression · Ensemble methods · Machine learning



This work was supported by the National Science Foundation of China (Grant Nos. 51679142 and 51709173).

Supplementary material

Supplementary file 1: 703_2019_675_MOESM1_ESM.pdf (PDF, 411 kB)



Copyright information

© Springer-Verlag GmbH Austria, part of Springer Nature 2019

Authors and Affiliations

  • Zeng Li (1)
  • Taotao Chen (1)
  • Qi Wu (1)
  • Guimin Xia (1)
  • Daocai Chi (1), corresponding author

  1. College of Water Resources, Shenyang Agricultural University, Shenyang, China
