Evaluation of preprocessing techniques for improving the accuracy of stochastic rainfall forecast models

  • I. Ebtehaj
  • H. Bonakdari
  • M. Zeynoddin
  • B. Gharabaghi
  • A. Azari
Original Paper


Accurate rainfall forecasting is one of the most important and challenging tasks in hydrological modeling, with significant benefits for many sectors of the economy. This study examines how to improve the accuracy of a new generation of stochastic monthly rainfall forecast models by comparing four preprocessing scenarios: (1) time series modeling without preprocessing, the common approach in stochastic modeling, used here as the base case; (2) preprocessing by differencing, spectral analysis, and seasonal and non-seasonal standardization; (3) two-step preprocessing comprising stationarization followed by normalization, using eight different transformations; and (4) two-step preprocessing in the reverse order of scenario 3, in which the original time series is first normalized and then made stationary. The orders (parameters) of the stochastic model are determined from autocorrelation function (ACF) and partial autocorrelation function (PACF) plots. The results indicate that the proposed normalization and transformation preprocessing techniques can substantially improve the prediction accuracy of the new monthly rainfall forecast model.


Keywords: Linear modeling; Normality transforms; Seasonal auto-regressive integrated moving average; Spectral analysis; Standardization
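Scenario 2 relies on differencing and on seasonal and non-seasonal standardization. The minimal sketch below shows the textbook forms of these operations for a monthly series. It assumes a period of 12, a record that starts at the first calendar month of the cycle, and at least two observations per month with non-zero spread; the function names are hypothetical and this is not the authors' code.

```python
import statistics
from typing import List, Tuple


def seasonal_difference(x: List[float], period: int = 12) -> List[float]:
    """Seasonal differencing y_t = x_t - x_{t-period}; the first `period` values are lost."""
    return [x[t] - x[t - period] for t in range(period, len(x))]


def standardize(x: List[float]) -> List[float]:
    """Non-seasonal standardization with one overall mean and standard deviation."""
    mu, sd = statistics.mean(x), statistics.stdev(x)
    return [(v - mu) / sd for v in x]


def seasonal_standardize(
    x: List[float], period: int = 12
) -> Tuple[List[float], List[float], List[float]]:
    """Seasonal standardization z_t = (x_t - mu_m) / sd_m with m = t mod period.

    The monthly means and standard deviations are returned so that forecasts
    made on the standardized scale can be transformed back.
    """
    means, sds = [], []
    for m in range(period):
        month = x[m::period]              # every value belonging to calendar month m
        sd = statistics.stdev(month)      # needs at least two values per month
        if sd == 0:
            raise ValueError(f"month {m} has zero variance")
        means.append(statistics.mean(month))
        sds.append(sd)
    z = [(v - means[t % period]) / sds[t % period] for t, v in enumerate(x)]
    return z, means, sds
```

A forecast on the standardized scale is mapped back with x_t = z_t * sd_m + mu_m, which is why the monthly statistics are returned alongside the series.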
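Scenarios 3 and 4 differ only in the order of the two preprocessing steps. The abstract does not name the eight transformations, so this sketch uses the Box-Cox power transform as a stand-in normalizer, with lambda chosen by a grid search over the profile log-likelihood, and seasonal differencing as a stand-in stationarization step. The positive shift in the hypothetical helper `shift_positive` is an assumption: Box-Cox requires strictly positive input, whereas monthly rainfall can contain zeros and a differenced series can be negative. All function names are illustrative.

```python
import math
import statistics
from typing import List, Tuple


def boxcox(y: List[float], lam: float) -> List[float]:
    """Box-Cox transform: (y^lam - 1) / lam, or log(y) when lam == 0. Requires y > 0."""
    if lam == 0.0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def boxcox_mle(y: List[float]) -> Tuple[List[float], float]:
    """Pick lam on the grid -2.00, -1.99, ..., 2.00 maximizing the profile
    log-likelihood  -n/2 * log(sigma^2(lam)) + (lam - 1) * sum(log y),
    where sigma^2(lam) is the population variance of the transformed data."""
    if min(y) <= 0:
        raise ValueError("Box-Cox needs strictly positive data; shift it first")
    if statistics.pvariance(y) == 0:
        raise ValueError("cannot transform a constant series")
    n = len(y)
    sum_log = sum(math.log(v) for v in y)
    best_lam, best_ll = 0.0, -math.inf
    for i in range(-200, 201):
        lam = i / 100                      # integer grid keeps lam == 0.0 exact
        var = statistics.pvariance(boxcox(y, lam))
        ll = -0.5 * n * math.log(var) + (lam - 1.0) * sum_log
        if ll > best_ll:
            best_lam, best_ll = lam, ll
    return boxcox(y, best_lam), best_lam


def shift_positive(x: List[float], eps: float = 1.0) -> Tuple[List[float], float]:
    """Hypothetical helper: if min(x) <= 0, add c so that the minimum becomes eps."""
    c = eps - min(x) if min(x) <= 0 else 0.0
    return [v + c for v in x], c


def seasonal_difference(x: List[float], period: int = 12) -> List[float]:
    """Stand-in stationarization step: y_t = x_t - x_{t-period}."""
    return [x[t] - x[t - period] for t in range(period, len(x))]


def stationarize_then_normalize(x: List[float], period: int = 12) -> Tuple[List[float], float]:
    """Scenario-3 ordering: make the series stationary first, then normalize it."""
    shifted, _ = shift_positive(seasonal_difference(x, period))
    return boxcox_mle(shifted)


def normalize_then_stationarize(x: List[float], period: int = 12) -> Tuple[List[float], float]:
    """Scenario-4 ordering: normalize the original series first, then make it stationary."""
    shifted, _ = shift_positive(x)
    z, lam = boxcox_mle(shifted)
    return seasonal_difference(z, period), lam
```

Because the normalizing transform is nonlinear, the two orderings generally yield different preprocessed series, and therefore different fitted models, from the same raw record.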
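One common way spectral analysis is used as a preprocessing step in stochastic hydrology is to fit the leading Fourier harmonics of the annual cycle and subtract that periodic component. The abstract does not spell out the procedure used, so the following is only a minimal sketch. It assumes a record of whole 12-month cycles and a user-chosen number of harmonics (`harmonics`, a hypothetical parameter).

```python
import math
from typing import List, Tuple


def remove_periodic(
    x: List[float], period: int = 12, harmonics: int = 2
) -> Tuple[List[float], List[float]]:
    """Subtract the mean plus the first `harmonics` Fourier harmonics of `period`.

    For a record of whole cycles the discrete harmonics are orthogonal, so the
    least-squares coefficients reduce to a_j = (2/n) * sum x_t cos(w_j t) and
    b_j = (2/n) * sum x_t sin(w_j t), with w_j = 2*pi*j/period (valid for j < period/2).
    Returns (residual, periodic component).
    """
    n = len(x)
    if n % period != 0 or not 1 <= harmonics < period / 2:
        raise ValueError("need whole cycles and 1 <= harmonics < period / 2")
    mean = sum(x) / n
    periodic = [mean] * n
    for j in range(1, harmonics + 1):
        w = 2.0 * math.pi * j / period
        a = 2.0 / n * sum(v * math.cos(w * t) for t, v in enumerate(x))
        b = 2.0 / n * sum(v * math.sin(w * t) for t, v in enumerate(x))
        # Harmonic j explains a variance of (a^2 + b^2) / 2, which can guide
        # how many harmonics are worth keeping.
        for t in range(n):
            periodic[t] += a * math.cos(w * t) + b * math.sin(w * t)
    residual = [v - p for v, p in zip(x, periodic)]
    return residual, periodic
```

The residual series, rather than the raw record, is then what a non-seasonal stochastic model would be fitted to.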
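The abstract states that the model orders are identified from ACF and PACF plots. As a sketch of what those plots contain, the code below computes the sample ACF, derives the PACF from it with the Durbin-Levinson recursion, and flags lags outside the approximate 95% band of plus or minus 1.96/sqrt(n). For a seasonal model one would also inspect the seasonal lags 12, 24, and so on; the function names are hypothetical.

```python
import math
from typing import List


def acf(x: List[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_1..r_max_lag (denominator = total sum of squares)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]


def pacf(r: List[float]) -> List[float]:
    """Partial autocorrelations from r_1..r_K via the Durbin-Levinson recursion."""
    out: List[float] = []
    prev: List[float] = []                 # phi_{k-1, 1..k-1}
    for k in range(1, len(r) + 1):
        if k == 1:
            phi_kk = r[0]
            cur = [phi_kk]
        else:
            num = r[k - 1] - sum(prev[j - 1] * r[k - j - 1] for j in range(1, k))
            den = 1.0 - sum(prev[j - 1] * r[j - 1] for j in range(1, k))
            phi_kk = num / den
            # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
            cur = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        out.append(phi_kk)
        prev = cur
    return out


def significant_lags(values: List[float], n: int) -> List[int]:
    """Lags (1-based) whose coefficient lies outside +/- 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(n)
    return [k + 1 for k, v in enumerate(values) if abs(v) > bound]
```

For example, an AR(1) process with theoretical autocorrelations 0.5, 0.25, 0.125 has a PACF of 0.5 at lag 1 and zero afterwards; that cutoff after lag p is the pattern used to pick the autoregressive order.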



The authors would like to express their gratitude to the Department of Irrigation and Drainage (DID), Malaysia, for providing the rainfall dataset used in this case study and for their admirable cooperation.

Compliance with ethical standards

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Supplementary material

Supplementary material 1 (DOC 1205 kb)


Copyright information

© Islamic Azad University (IAU) 2019

Authors and Affiliations

  1. Department of Civil Engineering, Razi University, Kermanshah, Iran
  2. School of Engineering, University of Guelph, Guelph, Canada
  3. Department of Water Engineering, Razi University, Kermanshah, Iran
