Biochemical oxygen demand prediction: development of hybrid wavelet-random forest and M5 model tree approach using feature selection algorithms

Abstract

Supplying adequate water and maintaining the supplies that support human life, particularly in rapidly urbanizing communities, are of paramount importance to urban development worldwide. Maintaining the quality of water resources, and avoiding permanent damage from environmental pollution and unsustainable off-take from sources such as rivers and aquifers, should be considered as important as the quantity of water supplied. In this study, random forest (RF) and M5 model tree (M5) models were used to predict the biochemical oxygen demand (BOD) of water. The input variables were first decomposed by wavelet transform; feature selection (FS) algorithms, namely relief (RA), correlation (CA), principal component analysis (PCA), and ant colony optimization (ACO), were then used to identify the important components, which were fed into the RF and M5 models. The proposed approach was applied to monthly data from the Ahvaz station on the Karun River for 2006 to 2018. The RF model performed better for BOD (R = 0.872, MAE = 0.0312, RMSE = 0.0332) than the M5 model (R = 0.751, MAE = 0.0377, RMSE = 0.0468). Compared with the standalone RF model, the proposed hybrid models were found to be viable options for improving the prediction accuracy of BOD. Among the hybrid models, the WRF-PCA model performed best (R = 0.927, MAE = 0.0198, RMSE = 0.0241).
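To make the wavelet decomposition step and the reported error measures concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a single-level Haar wavelet (the abstract does not name the mother wavelet or the decomposition level) and assumes that R denotes the Pearson correlation coefficient between observed and predicted values. The function names `haar_dwt`, `pearson_r`, `mae`, and `rmse` are hypothetical, and the numbers in the usage lines are illustrative rather than Karun River data.

```python
import math


def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform.

    Assumption: Haar is used only for illustration; the abstract does not
    state which mother wavelet was applied. Returns the approximation and
    detail coefficients as two lists, each half the input length.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


def pearson_r(observed, predicted):
    """Pearson correlation coefficient (assumed meaning of R)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(ss_o * ss_p)


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


# Illustrative values only (not measurements from the study).
series = [1.0, 1.0, 2.0, 2.0]
approx, detail = haar_dwt(series)

observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
scores = (pearson_r(observed, predicted), mae(observed, predicted), rmse(observed, predicted))
```

In a hybrid workflow of the kind described, the approximation and detail coefficients of each input variable would be the candidate features passed to a feature selection step before model fitting; a higher R and lower MAE and RMSE indicate a closer match between predictions and observations.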

Author information

Corresponding author

Correspondence to Feridon Radmanesh.

Additional information

Responsible Editor: Marcus Schulz

About this article

Cite this article

Golabi, M.R., Farzi, S., Khodabakhshi, F. et al. Biochemical oxygen demand prediction: development of hybrid wavelet-random forest and M5 model tree approach using feature selection algorithms. Environ Sci Pollut Res (2020). https://doi.org/10.1007/s11356-020-09457-x

Keywords

  • BOD modeling
  • RF algorithm
  • M5 model tree
  • Wavelet transform
  • Feature selection
  • Karun River