
Application of random forest regression and comparison of its performance to multiple linear regression in modeling groundwater nitrate concentration at the African continent scale

  • Issoufou Ouedraogo
  • Pierre Defourny
  • Marnik Vanclooster
Paper

Abstract

Groundwater management decisions require robust methods that allow accurate predictive modeling of pollutant occurrences. In this study, random forest regression (RFR) was used for modeling groundwater nitrate contamination at the African continent scale. When compared to more conventional techniques, key advantages of RFR include its nonparametric nature, its high predictive accuracy, and its capability to determine variable importance. The latter can be used to better understand the individual role and the combined effect of explanatory variables in a predictive model. In the absence of a systematic groundwater monitoring program at the African continent scale, the study used a groundwater nitrate contamination database for the continent, obtained from a meta-analysis, to test the modeling approach; 250 groundwater nitrate pollution studies from the African continent were compiled from the literature. A geographic information system database of 13 spatial attributes was collected, related to land use, soil type, hydrogeology, topography, climatology, type of region, and nitrogen fertilizer application rate, and these attributes were assigned as predictors. The RFR performance was evaluated in comparison to multiple linear regression (MLR). By using RFR, it was possible to establish which explanatory variables (population density, rainfall, recharge, etc.) influence the occurrence of nitrate pollution in groundwater. Both the RFR and MLR techniques identified population density as the most important variable explaining reported nitrate contamination. However, RFR has a much higher predictive power (R² = 0.97) than a traditional linear regression model (R² = 0.64). RFR is therefore considered a very promising technique for large-scale modeling of groundwater nitrate pollution.
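The two models are compared above through the coefficient of determination. As a minimal sketch of how that statistic can be computed, assuming the usual definition 1 − SS_res/SS_tot (the abstract does not state the exact formula or the data split used), the following uses only the Python standard library. The function name `r_squared` and all numbers are hypothetical and for illustration only; they are not data or code from the study.

```python
from statistics import fmean


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mean_obs = fmean(observed)
    # Residual sum of squares: squared gaps between observations and predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical nitrate concentrations (mg/L), for illustration only.
observed = [5.0, 12.0, 30.0, 48.0, 75.0]
close_fit = [6.0, 11.0, 31.0, 46.0, 76.0]    # predictions that track the data closely
loose_fit = [20.0, 25.0, 30.0, 40.0, 55.0]   # predictions that compress the range

print(f"close fit R2 = {r_squared(observed, close_fit):.2f}")
print(f"loose fit R2 = {r_squared(observed, loose_fit):.2f}")
```

A model that only ever predicts the observed mean scores 0 under this definition, and a perfect fit scores 1, which gives a scale for reading the two values reported in the abstract.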

Keywords

Groundwater modeling; Nitrate; Random forest; Geographic information system; Sub-Saharan Africa

Application de la méthode de régression dite des forêts aléatoires et comparaison de ses performances avec la régression linéaire multiple pour la modélisation de la concentration en nitrates des eaux souterraines à l’échelle du continent africain

Résumé

Les décisions relatives à la gestion des eaux souterraines nécessitent des méthodes robustes qui permettent une modélisation prédictive exacte de l’occurrence d’un polluant. Dans la présente étude, la méthode de régression dite de forêts aléatoires (RFA) a été utilisée pour modéliser la contamination des eaux souterraines par les nitrates à l’échelle du continent africain. Quand on la compare à la plupart des techniques classiques, les avantages principaux de la RFA comportent: sa nature non paramétrique, sa haute précision prédictive, et sa capacité à déterminer l’importance d’une variable. Cette dernière peut être utilisée pour mieux comprendre le rôle individuel et l’effet combiné des variables explicatives dans un modèle prédictif. En l’absence d’un programme de gestion des eaux souterraines systématique à l’échelle du continent africain, l’étude a utilisé une base de données sur la contamination des eaux souterraines par les nitrates issue d’une méta-analyse, dans le but de tester une approche par modélisation; 250 études de pollution des eaux souterraines par les nitrates concernant le continent africain ont été compilées à partir de données bibliographiques. La base de données d’un système d’information géographique de 13 attributs spatiaux a été construite, relativement à l’utilisation des sols, au type de sol, à l’hydrogéologie, à la topographie, à la climatologie, au type de région et au taux d’épandage d’engrais azotés, et ceux-ci ont été désignés comme prédicteurs. La performance de la RFA a été évaluée par comparaison avec les méthodes de régression linéaire multiple (RLM). En utilisant la RFA, il a été possible d’identifier les variables explicatives influençant l’occurrence de la pollution nitratée dans les eaux souterraines (densité de la population, précipitations, recharge, etc.). Les techniques de RFA et de RLM ont identifié l’une et l’autre la densité de population comme la variable la plus importante pour expliquer la contamination par les nitrates. Cependant, la RFA a un pouvoir prédictif plus important (R² = 0.97) qu’un modèle de régression linéaire traditionnel (R² = 0.64). La RFA est ainsi considérée comme une technique très prometteuse de modélisation à grande échelle de la pollution des eaux souterraines par les nitrates.

Aplicación de la regresión de bosques aleatorios y comparación de su desempeño con la regresión lineal múltiple en el modelado de la concentración de nitrato de agua subterránea a escala del continente africano

Resumen

Las decisiones de gestión del agua subterránea necesitan métodos robustos que permitan un modelado predictivo preciso de ocurrencias de contaminantes. En este estudio, se utilizó la regresión de bosques aleatorios (RFR) para modelar la contaminación por nitrato del agua subterránea a escala del continente africano. Cuando se comparan con técnicas más convencionales, las ventajas claves de la RFR incluyen su naturaleza no paramétrica, su alta precisión predictiva y su capacidad para determinar la importancia de las variables. Esta última se puede utilizar para comprender mejor el rol individual y el efecto combinado de las variables explicativas en un modelo predictivo. En ausencia de un programa sistemático de monitoreo de agua subterránea a escala del continente africano, el estudio utilizó una base de datos de contaminación de nitrato de agua subterránea obtenida de un metanálisis para probar el enfoque del modelado; se compilaron 250 estudios de contaminación de nitrato de agua subterránea del continente africano utilizando los datos de la literatura. Se recopiló una base de datos del sistema de información geográfica de 13 atributos espaciales, relacionada con el uso del suelo, el tipo de suelo, la hidrogeología, la topografía, la climatología, el tipo de región y la tasa de aplicación de fertilizantes nitrogenados, y estos se asignaron como predictores. El rendimiento de RFR se evaluó en comparación con los métodos de regresión lineal múltiple (MLR). Mediante el uso de RFR, fue posible establecer qué variables explicativas influyen en la incidencia de la contaminación por nitratos en el agua subterránea (densidad de población, precipitación, recarga, etc.). Las técnicas RFR y MLR identificaron la densidad de población como la variable más importante que explica la contaminación por nitrato reportada. Sin embargo, la RFR tiene un poder predictivo mucho más alto (R² = 0.97) que un modelo de regresión lineal tradicional (R² = 0.64). Por lo tanto, la RFR se considera una técnica muy prometedora para el modelado a gran escala de la contaminación del agua subterránea por nitrato.

在模拟非洲大陆尺度上地下水硝酸盐含量中随机预测回归分析的应用及其针对多重线性回归性能的比较

摘要

地下水管理决策需要能够准确预测模拟发生污染的强劲方法。本研究中,采用随机预测回归分析模拟非洲大陆尺度上的地下水硝酸盐含量。与更常规的技术相比,随机预测回归分析的主要优点包括其非参数特性、很高的预测精度以及确定变量重要性的能力。后者可用于更好地了解预测模型中解释性变量的各自作用及综合影响。在非洲大陆尺度上缺乏系统地下水监测项目的情况下,研究利用从荟萃分析中得到的地下水硝酸盐含量数据库测试模拟方法。利用文献数据编辑了250项非洲大陆地下水硝酸盐污染方面的研究。收集了13个与土地利用、土壤类型、水文地质学、地形学、气候学、地区类型及氮肥应用量等相关的空间属性的地理信息数据库,这些属性作为预测因子。在比较多重线性回归分析法中评估了随机预测回归分析的性能。利用随机预测回归分析,就有可能确定哪种解释性变量影响地下水中的硝酸盐污染(人口密度、降雨补给等)。随机预测回归分析和多重线性回归分析都确定了人口密度是造成所报道的硝酸盐污染最重要的变量。然而,随机预测回归分析的预测能力(R2 = 0.97)比传统线性回归模型的预测能力(R2 = 0.64)要高很多。因此,随机预测回归分析被认为是大尺度模拟地下水硝酸盐污染非常有前途的一项技术。

Aplicação de regressão de floresta aleatória e comparação de seu desempenho com a regressão linear múltipla na modelagem da concentração de nitrato de águas subterrâneas na escala do continente Africano

Resumo

As decisões de gestão das águas subterrâneas precisam de métodos robustos que permitam a modelagem preditiva precisa das ocorrências de poluentes. Neste estudo, a regressão de floresta aleatória (RFA) foi usada para modelar a contaminação por nitrato em águas subterrâneas na escala do continente africano. Quando comparadas a técnicas mais convencionais, as principais vantagens da RFA incluem sua natureza não paramétrica, sua alta precisão preditiva e sua capacidade de determinar a importância da variável. Esta última pode ser usada para entender melhor o papel individual e o efeito combinado de variáveis explicativas em um modelo preditivo. Na ausência de um programa sistemático de monitoramento de águas subterrâneas na escala do continente Africano, o estudo utilizou um banco de dados de contaminação por nitrato em águas subterrâneas obtido a partir de uma meta-análise para testar a abordagem de modelagem; 250 estudos de poluição por nitrato em águas subterrâneas do continente Africano foram compilados usando os dados da literatura. Foi coletado um banco de dados em sistema de informações geográficas com 13 atributos espaciais, relacionados ao uso da terra, tipo de solo, hidrogeologia, topografia, climatologia, tipo de região e taxa de aplicação de fertilizantes nitrogenados, sendo estes atribuídos como preditores. O desempenho da RFA foi avaliado em comparação com os métodos de regressão linear múltipla (RLM). Através da RFA, foi possível estabelecer quais variáveis explicativas influenciam a ocorrência de poluição por nitrato nas águas subterrâneas (densidade populacional, precipitação, recarga, etc.). Ambas as técnicas RFA e RLM identificaram a densidade populacional como a variável mais importante que explica a contaminação relatada por nitrato. No entanto, a RFA tem um poder preditivo muito mais alto (R² = 0.97) do que um modelo de regressão linear tradicional (R² = 0.64). A RFA é, portanto, considerada uma técnica muito promissora para a modelagem em grande escala da poluição das águas subterrâneas por nitrato.

Notes

Acknowledgments

This work was funded by the IDB (Islamic Development Bank) under its Ph.D. Merit Scholarship Program (MSP). GIS shapefiles for generating generic attributes were obtained from different sources throughout the world and also online. In this regard, special thanks go to T. Gleeson, P. Döll, N. Moosdorf, and P. Trambauer. I would like to thank all colleagues, particularly Mr. V. Antharam, for their valuable discussions on the random forest method. We also thank Dr. Lixiang Lin and two reviewers for their constructive comments on the initial version of the paper.

References

  1. Abrahart RJ et al (2008) Practical hydroinformatics. computational intelligence and technological developments in water applications. Open Model Integration in Flood Forecasting 68Google Scholar
  2. Aljazzar TH (2010) Adjustment of DRASTIC vulnerability index to assess groundwater vulnerability for nitrate pollution using the advection-diffusion cell. Von der Fakultät für Georessourcen und Materialtechnik der Rheinisch-Westfälischen Technischen Hochschule Aachen Ph.D. thesis, 146 ppGoogle Scholar
  3. Alley WM, Healy RW, LaBaugh JW, Reilly TE (2002) Flow and storage in groundwater systems. Science 296(5575):1985–1990Google Scholar
  4. Andrade AIASS, Stigter TY (2009) Multi-method assessment of nitrate and pesticide contamination in shallow alluvial groundwater as a function of hydrogeological setting and land use. Agric Water Manag 96(12):1751–1765Google Scholar
  5. Anning DW, Paul AP, McKinney TS, Huntington JM, Bexfield LM, Thiros SA (2012) Predicted nitrate and arsenic concentrations in basin-fill aquifers of the southwestern United States. US Geological Survey Scientific Investigations Report 2012–5065Google Scholar
  6. Anuraga TS, Ruiz L, Kumar MSM, Sekhar M, Leijnse A (2006) Estimating groundwater recharge using land use and soil data: a case study in South India. Agric Water Manag 84(1–2):65–76Google Scholar
  7. Barzegar et al (2018) Mapping groundwater contamination risk of multiple aquifers using multi-model ensemble of machine learning algorithms. Sci Total Environ 621(2018):697–712.  https://doi.org/10.1016/j.scitotenv.2017.11.185 Google Scholar
  8. Bauder J, Sinclair KN, Lund RE (1993) Physiographic and land use characteristics associated with nitrate-nitrogen in Montana groundwater. J Environ Qual 22(2):255–262.  https://doi.org/10.2134/jeq1993.00472425002200020004x Google Scholar
  9. BGS (2011) Depth to groundwater map. https://www.bgs.ac.uk/downloads/browse.cfm?sec=9&cat=38. Accessed 19 April 2014
  10. Bonsor HC, MacDonald AM (2011) An initial estimate of depth to groundwater across Africa. British Geological Survey Open Report OR/11/067: 26ppGoogle Scholar
  11. Boy-Roura M, Nolan BT, Menció A, Mas-Pla J (2013) Regression model for aquifer vulnerability assessment of nitrate pollution in the Osona region (NE Spain). J Hydrol 505:150–162Google Scholar
  12. Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140Google Scholar
  13. Breiman L (2001a) Random forests. Mach Learn 45:5–32Google Scholar
  14. Breiman L (2001b) Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat Sci 16(3):199–231Google Scholar
  15. Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. CRC Press, Boca RatonGoogle Scholar
  16. Burow KR, Nolan BT, Rupert MG, Dubrovsky NM (2010) Nitrate in groundwater of the United States, 1991−2003. Environ Sci Technol 44(13):4988–4997Google Scholar
  17. Cameron KC, Di HJ, Moir JL (2013) Nitrogen losses from the soil/plant system: a review. Ann Appl Biol 162(2):145–173Google Scholar
  18. Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson JC, Lawler JJ (2007) Random forests for classification in ecology. Ecology 88(11):2783–2792.  https://doi.org/10.1890/07-0539.1 Google Scholar
  19. Davis DB, Sylvester-Bradley R (1995) The contribution of fertiliser nitrogen to leachable nitrogen in the UK: a review. J Sci Food Agric 68:399–406.  https://doi.org/10.1002/jsfa.2740680402 Google Scholar
  20. Debernardi L, De-Luca DA, Lasahna M (2007) Correlation between nitrate concentration in groundwater and parameters affecting aquifer intrinsic vulnerability. Environ Geol 55:539–558Google Scholar
  21. Defourny P, Kirches G, Brockmann C, Boettcher M, Peters M, Bontemps S, et al (2014) Land cover CCI product user guide version 2. 2014Google Scholar
  22. Döll P, Fiedler K (2008) Global-scale modeling of groundwater recharge. Hydrol Earth Syst Sci 12:863–885.  https://doi.org/10.5194/hess-12-863-2008,2008
  23. Dubrovsky NM, Burow KR, Clark GM, Gronberg JM, Hamilton PA, Hitt KJ, Mueller DK, Munn MD, Nolan BT, Puckett LJ, Rupert MG, Short TM, Spahr NE, Sprague LA, Wilber WG (2010) The quality of our nation’s waters—nutrients in the nation’s streams and groundwater, 1992–2004. US Geological Survey Circular 1350, 174 ppGoogle Scholar
  24. ESRI (1969) ArcGIS, www.arcgis.com/home. Accessed 23 June 2015
  25. Fernández-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15(1):3133–3181Google Scholar
  26. Foster S, Pulido-Bosch A, Vallejos Á, Molina L, Llop A, MacDonald AM (2018) Impact of irrigated agriculture on groundwater-recharge salinity: a major sustainability concern in semi-arid regions. Hydrogeol J.  https://doi.org/10.1007/s10040-018-1830-2
  27. Fram MS, Belitz K (2011) Probability of detecting perchlorate under natural conditions in deep groundwater in California and the southwestern United States. Environ Sci Technol 45(4):1271–1277Google Scholar
  28. Friedl MA, Brodley CE, Strahler AH (1999) Maximizing land cover classification accuracies produced by decision trees at continental to global scales. IEEE Trans Geosci Remote Sens 37(2 II):969–977Google Scholar
  29. Gassiat C, Gleeson T, Luijendijk E (2013) The location of old groundwater in hydrogeologic basins and layered aquifer systems. Geophys Res Lett 40(12):3042–3047.  https://doi.org/10.1002/grl.50599
  30. Gemitzi A, Petalas C, Pisinaras V, Tsihrintzis VA (2009) Spatial prediction of nitrate pollution in groundwaters using neural networks and GIS: an application to south Rhodope aquifer (Thrace, Greece). Hydrol Process 23(3):372–383.  https://doi.org/10.1002/hyp.7143 Google Scholar
  31. Genuer R, Poggi JM, Christine TM (2010) Variable selection using random forests. Pattern Recogn Lett 31(14):2225–2236Google Scholar
  32. Gislason PO, Benediktsson JA, Sveinsson JR (2006) Random forests for land cover classification. Pattern Recogn Lett 27(4):294–300Google Scholar
  33. Gleeson T, Moosdorf N, Hartmann J, van Beek LPH (2014) A glimpse beneath earth’s surface: global HYdrogeology MaPS (GLHYMPS) of permeability and porosity. Geophys Res Lett 41(11):3891–3898.  https://doi.org/10.1002/2014GL059856 Google Scholar
  34. Golkarian A, Naghibi SA, Kalantar B, Pradhan B (2018) Groundwater potential mapping using C5. 0, random forest, and multivariate adaptive regression spline models in GIS. Environ Monit Assess 190(3):149.  https://doi.org/10.1007/s10661-018-6507-8
  35. Greene EA, LaMotte AE, Cullinan KA (2005) Ground-water vulnerability to nitrate contamination at multiple thresholds in the Mid-Atlantic region using spatial probability models. US Geological Survey Scientific Investigations Report 2004–5118, p 24Google Scholar
  36. Graham MH (2003) Confronting multicollinearity in ecological multiple regression. Ecology 84(11) pp. 2809–2815. https://www.jstor.org/stable/3449952. Accessed 3 Feb 2016
  37. Grömping U (2009) Variable importance assessment in regression: linear regression versus random forest. Am Stat 63(4):308–319.  https://doi.org/10.1198/tast.2009.08199 Google Scholar
  38. Gurdak JJ, Qi SL (2012) Vulnerability of recently recharged groundwater in principle aquifers of the United States to nitrate contamination. Environ Sci Technol 46(11):6004–6012Google Scholar
  39. Hansen LK, Salamon P (1990) Neural network ensembles. IEEE Trans Pattern Anal Mach Intell (10):993–1001Google Scholar
  40. Hanson CR (2002) Nitrate concentrations in Canterbury ground water – a review of existing data. Report no. R02/17. Environment Canterbury Technical Report, 87 ppGoogle Scholar
  41. Hao A, Zhang Y, Zhang E, Li Z, Yu J, Wang H, Yang J, Wang Y (2018) Review: groundwater resources and related environmental issues in China. Hydrogeol J.  https://doi.org/10.1007/s10040-018-1787-1
  42. Hartmann J, Moosdorf N (2012) The new global lithological map database GLiM: a representation of rock properties at the earth surface. Geochem Geophys Geosyst 13:Q12004.  https://doi.org/10.1029/2012GC004370 Google Scholar
  43. Hastie T, Tibshirani R, Friedman J (2008) The elements of statistical learning, 2nd edn. SpringerGoogle Scholar
  44. Hengl T, Hengl T, de Jesus JM, MacMillan RA, Batjes NH, Heuvelink GBM, Ribeiro E, Samuel-Rosa A, Kempen B, Leenaars JGB, Walsh MG, Gonzalez MR (2014) Soil-Grids1km – global soil information based on automated mapping. PLoS One 9:e105992.  https://doi.org/10.1371/journal.pone.0105992 Google Scholar
  45. Hoyos ICP, Krakauer N, Khanbilvardi R (2015) Random forest for identification and characterization of groundwater dependent ecosystems. WIT Trans Ecol Environ 196:89–100Google Scholar
  46. ISRIC (2014) SoilGrids – Global gridded soil information. (https://www.isric.org/explore/soilgrids, Accessed 19 July 2014). [Reference to paper: Hengl T, de Jesus JM, MacMillan RA, Batjes NH, Heuvelink GBM, et al. (2014) SoilGrids1km — global soil information based on automated mapping. PLoS ONE 9(8):e105992.  https://doi.org/10.1371/journal.pone.0105992]
  47. Jung Y-Y, Dong-Chan K, Won-Bae P, Kyoochul H (2015) Evaluation of multiple regression models using spatial variables to predict nitrate concentrations in volcanic aquifers. Hydrol Process 30(5):663–675Google Scholar
  48. Kazemi G, Lehr J, Perrochet P (2006) Groundwater age. Wiley-Interscience, Hoboken, New Jersey. 325ppGoogle Scholar
  49. Khalil A, Almasri MN, McKee M, Kaluarachchi JJ (2005) Applicability of statistical learning algorithms in groundwater quality modeling. Water Resour Res 41(5)Google Scholar
  50. Kihumba AM, Longo JN, Vanclooster M (2015) Modelling nitrate pollution pressure using a multivariate statistical approach: the case of Kinshasa groundwater body. Democratic Republic of Congo. Hydrogeol J: 1–13.  https://doi.org/10.1007/s10040-015-1337-z
  51. Kulabako N, Nalubega M, Thunvik R (2007) Study of the impact of land use and hydrogeological settings on the shallow groundwater quality in a peri-urban area of Kampala, Uganda. Sci Total Environ 381(1):180–199.  https://doi.org/10.1016/j.scitotenv.2007.03.035 Google Scholar
  52. Lapworth DJ, Nkhuwa DCW, Okotto-Okotto J, Pedley S, Stuart ME, Tijani MN, Wright J (2017) Urban groundwater quality in sub-Saharan Africa: current status and implications for water security and public health. Hydrogeol J 25(4):1093–1116.  https://doi.org/10.1007/s10040-016-1516-6
  53. Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22Google Scholar
  54. Liu CW, Wang Y-B, Jang C-S (2013) Probability-based nitrate contamination map of groundwater in Kinmen. Environ Monit Assess 185(12):10147–10156Google Scholar
  55. Loosvelt L, Petersb J, Skriverc H, Lievensa H, Van Coillied FMB, De Baetsb B, Verhoesta NEC (2012) Random forests as a tool for estimating uncertainty at pixel-level in SAR image classification. Int J Appl Earth Obs Geoinf 19:173–184Google Scholar
  56. Luo Y, Qiao X, Song J, Christie P, Wong M (2003) Use of a multi-layer column device for study on leachability of nitrate in sludge-amended soils. Chemosphere 52:1483–1488Google Scholar
  57. MacDonald AM, Calow RC, MacDonald DM, Darling WG, Dochartaigh BÉÓ (2009) What impact will climate change have on rural groundwater supplies in Africa. Hydrol Sci J 64(690–703). 18ppGoogle Scholar
  58. MacDonald AM, Taylor RG, Bonsor HC (2013) Groundwater in Africa – is there sufficient water to support the intensification of agriculture from “Land Grabs”? Hand book of land and water grabs in Africa, 9ppGoogle Scholar
  59. Mair A, El-Kadi AI (2013) Logistic regression modeling to assess groundwater vulnerability to contamination in Hawaii, USA. J Contam Hydrol 153:1–23Google Scholar
  60. Margat J (2010) Ressources et utilisation des eaux souterraines en Afrique. Managing Shared Aquifer Resources in Africa, Third International Conférence Tripoli 25–27 may 2008. International Hydrological Programme, Division of Water Sciences, IHP-VII Series on groundwater No.1, UNESCO, p 26–34Google Scholar
  61. Masterson, JP, Hess KM, Walter DA, LeBlanc DR (2002) Simulated changes in the sources of ground water for public-supply wells, ponds, streams, and coastal areas on Western Cape Cod, Massachusetts. US Geological Survey Water Resources Investigations Report 02–4143Google Scholar
  62. Mattern S, Vanclooster M (2009) Estimating travel time of recharge water through the unsaturated zone using transfer function model. Environ Fluid Mech.  https://doi.org/10.1007/s10652-009-9148-1
  63. Mattern S, Raouafi W, Bogaert P, Fasbender D, Vanclooster M (2012) Bayesian data fusion (BDF) of monitoring data with a statistical groundwater contamination model to map groundwater quality at the regional scale. J Water Resour Prot 4(11):929–943Google Scholar
  64. Mendes MP, Rodriguez-Galiano V, Luque-Espinar JA, Ribeiro L, Chica- Olmo M (2016) Applying random forest to assess the vulnerability of groundwater to pollution by nitrates. geoENV 2016. The 11th International Conference onGeostatistics for Environmental Applications. Lisbon, Portugal. geoENV2016BookofAbstractsMPMGoogle Scholar
  65. Moreno R, Zamora R, Molina JR, Vasquez A, Herrera MÁ (2011) Predictive modeling of microhabitats for endemic birds in south Chilean temperate forests using maximum entropy (Maxent). Eco Inform 6(6):364–370Google Scholar
  66. Murtaugh PA (2009) Performance of several variable-selection methods applied to real ecological data. Ecol Lett 12(10):1061–1068Google Scholar
  67. Naghibi SA, Ahmadi K, Daneshi A (2017) Application of support vector machine, random forest, and genetic algorithm optimized random forest models in groundwater potential mapping. Water Resour Manag 31(9):2761–2775.  https://doi.org/10.1007/s11269-017-1660-3 Google Scholar
  68. Nelson A (2004) Population Density for Africa in 2000, 4th edn. Retrieved 1/27/2011 from UNEP/GRID Sioux Falls. https://databasin.org/datasets/4d59b959e8b040688037d2fe83a3f369. Accessed 19 April 2015
  69. Nolan BT, Hitt KJ (2006) Vulnerability of shallow groundwater and drinking-water wells to nitrate in the United States. Environ Sci Technol 40(24):7834–7840.  https://doi.org/10.1021/es060911u Google Scholar
  70. Nolan BT, Hitt KJ, Ruddy BC (2002) Probability of nitrate contamination of recently recharged groundwaters in the conterminous United States. Environ Sci Technol 36(10):2138–2145.  https://doi.org/10.1021/es0113854 Google Scholar
  71. Nolan BT, Fienen MN, Lorenz DL (2015) A statistical learning framework for groundwater nitrate models of the Central Valley, California, USA. J Hydrol 531:902–911.  https://doi.org/10.1016/j.jhydrol.2015.10.025
  72. Nolan BT, Gronberg JM, Faunt CC, Eberts SM, Belitz K (2014) Modeling nitrate at domestic and public-supply well depths in the Central Valley, California. Environ Sci Technol 48(10):5643–5651.  https://doi.org/10.1021/es405452q. Google Scholar
  73. Norouz H, Negar AM, Attaallah N (2016) Determining vulnerable areas of Malekan Plain aquifer for nitrate, using random forest method. Journal of Environmental Studies, vol 41, no 4 (76), pp 923–942. http://www.sid.ir/En/Journal/ViewPaper.aspx?ID=550917. Accessed online 2 August 2018
  74. Oliveira S, Oehler F, San-Miguel-Ayanz J, Camia A, Pereira JMC (2012) Modeling spatial patterns of fire occurrence in Mediterranean Europe using multiple regression and random forest. For Ecol Manag 275:117–129Google Scholar
  75. Oppel S, Meirinho A, Ramírez I, Gardner B, O’Connell AF, Miller PI, Louzao, M (2012) Comparison of five modelling techniques to predict the spatial distribution and abundance of seabirds. Biol Conserv 156:94–104.  https://doi.org/10.1016/j.biocon.2011.11.013 Google Scholar
  76. Ouedraogo I, Vanclooster M (2016a). A meta-analysis and statistical modelling of nitrates in groundwater at the African scale. In: Hydrology and Earth System Sciences 20(6):2353–2381Google Scholar
  77. Ouedraogo I, Vanclooster M (2016b) Shallow groundwater poses pollution problem for Africa. SciDev.Net, 4 pp, http://hdl.handle.net/2078.1/169630
  78. Ouedraogo I, Defourny P, Vanclooster M (2016) Mapping the groundwater vulnerability for pollution at the pan-African scale. In: Science of the Total Environment, 544:939–953.  https://doi.org/10.1016/j.scitotenv.2015.11.135
  79. Pal M (2005) Random forest classifier for remote sensing classification. Int J Remote Sens 26(1):217–222Google Scholar
  80. Park N-W (2014) Using maximum entropy modeling for landslide susceptibility mapping with multiple geoenvironmental data sets. Environ Earth Sci 73(3):937–949Google Scholar
  81. Pearson S (2015) Identifying Groundwater Vulnerability from Nitrate Contamination: Comparison of the DRASTIC model and Environment Canterbury’s method. Degree of Master of Applied Science (Environmental Management). Lincoln University. 58 ppGoogle Scholar
  82. Peters J, Baets BD, Verhoest NEC, Samson R, Degroeve S, Becker PD, Huybrechts W (2007) Random forests as a tool for ecohydrological distribution modelling. Ecol Model 207(2–4):304–318Google Scholar
  83. Potter P, Ramankutty N, Bennett EM, Donner SD (2010) Characterizing the spatial patterns of global fertilizer application and manure production. Earth Interact 14:1–22.  https://doi.org/10.1175/2009EI288.1 Google Scholar
  84. Prasad AM, Iverson LR, Liaw A (2006) Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9(2):181–199.  https://doi.org/10.1007/s10021-005-0054-1 Google Scholar
  85. Puckett LJ, Tesoriero AJ, Dubrovsky NM (2011) Nitrogen contamination of surficial aquifers--a growing legacy. Environ Sci Technol 45(3):839–844.  https://doi.org/10.1021/es1038358 Google Scholar
  86. R Development Core Team (2015) A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. http://www.r-project.org/. Last accessed 6 March 2015)
  87. Ramasamy N, Krishnan P, Bernard JC, Ritter WF(2003) Modeling Nitrate Concentration in Ground Water Using Regression and Neural Networks. Department of Food and Resource Economics. College of Agriculture and Natural Resources. University of Delaware(ORES SP03–01). 10ppGoogle Scholar
  88. Rankinen K, Salo T, Granlund K, Rita H (2007) Simulated nitrogen leaching, nitrogen mass field balances and their correlation on four farms in South-Western Finland during the period 2000–2005. Agric Food Sci 16:387–406Google Scholar
  89. Ransom et al (2017). A hybrid machine learning model to predict and visualize nitrate concentration throughout the Central Valley aquifer, California, USA.  https://doi.org/10.1016/j.scitotenv.2017.05.192
  90. Rawlings JO, Pantula SG, Dickey DA (1998) Applied regression analysis, a research tool. Springer, Berlin. 658pGoogle Scholar
  91. Ritter A, Muñoz-Carpena R (2013) Performance evaluation of hydrological models: statistical significance for reducing subjectivity in goodness-of-fit assessments. J Hydrol 480:33–45.  https://doi.org/10.1016/j.jhydrol.2012.12.004 Google Scholar
  92. Rodriguez-Galiano VF, Chica-Rivas M (2012) Evaluation of different machine learning methods for land cover mapping of a Mediterranean area using multi-seasonal Landsat images and digital terrain models. Int J Digital Earth 7(6):492–509Google Scholar
  93. Rodriguez-Galiano VF, Chica-Olmo M, Abarca-Hernandez F, Atkinson PM, Jeganathan C (2012a) Random forest classification of Mediterranean land cover using multi-seasonal imagery and multi-seasonal texture. Remote Sens Environ 121:93–107Google Scholar
  94. Rodriguez-Galiano VF, Ghimire B, Rogan J, Chica-Olmo M, Rigol-Sanchez JP (2012b) An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J Photogramm Remote Sens 67:93–104Google Scholar
  95. Rodriguez-Galiano V, Mendes MP, Garcia-Soldado MJ, Chica-Olmo M, Ribeiro L (2014) Predictive modeling of groundwater nitrate pollution using random forest and multisource variables related to intrinsic and specific vulnerability: a case study in an agricultural setting (southern Spain). Sci Total Environ 476-477:189–206.  https://doi.org/10.1016/j.scitotenv.2014.01.001 Google Scholar
  96. Saffigna PG, Keeney DR (1997) Nitrate and chloride in groundwater under irrigated agriculture in Central Wisconsin. Groundwater 15(2):170–177Google Scholar
  97. Sahoo S, Russo TA, Elliott J, Foster I (2017) Machine learning algorithms for modeling groundwater level changes in agricultural regions of the U.S. Water Resour Res 53:3878–3895.  https://doi.org/10.1002/2016WR019933 Google Scholar
  98. Sajedi-Hosseini F, Malekian A, Choubin B, Rahmati O, Cipullo S, Coulon F, Pradhan B (2018) A novel machine learning-based approach for the risk assessment of nitrate groundwater contamination. Sci Total Environ 644(2018):954–962.  https://doi.org/10.1016/j.scitotenv.2018.07.054 Google Scholar
  99. Schweigert P, Pinter N, van der Ploeg R (2004) Regression analyses of weather effects on the annual concentrations of nitrate in soil and groundwater. J Plant Nutr Soil Sci 167(3):309–318Google Scholar
  100. Sesnie SE, Gessler PE, Finegan B, Thessler S (2008) Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and change detection in complex neotropical environments. Remote Sens Environ 112(5):2145–2159Google Scholar
  101. Sieling K, Kage H (2006) N balance as an indicator of N leaching in an oilseed rape – winter wheat – winter barley rotation. Agric Ecosyst Environ 115:261–269Google Scholar
  102. Sophocleous M (2004) Groundwater recharge. In: Silveira L, Wohnlich S, Usunoff EL (eds), Groundwater. Encyclopedia of Life Support Systems (EOLSS), Developed under the Auspices of the UNESCO, Eolss Publishers, Oxford, UK. http://www.eolss.net. Accessed 9 September 2015
  103. Spalding RF, Exner ME (1993) Occurrence of nitrate in groundwater- a review. J Environ Qual 22:392–402.  https://doi.org/10.2134/jeq1993.00472425002200030002x
  104. Steele BM (2000) Combining multiple classifiers: an application using spatial and remotely sensed information for land cover type mapping. Remote Sens Environ 74(3):545–556
  105. Stevenson FJ, Cole MA (1999) Cycles of soil carbon, nitrogen, phosphorus, sulfur, micronutrients, 2nd edn. Wiley, Hoboken
  106. Stigter TY, Ribeiro L, Dill AMMC (2008) Building factorial regression models to explain and predict nitrate concentrations in groundwater under agricultural land. J Hydrol 357(1–2):42–56
  107. Strobl C, Boulesteix AL, Zeileis A, Hothorn T (2007) Bias in random forest variable importance measures: illustrations, sources, and a solution. BMC Bioinf 8:25.  https://doi.org/10.1186/1471-2105-8-25
  108. Teng Y, Hu B, Zheng J, Wang J, Zhai Y, Zhu C (2018) Water quality responses to the interaction between surface water and groundwater along the Songhua River, NE China. Hydrogeol J.  https://doi.org/10.1007/s10040-018-1738-x
  109. Tesoriero AJ, Voss FD (1997) Predicting the probability of elevated nitrate concentrations in the Puget Sound-Basin, implications for aquifer susceptibility and vulnerability. Ground Water 35(6):1029–1039
  110. Thayalakumaran T, Charlesworth PB, Bristow K, van Bemmelen RJ, Jaffres J (2004) Nitrate and ferrous iron concentrations in the lower Burdekin aquifers: assessing denitrification potential. In: Singh B (ed) SuperSoil 2004 Conference, 3rd Australian New Zealand Soils Conference. The Regional Institute Ltd, Sydney, pp 1–9. https://researchoutput.csu.edu.au/en/publications/nitrate-and-ferrous-iron-concentrations-in-the-lower-burdekin-aqu, https://www.researchgate.net/publication/228513222_Nitrate_and_ferrous_iron_concentrations_in_the_lower_Burdekin_aquifers_assessing_denitrification_potenti. Accessed 17 Feb 2016
  111. Trambauer P, Dutra E, Maskey S, Werner M, Pappenberger F, van Beek LPH, Uhlenbrook S (2014) Comparison of different evaporation estimates over the African continent. Hydrol Earth Syst Sci 18(1):193–212
  112. UNECA, AU, AfDB (2000) The Africa Water Vision 2025: Equitable and Sustainable Use of Water for Socioeconomic Development. http://www.afdb.org/fileadmin/uploads/afdb/Documents/Generic-Documents/african%20water%20vision%202025%20to%20be%20sent%20to%20wwf5.pdf. Accessed 11 February 2016
  113. UNEP/DEWA (2014) Sanitation and Groundwater Protection – a UNEP Perspective. http://www.bgr.bund.de/EN/Themen/Wasser/Veranstaltungen/symp_sanitat-gwprotect/present_mmayi_pdf.pdf?__blob=publicationFile&v=2. Accessed 14 August 2014
  114. Ward MH, deKok TM, Levallois P, Brender J, Gulis G, Nolan BT, VanDerslice J (2005) Workgroup report: drinking-water nitrate and health—recent findings and research needs. Environ Health Perspect 113(11):1607–1614.  https://doi.org/10.1289/ehp.8043
  115. Wheeler DC, Nolan BT, Flory AR, DellaValle CT, Ward MH (2015) Modeling groundwater nitrate concentrations in private wells in Iowa. Sci Total Environ 536:481–488.  https://doi.org/10.1016/j.scitotenv.2015.07.080
  116. Wick K, Heumesser C, Schmid E (2012) Groundwater nitrate contamination: factors and indicators. J Environ Manag 111:178–186
  117. Xu Y, Usher B (2006) Groundwater pollution in Africa. Taylor & Francis/Balkema, the Netherlands, 353 pp
  118. Yost AC et al (2008) Predictive modeling and mapping sage grouse (Centrocercus urophasianus) nesting habitat using maximum entropy and a long-term dataset from southern Oregon. Eco Inform 3(6):375–386
  119. Youssef AM, Pourghasemi HR, Pourtaghi ZS, Al-Katheeri MM (2015) Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi Arabia. Landslides 13(5):839–856

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Issoufou Ouedraogo (1, 2)
  • Pierre Defourny (1)
  • Marnik Vanclooster (1)
  1. Earth and Life Institute, Université catholique de Louvain, Louvain-la-Neuve, Belgium
  2. Ecole Nationale Supérieur d’Ingénieurs de Fada (ENSI-F), Université de Fada N’Gourma, Fada N’Gourma, Burkina Faso
