Environmental Science and Pollution Research, Volume 25, Issue 23, pp 22658–22671

Cyanotoxin level prediction in a reservoir using gradient boosted regression trees: a case study

  • Paulino José García Nieto (email author)
  • Esperanza García-Gonzalo
  • Fernando Sánchez Lasheras
  • José Ramón Alonso Fernández
  • Cristina Díaz Muñiz
  • Francisco Javier de Cos Juez
Research Article


Cyanotoxins are toxins produced by cyanobacteria, and they pose a health threat in waters that could be used for drinking or recreational purposes. Thus, it is necessary to predict their presence to avoid risks. This paper presents a nonparametric machine learning approach using a gradient boosted regression tree (GBRT) model to predict cyanotoxin content from cyanobacterial concentrations determined experimentally in a reservoir located in the north of Spain. GBRT models can give good predictions in highly nonlinear problems like the one treated here, where the studied variable combines low cyanotoxin concentrations with high concentration peaks. Two types of results have been obtained. Firstly, the model allows the input variables to be ranked according to their importance in the model. Secondly, the high performance and simplicity of the model make the gradient boosted tree method attractive compared to conventional forecasting techniques.
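The idea behind GBRT and its variable ranking can be illustrated with a minimal sketch. This is not the authors' implementation, software, or data: it is a from-scratch gradient boosting loop under squared-error loss that uses one-split regression trees (stumps), applied to synthetic data that mimics a low baseline with occasional high peaks. The predictor names, the data-generating rule, the number of trees, and the learning rate are all hypothetical choices made for illustration.

```python
import random

def fit_stump(X, r):
    """Find the single split that most reduces the squared error of residuals r."""
    n, d = len(X), len(X[0])
    total = sum(r)
    best = None  # (gain, feature index, threshold, left mean, right mean)
    for j in range(d):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # no threshold separates equal values
            right_sum = total - left_sum
            # Squared-error reduction relative to predicting the overall mean.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best

def fit_gbrt(X, y, n_trees=100, learning_rate=0.1):
    """Boost stumps on residuals (the negative gradient of squared-error loss)."""
    f0 = sum(y) / len(y)
    pred = [f0] * len(y)
    stumps = []
    importance = [0.0] * len(X[0])
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        gain, j, thr, left, right = stump
        importance[j] += gain  # credit the split's error reduction to feature j
        stumps.append((j, thr, left, right))
        pred = [p + learning_rate * (left if x[j] <= thr else right)
                for p, x in zip(pred, X)]
    total_gain = sum(importance) or 1.0
    return {"f0": f0, "lr": learning_rate, "stumps": stumps,
            "importance": [g / total_gain for g in importance]}

def predict(model, x):
    """Sum the initial constant and the shrunken contribution of every stump."""
    out = model["f0"]
    for j, thr, left, right in model["stumps"]:
        out += model["lr"] * (left if x[j] <= thr else right)
    return out

# Synthetic data (an assumption, not the reservoir dataset): predictor_a drives
# rare high peaks, predictor_b has a weak effect, predictor_c is irrelevant.
rng = random.Random(0)
names = ["predictor_a", "predictor_b", "predictor_c"]  # hypothetical names
X = [[rng.random() for _ in names] for _ in range(200)]
y = [(5.0 if x[0] > 0.8 else 0.2) + 0.5 * x[1] + rng.gauss(0.0, 0.1) for x in X]

model = fit_gbrt(X, y)
ranking = sorted(zip(names, model["importance"]), key=lambda t: -t[1])

mean_y = sum(y) / len(y)
baseline_mse = sum((yi - mean_y) ** 2 for yi in y) / len(y)
train_mse = sum((yi - predict(model, x)) ** 2 for x, yi in zip(X, y)) / len(y)

for name, share in ranking:
    print(f"{name}: {share:.3f}")
print(f"baseline MSE {baseline_mse:.3f}, training MSE {train_mse:.3f}")
```

The importance shares here are each predictor's fraction of the total squared-error reduction across all splits, which is one common way to rank inputs in boosted trees. The error reported is a training error only; a real study would assess the model on held-out data, for example by cross-validation.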


Keywords: Statistical machine learning techniques; Regression trees; Gradient boosting; Cyanotoxins; Cyanobacteria; Harmful algal blooms (HABs)



The authors wish to acknowledge the Cantabrian Basin Authority (Ministry of Environment, Rural and Marine Affairs of Spain) for providing the dataset used in this research.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Paulino José García Nieto (1, email author)
  • Esperanza García-Gonzalo (1)
  • Fernando Sánchez Lasheras (2)
  • José Ramón Alonso Fernández (3)
  • Cristina Díaz Muñiz (3)
  • Francisco Javier de Cos Juez (4)
  1. Department of Mathematics, Faculty of Sciences, University of Oviedo, Oviedo, Spain
  2. Department of Construction and Manufacturing Engineering, University of Oviedo, Gijón, Spain
  3. Cantabrian Basin Authority, Spanish Ministry of Agriculture, Food and Environment, Oviedo, Spain
  4. Exploitation and Prospecting Department, University of Oviedo, Oviedo, Spain
