Machine Learning for Macroscale Ecological Niche Modeling - a Multi-Model, Multi-Response Ensemble Technique for Tree Species Management Under Climate Change

Machine Learning for Ecology and Sustainable Natural Resource Management

Abstract

The field of machine learning has grown exponentially over the past decade, helped by faster computers with more memory, and has produced a plethora of techniques based on ensemble methods. In this chapter, I explore techniques relevant to macroscale ecological niche modelling in a regression context. I evaluate the challenges of predicting suitable habitats under future climates and address issues related to high-dimensional data, such as bias-variance tradeoffs, overfitting, and nonlinearity. To illustrate, I choose a generalist tree species, the white oak (Quercus alba) in the eastern United States, and model its current and future-climate abundances as a multi-response blend of relative and absolute dominance and density. I develop a novel multi-model ensemble approach using randomized decision trees and stochastic gradient boosting. I assess model performance, prediction confidence, and predictor importance in this multi-response, multi-model framework and discuss its relevance for tree species management under climate change, as well as its limitations and caveats.

Acknowledgements

The author would like to thank Louis Iverson for his comprehensive review and also two anonymous reviewers for their valuable suggestions. Thanks to the Northern Research Station, USDA Forest Service, for funding.

Author information

Corresponding author

Correspondence to Anantha M. Prasad.

Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Prasad, A.M. (2018). Machine Learning for Macroscale Ecological Niche Modeling - a Multi-Model, Multi-Response Ensemble Technique for Tree Species Management Under Climate Change. In: Humphries, G., Magness, D., Huettmann, F. (eds) Machine Learning for Ecology and Sustainable Natural Resource Management. Springer, Cham. https://doi.org/10.1007/978-3-319-96978-7_6
