Boosting, Bagging and Ensembles in the Real World: An Overview, some Explanations and a Practical Synthesis for Holistic Global Wildlife Conservation Applications Based on Machine Learning with Decision Trees

Machine Learning for Ecology and Sustainable Natural Resource Management

Abstract

Boosting, bagging and ensembles are intellectually ‘deep’ modeling methods that have been well known and described for several decades. Great computing tools exist for using those methods. But with few exceptions they have not been used well in natural resource conservation management or ecology; for instance, the advanced works of Breiman (2001), Friedman (2001), and Elder (2003) still await general recognition. Here I present these methods, conveniently driven by binary recursive partitioning (Classification and Regression Trees, CARTs), along with many of their real-world aspects and uses. I elaborate on applications and on some of the known implementation hurdles. It is shown that these machine learning methods are the essential part of the new generation of quantitative reasoning. They allow for relevant progress, all while the global environmental state decays further, climate change remains unaccounted for, and sustainability policies remain outdated, urging an effective change of global culture and governance.

My goal is simple. It is a complete understanding of the universe, why it is as it is and why it exists at all.

Stephen Hawking

References

  • Aggarwal C (2015) Data mining: the textbook. Springer

  • Akaike H (1974) A new look at the statistical model identification. IEEE Trans Automat Contr AC-19:716–723

  • Alexander JC (2013) The dark side of modernity. Polity Press, Cambridge

  • Anderson DR, Burnham KP, Thompson WL (2000) Null hypothesis testing: problems, prevalence, and an alternative. J Wildl Manag 64:912–923

  • Araújo MB, New M (2007) Ensemble forecasting of species distributions. Trends Ecol Evol 22:42–47

  • Arnold TW (2010) Uninformative parameters and model selection using Akaike’s information criterion. J Wildl Manag 74:1175–1178

  • Baltensperger AP, Huettmann F (2015) Predicted shifts in small mammal distributions and biodiversity in the altered future environment of Alaska: an open access data and Machine Learning. PLoS One. https://doi.org/10.1371/journal.pone.0132054

  • Berthold P (2016) Mein Leben fuer die Voegel. Kosmos Publisher, Berlin

  • Breiman L (1996) Bagging predictors. Mach Learn 24:123–140

  • Breiman L (1998) Arcing classifier (with discussion and a rejoinder by the author). Ann Stat 26(3):801–849. https://doi.org/10.1214/aos/1024691079

  • Breiman L (2001a) Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat Sci 16:199–231

  • Breiman L (2001b) Random forests. Mach Learn 45:5–32

  • Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. CRC Press, Boca Raton

  • Burnham KP, Anderson DR (2002) Model selection and multimodel inference: a practical information-theoretic approach. Springer, New York

  • Cai T, Huettmann F, Guo Y (2014) Using stochastic gradient boosting to infer stopover habitat selection and distribution of hooded cranes Grus monacha during spring migration in Lindian, Northeast China. PLoS One 9. https://doi.org/10.1371/journal.pone.0097372

  • Chunrong M, Huettmann F, Guo Y (2016) Climate envelope predictions indicate an enlarged suitable wintering distribution for great bustards (Otis tarda dybowski) in China for the 21st century. PeerJ 4:e1630. https://doi.org/10.7717/peerj.1630

  • Chunrong M, Huettmann F, Guo Y, Han X, Wen L (2017) Why choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence. PeerJ 5:e2849. https://doi.org/10.7717/peerj.2849

  • Cockburn A (2013) A colossal wreck: a road trip through political scandal, corruption and American culture. Verso Publishers, New York

  • Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ (2007) Random forests for classification in ecology. Ecology 88:2783–2792. https://doi.org/10.1890/07-0539.1

  • Czech B, Krausman PR, Devers PK (2000) Economic associations among causes of species endangerment in the United States. Bioscience 50:593–601

  • De’ath G (2007) Boosted trees for ecological modeling and prediction. Ecology 88:243–251

  • De’ath G, Fabricius K (2000) Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology 81:3178–3192. https://doi.org/10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2

  • Dhar V (1998) Data mining in finance: using counterfactuals to generate knowledge from organizational information systems. Inf Syst 23:423–437

  • Drew CA, Wiersma Y, Huettmann F (eds) (2011) Predictive species and habitat modeling in landscape ecology. Springer, New York

  • Drucker H, Schapire R, Simard P (1993) Boosting performance in neural networks. Int J Pattern Recognit Artif Intell 7:705–771

  • Efron B, Tibshirani R (1993) An introduction to the bootstrap. Chapman & Hall/CRC Monographs, New York

  • Elder JF (2003) The generalization paradox of ensembles. J Comput Graph Stat 12:853–864

  • Elith J, Graham CH, Anderson RP, Dudík M, Ferrier S, Guisan A, Hijmans RJ, Huettmann F, Leathwick JR, Lehmann A, Li J, Lohmann LG, Loiselle BA, Manion G, Moritz C, Nakamura M, Nakazawa Y, Overton J, Peterson AT, Phillips SJ, Richardson K, Scachetti-Pereira R, Schapire RE, Soberón J, Williams S, Wisz MS, Zimmermann NE (2006) Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29:129–151

  • Evans JS, Cushman S (2009) Gradient modeling of conifer species using random forests. Landsc Ecol 24:673. https://doi.org/10.1007/s10980-009-9341-0

  • Evans JS, Murphy MA, Holden ZA, Cushman SA (2010) Modeling species distribution and change using random forest. In: Drew CA, Wiersma Y, Huettmann F (eds) Predictive species and habitat modeling in landscape ecology. Springer, New York, pp 139–159

  • Fernández-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15:3133–3181

  • Fielding A (1999) Machine learning methods for ecological applications. Springer, Boston

  • Fielding A, Bell Y (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ Conserv 24:38–49

  • Forman RTT (1995) Land mosaics: the ecology of landscapes and regions. Cambridge University Press, Cambridge

  • Fox CH, Huettmann F, Harvey GKA, Morgan KH, Robinson J, Williams R, Paquet PC (2017) Predictions from machine learning ensembles: marine bird distribution and density on Canada’s Pacific coast. Mar Ecol Prog Ser 566:199–216

  • Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55:119–139

  • Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232

  • Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38:367–378

  • Guthery FS, Brennan LA, Peterson MJ, Lusk LL (2005) Information theory in wildlife science: critique and viewpoint. J Wildl Manag 69:457–465

  • Hardy SM, Lindgren M, Konakanchi H, Huettmann F (2011) Predicting the distribution and ecological niche of unexploited snow crab (Chionoecetes opilio) populations in Alaskan waters: a first open-access ensemble model. Integr Comp Biol 51(4):608–622. https://doi.org/10.1093/icb/icr102

  • Harrell FE Jr (2001) Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. Springer, New York

  • Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction. Springer Series in Statistics. Springer, New York

  • Hegel TSA, Cushman JE, Huettmann F (2010) Current state of the art for statistical modelling of species distributions. Chapter 16. In: Cushman S, Huettmann F (eds) Spatial complexity, informatics and wildlife conservation. Springer, Tokyo, pp 273–312

  • Herrick KA, Huettmann F, Lindgren MA (2013) A global model of avian influenza prediction in wild birds: the importance of northern regions. Vet Res. https://doi.org/10.1186/1297-9716-44-42

  • Hilborn R, Mangel M (1997) The ecological detective: confronting models with data. Princeton University Press, Princeton

  • Hobbs NT, Hooten M (2015) Bayesian models: a statistical primer for ecologists. Princeton University Press, Princeton

  • Hochachka W, Caruana R, Fink D, Munson A, Riedewald M, Sorokina D, Kelling S (2007) Data mining for discovery of pattern and process in ecological systems. J Wildl Manag 71:2427–2437

  • Huettmann F (2007) Modern adaptive management: adding digital opportunities towards a sustainable world with new values. Forum on Public Policy: Clim Chang Sustain Dev 3:337–342

  • Jiao S, Guo Y, Huettmann F, Lei G (2014) Nest-site selection analysis of hooded crane (Grus monacha) in northeastern China based on a multivariate ensemble model. Zool Sci 31:430–437

  • Johnson DS, Thomas DL, Ver Hoef JM, Christ AD (2008) A general framework for the analysis of animal resource selection from telemetry data. Biometrics 64:968–976

  • Kampichler C, Wieland R, Calmé S, Weissenberger H, Arriaga-Weiss S (2010) Classification in conservation biology: a comparison of five machine-learning methods. Ecol Inform 5:441–450

  • Kandel K, Huettmann F, Suwal MK, Regmi GR, Nijman V, Nekaris KAI, Lama ST, Thapa A, Sharma HP, Subedi TR (2015) Rapid multi-nation distribution assessment of a charismatic conservation species using open access ensemble model GIS predictions: red panda (Ailurus fulgens) in the Hindu-Kush Himalaya region. Biol Conserv 181:150–161

  • Keating KA, Cherry S (2004) Use and interpretation of logistic regression in habitat-selection studies. J Wildl Manag 68:774–789

  • Kononenko I (2001) Machine learning for medical diagnosis: history, state of the art and perspective. Artif Intell Med 23:89–109

  • Kurt F (1982) Naturschutz-illusion. Paul Parey Publisher, Berlin Germany

  • Lawler JJ, White D, Neilson RP, Blaustein AR (2006) Predicting climate-induced range-shifts: model differences and model reliability. Glob Chang Biol 12:1568–1584

  • Lawler JJ, Yo W, Huettmann F (2011) Designing predictive models for increased utility: using species distribution models for conservation planning, forecasting, and risk assessment. In: Drew CA, Wiersma Y, Huettmann F (eds) Predictive species and habitat modeling in landscape ecology. Chapter 5. Springer, New York, pp 271–290

  • Leopold A, Meine C (2013) A sand county almanac & other writings on conservation and ecology. Library of America, New York

  • Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22

  • Liu J, Dou Y, Batistella M, Challies E, Conno T, Friis C, DA MJ, Parish E, CL R, Bl BS, Triezenber H, Yang H, Zhao Z, Zimmerer KS, Huettmann F, Treglia M, Basher Z, Chung MG, Herzberger A, Lenschow A, Mechiche-Alami A, Newig A, Roch J, Sun J (2018) Spillover systems in a telecoupled Anthropocene: typology, methods, and governance for global sustainability. Curr Opin Environ Sustain 33:58–69. https://doi.org/10.1016/j.cosust.2018.04.009

  • Loftus GR (1996) Psychology will be a much better science when we change the way we analyze data. Curr Dir Psychol Sci 5:161–171

  • Mace G, Cramer W, Diaz S, Faith DP, Larigauderie A, Le Prestre P, Palmer M, Perrings C, Scholes RJ, Walpole M, Walter BA, Watson JEM, Mooney HA (2010) Biodiversity targets after 2010. Curr Opin Environ Sustain 2:3–8

  • MacNally R (2000) Regression and model-building in conservation biology, biogeography and ecology: the distinction between – and reconciliation of – ‘predictive’ and ‘explanatory’ models. Biodivers Conserv 6:655–671

  • Manly FJ, McDonald LL, Thomas DL, McDonald TL, Erickson WP (2002) Resource selection by animals: statistical design and analysis for field studies, Second edn. Kluwer Academic Publishers, Dordrecht

  • McArdle (1988) The structural relationship: regression in biology. Can J Zool 66:2329–2339

  • Merow C, Silander JA (2014) A comparison of Maxlike and Maxent for modelling species distributions. Methods Ecol Evol 5:215–225

  • Mueller JP, Massaron L (2016) Machine Learning for dummies. For Dummies Publisher, 435 p

  • Næss A (1989) Ecology, community and lifestyle: outline of an Ecosophy (trans: Rothenberg D). Cambridge University Press, Cambridge

  • Nielsen SE, Stenhouse GB, Beyer HL, Huettmann F, Boyce MS (2008) Can natural disturbance-based forestry rescue a declining population of grizzly bears? Biol Conserv 141:2193–2207

  • O’Connor R, Jones MT, White D, Hunsacker C, Loveland T, Jones B, Preston E (1996) Spatial partitioning of environmental correlates of avian biodiversity in the conterminous United States. Biodivers Lett 3:97–110

  • Oppel S, Meirinho A, Ramírez I, Gardner B, O’Connell AF, Miller PI, Louzao M (2012) Comparison of five modelling techniques to predict the spatial distribution and abundance of seabirds. Biol Conserv 156:94–104

  • Perera AH, Drew A, Johnson CJ (2010) Expert knowledge and its application in landscape ecology. Springer, New York

  • Phillips SJ, Dudík M (2008) Modelling of species distributions with Maxent: new extensions and a comprehensive evaluation. Ecography 31:161–175

  • Regmi GR, Huettmann F, Suwal MK, Nijman V, Nekaris KAI, Kandel K, Sharma N, Coudrat C (2018) First open access ensemble climate envelope predictions of Assamese macaque Macaca assamensis in South and South-East Asia: a new role model and assessment of endangered species. Endanger Species Res 36:149–160. https://doi.org/10.3354/esr0088

  • Reinhart A (2015) Statistics done wrong: the woefully complete guide. No Starch Press, San Francisco

  • Reich Y, Barai SV (1999) Evaluating Machine Learning models for engineering problems. Artif Intell Eng 13:257–272

  • Romesburg HC (1989) More on gaining reliable knowledge. J Wildl Manag 53:1177–1180

  • Schapire RE (1990) The strength of weak learnability. Mach Learn 5:197–227. https://doi.org/10.1007/bf00116037

  • Schapire RE (1992) The design and analysis of efficient learning algorithms. MIT Press, USA

  • Schapire RE, Singer Y (1999) Improved boosting algorithms using confidence-rated predictors. Mach Learn 37:297–336

  • Silvy NJ (ed) (2012) The wildlife techniques manual: research & management, 2 volumes, 7th edn. The Johns Hopkins University Press, Baltimore

  • Smith BD, Zeder MD (2013) The onset of the Anthropocene. Anthropocene 4:6–13

  • Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer, New York

  • Verner J, Morrison ML, Ralph CJ (1986) Wildlife 2000. Modeling habitat relationships of terrestrial vertebrates. University of Wisconsin Press, Madison

  • Witten IH, Frank E, Hall MA (2011) Data mining: practical machine learning tools and techniques, 3rd edn. Morgan Kaufmann, Amsterdam

  • Yen P, Huettmann F, Cooke F (2004) Modelling abundance and distribution of marbled murrelets (Brachyramphus marmoratus) using GIS, marine data and advanced multivariate statistics. Ecol Model 171:395–413

  • Zar JH (2010) Biostatistical analysis, 5th edn. Prentice Hall, Upper Saddle River


Acknowledgement

I thank Profs R. O’Connor and A.W. (Tony) Diamond for an early workshop on statistics with ACWERN at UNB, Canada, which introduced me in the late 1990s to tree-based techniques (CART) and multivariate analysis. I thank Dan Steinberg and Salford Systems Ltd. for a workshop with U.S. IALE at Snowbird, Utah, as well as with The Wildlife Society, Alaska Chapter, for a wider debate and introduction of tree-based methods, boosting and bagging. I am indebted to U.S. IALE, the Global Primate Network in Kathmandu, Nepal, Medical University Taipei, Taiwan, and the Wildlife Institute of India in Dehradun for their workshop promotion and support. Thanks to S. Linke, I. Presse, B. Walter, G. Regmi, M. Suwal, R. Lama, C. Cambu, H. Hera, S. Sparks, Y. Subaru, H. Berrios and the many members of the EWHALE lab at UAF for their discussions and, in part, their support. This is EWHALE lab publication #187.

Author information

Corresponding author

Correspondence to Falk Huettmann.

Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Huettmann, F. (2018). Boosting, Bagging and Ensembles in the Real World: An Overview, some Explanations and a Practical Synthesis for Holistic Global Wildlife Conservation Applications Based on Machine Learning with Decision Trees. In: Humphries, G., Magness, D., Huettmann, F. (eds) Machine Learning for Ecology and Sustainable Natural Resource Management. Springer, Cham. https://doi.org/10.1007/978-3-319-96978-7_3
