Use of Machine Learning (ML) for Predicting and Analyzing Ecological and ‘Presence Only’ Data: An Overview of Applications and a Good Outlook

Machine Learning for Ecology and Sustainable Natural Resource Management

Abstract

Machine learning (ML) has been established and used in science-based applications since the 1970s. The advent and maturation of mathematical algorithms and concepts such as neural networks, entropy, and classification and regression trees (CARTs), together with the growth of computational power on personal computers worldwide, have allowed the development of many new applications and sound approaches to analyzing highly complex systems and their data. Improvements to classical ML techniques, such as boosting, bagging and ensembles, have been combined with ML algorithms to yield powerful new tools for both data exploration and analysis (e.g. classification and prediction). Together with the increasing availability of online datasets (public and private), these tools have formed a new ‘science-culture’ that has yet to be fully embraced by the broader scientific community. ML is well suited to data mining and classification, as well as to drawing generalizable inference from powerful predictions (Breiman L, Stat Sci 16:199–231 (2001a); Breiman L, Mach Learn J 45:5–32 (2001b)). Thus, it offers a new scientific platform that can help overcome many of the earlier limitations associated with sparse field data, statistical model-fitting, p-values, parsimony (e.g., AIC), Bayesian and post-hoc studies. In contrast to conventional, statistical model-based data analysis, ML is usually non-parametric: it does not require a priori assumptions about the structure and complexity of a model, nor does it rely on a single linear algorithm. This avoids building into models the biases and constraints that stem from such assumptions and from traditional single-algorithm approaches. ML techniques are therefore classification tools of choice and convenience. They can decipher relevant relationships (‘extract the signal’) directly from virtually any data (e.g. messy, ‘gappy’, very large or rather small).
Thus, ML can be seen as a new science philosophy with a newly available statistical approach that allows for faster, alternative and more encompassing results that generalize more adequately and better reflect the very complex structure of ecological systems. Because ML is not only flexible but also efficient, it is an ideal tool for science-based wildlife and conservation management as well as ecology, where decisions need to be robust but are time-critical. Here we review some of the advantages and assumed application pitfalls of several key ML algorithms, with published examples from the wildlife ecology and biodiversity disciplines using ‘location only’ (presence) data. We then provide a simulation case study to illustrate our key points, and evaluate how ML has the potential to change the way we use information to manage wildlife in a rapidly changing global environment and its ongoing crisis.

“…There is such a thing as being too late. This is no time for apathy or complacency…”

Martin Luther King Jr

References

  • Anderson D, Burnham K (2002) Avoiding pitfalls when using information-theoretic methods. J Wildl Manag 66:912–918

  • Anderson D, Burnham K, Thompson W (2000) Null hypothesis testing: problems, prevalence, and an alternative. J Wildl Manag 64:912–923

  • Anderson DR, Link WA, Johnson D, Burnham KP (2001) Suggestions for presenting the results of data analysis. USGS Northern Prairie Wildlife Research Center. Paper 227. https://digitalcommons.unl.edu/usgsnpwrc/227

  • Archer KJ, Kimes RV (2008) Empirical characterization of random forest variable importance measures. Comput Stat Data Anal 52:2249–2260

  • Araujo M, New B (2007) Ensemble forecasting of species distributions. Trends Ecol Evol 22:42–47

  • Arnold TW (2010) Uninformative parameters and model selection using Akaike’s information criterion. J Wildl Manag 74:1175–1178

  • Azoulay P, Fons-Rosen C, Zivin JSG (2015) Does science advance one funeral at a time? National Bureau of Economic Research Working Paper Series. No. 21788. http://www.nber.org/papers/w21788

  • Baldwin RA (2009) Use of maximum entropy modeling in wildlife research. Entropy 11:854–866. https://doi.org/10.3390/e11040854

  • Betts MG, Ganio L, Huso M, Som N, Huettmann F, Bowman J, Wintle BW (2009) Comment on “Methods to account for spatial autocorrelation in the analysis of species distributional data: a review”. Ecography 32:374–378

  • Bluhm B, Watts D, Huettmann F (2010) Free database availability, metadata and the internet: an example of two high latitude components of the census of marine life. In: Cushman SA, Huettmann F (eds) Spatial complexity, informatics and wildlife conservation. Springer, Tokyo, pp 233–244

  • Bolker BM, Brooks ME, Clark CJ, Geange SW, Poulsen J, Stevens MHH, White J-SS (2009) Generalized linear mixed models: a practical guide for ecology and evolution. Trends Ecol Evol 24:127–135

  • Booms T, Huettmann F, Schempf P (2009) Gyrfalcon nest distribution in Alaska based on a predictive GIS model. Pol Biol 33:1602–1612

  • Booms T, Lindgren M, Huettmann F (2011) Linking Alaska's predicted climate, Gyrfalcon, and ptarmigan distributions in space and time: a unique 200-year perspective. In: Watson RT, Cade TJ, Fuller M, Hunt G, Potapov E (eds) Gyrfalcons and ptarmigan in a changing world, vol I. The Peregrine Fund, Boise, pp 177–190

  • Boyce MS, Vernier PR, Nielsen SE, Schmiegelow FKA (2002) Evaluating resource selection functions. Ecol Model 157:281–300

  • Braun CE (ed) (2005) Techniques for wildlife investigations and management. The Wildlife Society (TWS), Bethesda

  • Breiman L (2001a) Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat Sci 16:199–231

  • Breiman L (2001b) Random forests. Mach Learn J 45:5–32

  • Brewer MJ, Butler A, Cooksley SL (2016) The relative performance of AIC, AICC and BIC in the presence of unobserved heterogeneity. Meth Ecol Evol 7:679–692

  • Bruijning M, Visser MD, Hallmann CA, Jongejans E (2018) Trackdem: automated particle tracking to obtain population counts and size distributions from videos in R. Meth Ecol Evol 9:965–973. https://doi.org/10.1111/2041-210X.12975

  • Buechley ER, Şekercioğlu ÇH (2016) The avian scavenger crisis: looming extinctions, trophic cascades, and loss of critical ecosystem functions. Biol Conserv 198:220–228

  • Buisson L, Thuiller W, Casajus N, Sovan L, Grenouillet G (2009) Uncertainty in ensemble forecasting of species distribution. Glob Chang Biol 16:1145–1157. https://doi.org/10.1111/j.1365-2486.2009.02000.x

  • Bureau A, Dupuis J, Falls K, Lunetta KL, Hayward B, Keith TP, Van Eerdewegh P (2005) Identifying SNPs predictive of phenotype using random forests. Genet Epidemiol 28:171–182

  • Burnham K, Anderson D (2002) Model selection and multimodel inference: a practical information-theoretic approach. Springer, New York

  • Buchanan GM, Lachmann L, Tegetmeyer C, Oppel S, Nelson A, Flade M (2011) Identifying the potential wintering sites of the globally threatened Aquatic Warbler Acrocephalus paludicola using remote sensing. Ostrich 82(2):81–85. https://doi.org/10.2989/00306525.2011.603461

  • Buston PM, Elith J (2011) Determinants of reproductive success in dominant pairs of clownfish: a boosted regression tree analysis. J Anim Ecol 80:528–538

  • Clemen RT (1989) Combining forecasts: a review and annotated bibliography. Int J Forecast 5:559–583

  • Craig E, Huettmann F (2008) Using “blackbox” algorithms such as TreeNet and Random Forests for data-mining and for finding meaningful patterns, relationships and outliers in complex ecological data: an overview, an example using golden eagle satellite data and an outlook for a promising future. In: Wang H-f (ed) Intelligent data analysis: developing new methodologies through pattern discovery and recovery. IGI Global, Hershey, pp 65–84

  • Cooper GF, Aliferis CF, Ambrosino R, Aronis J, Buchanan BG, Caruana R, Fine MJ, Glymour C, Gordon G, Hanusa BH et al (1997) An evaluation of machine-learning methods for predicting pneumonia mortality. Artif Intell Med 9:107–138

  • Crookston NL, Finley AO (2008) yaImpute: an R package for kNN imputation. J Stat Softw 23:1–14

  • Cushman S, Huettmann F (eds) (2010) Spatial complexity, informatics and wildlife conservation. Springer, Tokyo

  • Cutler DR, Edwards TC Jr, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ (2007) Random forests for classification in ecology. Ecology 88:2783–2792

  • Czech B (2000) Shoveling fuel for a runaway train: errant economists, shameful spenders, and a plan to stop them all. University of California Press, Berkeley

  • Daly H (1997) Beyond growth: the economics of sustainable development. Beacon Press, Boston

  • Dhar V (1998) Data mining in finance: using counterfactuals to generate knowledge from organizational information systems. Inf Syst 23:423–437

  • De’ath G, Fabricius K (2000) Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology 81:3178–3192. https://doi.org/10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2

  • De’ath G (2002) Multivariate regression trees: a new technique for modeling species–environment relationships. Ecology 83:1105–1117. https://doi.org/10.1890/0012-9658(2002)083[1105:MRTANT]2.0.CO;2

  • De’ath G (2007) Boosted trees for ecological modeling and prediction. Ecology 88:243–251

  • Di Minin E, Fink C, Tenkanen H, Hiippala T (2018) Machine learning for tracking illegal wildlife trade on social media. Nat Ecol Evol 2:406–407. https://doi.org/10.1038/s41559-018-0466-x

  • Dormann CF, McPherson JM, Araújo MB, Bivand R, Bolliger J, Carl G, Davies RG, Hirzel A, Jetz W, Kissling WD (2007) Methods to account for spatial autocorrelation in the analysis of species distributional data: a review. Ecography 30:609–628

  • Drew CA, Wiersma YF, Huettmann F (eds) (2011) Predictive modeling in landscape ecology. Springer, New York

  • Edrén SMC, Wisz MS, Teilmann J, Dietz R, Söderkvist J (2010) Modelling spatial patterns in harbour porpoise satellite telemetry data using maximum entropy. Ecography 33:698–708

  • Elith J, Graham C, NCEAS working group (2006) Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29:129–151

  • Elith J, Ferrier S, Huettmann F, Leathwick J (2005) The evaluation strip: a new and robust method for plotting predicted responses from species distribution models. Ecol Model 186:280–289

  • Elith J, Leathwick JR, Hastie T (2008) A working guide to boosted regression trees. J Anim Ecol 77:802–813. https://doi.org/10.1111/j.1365-2656.2008.01390.x

  • Elith J, Leathwick JR (2009) Species distribution models: ecological explanation and prediction across space and time. Ann Rev Ecol Evol Syst 40:677–697

  • Elith J, Phillips SJ, Hastie T, Dudík M, En Chee Y, Yates CJ (2011) A statistical explanation of MaxEnt for ecologists. Divers Distrib 17:43–57

  • Ellis N, Smith SJ, Pitcher JR (2012) Gradient forests: calculating importance gradients on physical predictors. Ecology 93(1):156–168. http://www.esajournals.org/doi/abs/10.1890/0012-9658(2002)083%5B1105:MRTANT%5D2.0.CO%3B2

  • Evans J, Murphy M, Cushman S, Holden Z (2011) Modeling tree distribution and change using random forests. In: Drew CA, Wiersma Y, Huettmann F (eds) Predictive wildlife and habitat modeling in landscape ecology. Springer Publishers, New York

  • Fox CH, Huettmann F, Harvey GKA, Morgan KH, Robinson J, Williams R, Paquet PC (2017) Predictions from machine learning ensembles: marine bird distribution and density on Canada’s Pacific coast. Mar Ecol Prog Ser 566:199–216

  • Jones-Farrand DT, Fearer TM, Thogmartin WE, Thompson FR 3rd, Nelson MD, Tirpak JM (2011) Comparison of statistical and theoretical habitat models for conservation planning: the benefit of ensemble prediction. Ecol Appl 21:2269–2282

  • Fielding AH, Bell JF (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ Conserv 24:38–49

  • Fernandez-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15:3133–3181

  • Fielding AH (1999) Machine learning methods for ecological applications. Springer, New York

  • Fink D, Hochachka WM, Zuckerberg B, Winkler DW, Shaby B, Munson MA, Hooker G, Riedewald M, Sheldon D, Kelling S (2010) Spatiotemporal exploratory models for broad-scale survey data. Ecol Appl 20:2131–2147

  • Fortin M-J, Dale MRT, Bertazzon S (2010) Spatial analysis of wildlife distribution and disease spread. In: Huettmann F, Cushman S (eds) Spatial complexity, informatics, and wildlife conservation. Springer, Tokyo, pp 255–273

  • Friedman JH (2002) Stochastic gradient boosting. Comp Stat Data Anal 38:367–378

  • Galindo J, Tamayo P (2000) Credit risk assessment using statistical and machine learning: basic methodology and risk modeling applications. Comput Econ 15:107–143

  • Galipaud M, Gillingham MAF, David M, Dechaume-Moncharmont F-X (2014) Ecologists overestimate the importance of predictor variables in model averaging: a plea for cautious interpretations. Methods Ecol Evol 5:983–991

  • Garton EO, Ratti JR, Giudice JH (2005) Research and experimental design. In: Braun CE (ed) Techniques for wildlife investigations and management. The Wildlife Society, Bethesda, pp 43–71

  • Gillies CS, Hebblewhite M, Nielsen SE, Krawchuk M, Aldridge CL, Frair JL, Saher DJ, Stevens CE, Jerde CL (2006) Application of random effects to the study of resource selection by animals. J Anim Ecol 75:887–898

  • Goldberg DE, Holland JH (1988) Genetic algorithms and machine learning. Mach Learn 3:95–99

  • Guilford T, Meade J, Willis J, Phillips RA, Boyle D, Roberts S, Collett M, Freeman R, Perrins C (2009) Migration and stopover in a small pelagic seabird, the Manx shearwater Puffinus puffinus: insights from machine learning. Proc R Soc Lond B Biol Sci: rspb 2008.1577

  • Guthery FS (2008) Statistical ritual versus knowledge accrual in wildlife science. J Wildl Manag 72:1872–1875

  • Guthery FS, Lusk JJ, Peterson MJ (2001) The fall of the null hypothesis: liabilities and opportunities. J Wildl Manag 65:379–384

  • Guthery FS, Brennan LA, Peterson MJ, Lusk JJ (2005) Information theory in wildlife science: critique and viewpoint. J Wildl Manag 69:457–465

  • Han X, Huettmann F, Guo Y, Mi C, Wen L (2018) Conservation prioritization with machine learning predictions for the black-necked crane Grus nigricollis, a flagship species on the Tibetan Plateau for 2070. Glob Environ Chang. https://doi.org/10.1007/s10113-018-1336-4

  • Hardy SM, Lindgren M, Konakanchi H, Huettmann F (2011) Predicting the distribution and ecological niche of unexploited snow crab (Chionoecetes opilio) populations in Alaskan waters: a first open-access ensemble model. Integr Comp Biol 51:608–622. https://doi.org/10.1093/icb/icr102

  • Harrell FE Jr (2001) Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. Springer, New York

  • Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York

  • Hastie T, Fithian W (2013) Inference from presence-only data; the ongoing controversy. Ecography 36:864–867

  • Hegel T, Cushman SA, Evans J, Huettmann F (2010) Chapter 16: Current state of the art for statistical modelling of species distributions. In: Cushman S, Huettmann F (eds) Spatial complexity, informatics and wildlife conservation. Springer, Tokyo, pp 273–312

  • Hernandez PA, Graham CH, Master LL, Albert D (2006) The effect of sample size and species characteristics on performance of different species distribution modeling methods. Ecography 29:773–785

  • Herrick KA, Huettmann F, Lindgren MA (2014) A global model of avian influenza prediction in wild birds: the importance of northern regions. Vet Res 44:42. https://doi.org/10.1186/1297-9716-44-42

  • Hervías S, Henriques A, Oliveira N, Pipa T, Cowen H, Ramos JA, Nogales M, Geraldes P, Silva C, de Ruiz Ybáñez R, Oppel S (2013) Studying the effects of multiple invasive mammals on Cory’s shearwater nest survival. Biol Invasions 15:143–155

  • Hijmans RJ (2012) Cross-validation of species distribution models: removing spatial sorting bias and calibration with a null model. Ecology 93:679–688

  • Hilborn R, Mangel M (1997) The ecological detective: confronting models with data. Princeton University Press, Princeton, p 330

  • Hochachka WM, Caruana R, Fink D, Munson A, Riedewald M, Sorokina D, Kelling S (2007) Data-mining discovery of pattern and process in ecological systems. J Wildl Manag 71:2427–2437. https://doi.org/10.2193/2006-503

  • Hochachka WM, Fink D, Hutchinson RA, Sheldon D, Wong W-K, Kelling S (2012) Data-intensive science applied to broad-scale citizen science. Trends Ecol Evol 27:130–137

  • Hothorn T, Hornik K, Zeileis A (2006) Party: a laboratory for recursive part(y)itioning. Available at: http://CRAN.R-project.org/. Accessed 21 Dec 2008

  • Hothorn T, Müller J, Schröder B, Kneib T, Brandl R (2011) Decomposing environmental, spatial, and spatiotemporal components of species distributions. Ecol Monogr 81:329–347

  • Hsieh WW (2009) Machine learning methods in the environmental sciences. Cambridge University Press, Cambridge

  • Humphries G (2010) ‘The Ecological Niche of Storm-Petrels in the North Pacific and a Global Model of Dimethylsulfide DMS’. Unpublished M.Sc. thesis. University of Alaska-Fairbanks USA

  • Huettmann F (2005) Databases and science-based management in the context of wildlife and habitat: towards a certified ISO standard for objective decision-making for the global community by using the internet. J Wildl Manag 69:466–472

  • Huettmann F (2007a) Constraints, suggested solutions and an outlook towards a new digital culture for the oceans and beyond: experiences from five predictive GIS models that contribute to global management, conservation and study of marine wildlife and habitat. In: VandenBerghe E et al (eds) Proceedings of ‘ocean biodiversity informatics’: an international conference on marine biodiversity data management Hamburg, Germany, 29 November–1 December, 2004. IOC Workshop Report, 202, VLIZ Special Publication 37, pp. 49–61. www.vliz.be/vmdcdata/imis2/imis.php?module=ref&refid=107201

  • Huettmann F (2007b) Modern adaptive management: adding digital opportunities towards a sustainable world with new values. Forum Public Policy 3:337–342

  • Huettmann F (2011) Serving the Global Village through public data sharing as a mandatory paradigm for seabird biologists and managers: why, what, how, and a call for an efficient action plan. Open Ornith J 4:1–11

  • Huettmann F (2012) Protection of the three poles. Springer, Tokyo

  • Huettmann F, Gottschalk T (2011) Simplicity, model fit, complexity and uncertainty in spatial prediction models applied over time: we are quite sure, aren’t we? In: Drew CA, Wiersma YF, Huettmann F (eds) Predictive species and habitat modeling in landscape ecology, pp 189–208. https://doi.org/10.1007/978-1-4419-7390-0_10

  • Huettmann F, Artukhin Y, Gilg O, Humphries G (2011) Predictions of 27 Arctic pelagic seabird distributions using public environmental variables, assessed with colony data: a first digital IPY and GBIF open access synthesis platform. Mar Biodivers 41:141–179. https://doi.org/10.1007/s12526-011-0083-2

  • Hutchinson RA, Liu L-P, Dietterich TG (2011) Incorporating boosted regression trees into ecological latent variable models. In: 25th AAAI conference on artificial intelligence. Association for the Advancement of Artificial Intelligence, San Francisco

  • Ishwaran H (2007) Variable importance in binary regression trees and forests. Electron J Stat 1:519–537

  • Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS (2008) Random survival forests. Ann Appl Stat 2:841–860

  • Jean N, Burke M, Xie M, Davis WM, Lobell DB, Ermon S (2016) Combining satellite imagery and machine learning to predict poverty. Science 353:790–794

  • Jiao S, Huettmann F, Guo Y, Li Y, Ouyang Y (2016) Advanced long-term bird banding and climate data mining in spring confirm passerine population declines for the Northeast Chinese-Russian flyway. Glob Planet Chang. https://doi.org/10.1016/j.gloplacha.2016.06.015

  • Johnson DH (1999) The insignificance of statistical significance testing. J Wildl Manag 63:763–772

  • Kampichler C, Wieland R, Calmé S, Weissenberger H, Arriaga-Weiss S (2010) Classification in conservation biology: a comparison of five machine-learning methods. Eco Inform 5:441–450

  • Kandel K, Huettmann F, Suwal MK, Regmi RG, Nijman V, Nekaris KAI, Lama ST, Thapa A, Sharma HP, Subedi TR (2015) Rapid multi-nation distribution assessment of a charismatic conservation species using open access ensemble model GIS predictions: red panda (Ailurus fulgens) in the Hindu-Kush Himalaya region. Biol Conserv 181:150–161

  • Kelling S, Hochachka WM, Fink D, Riedewald M, Caruana R, Ballard G, Hooker G (2009) Data-intensive science: a new paradigm for biodiversity studies. Bioscience 59:613–620 www.jstor.org/stable/10.1525/bio.2009.59.7.12

  • Kéry M, Schaub M (2012) Bayesian population analysis using WinBUGS. Academic Press, Oxford

  • Kononenko I (2001) Machine learning for medical diagnosis: history, state of the art and perspective. Artif Intell Med 23:89–109

  • Kubat M, Holte RC, Matwin S (1998) Machine learning for the detection of oil spills in satellite radar images. Mach Learn 30:195–215

  • Lawler JJ, White D, Neilson RP, Blaustein AR (2006) Predicting climate-induced range-shifts: model differences and model reliability. Glob Chang Biol 12:1568–1584

  • Lawler JJ, Wiersma YF, Huettmann F (2011) Chapter 5: Designing predictive models for increased utility: using species distribution models for conservation planning, forecasting, and risk assessment. In: Drew CA, Wiersma Y, Huettmann F (eds) Predictive modeling in landscape ecology. Springer, New York, pp 271–290

  • Lee KC, Han I, Kwon Y (1996) Hybrid neural network models for bankruptcy predictions. Decis Support Syst 18:63–72

  • Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22

  • Louzao M, Aumont O, Hothorn T, Wiegand T, Weimerskirch H (2013) Foraging in a changing environment: habitat shifts of an oceanic predator over the last half century. Ecography 36:57–67. https://doi.org/10.1111/j.1600-0587.2012.07587.x

  • Mace G, Cramer W, Diaz S, Faith DP, Larigauderie A, Le Prestre P, Palmer M, Perrings C, Scholes RJ, Walpole M, Walter BA, Watson JEM, Mooney HA (2010) Biodiversity targets after 2010. Environ Sustain 2:3–8

  • Mac Nally R (2000) Regression and model-building in conservation biology, biogeography and ecology: the distinction between – and reconciliation of – ‘predictive’ and ‘explanatory’ models. Biodivers Conserv 6:655–671

  • Magness DR, Huettmann F, Morton JM (2008) Using random forests to provide predicted species distribution maps as a metric for ecological inventory & monitoring programs. In: Smolinski TG, Milanova MG, Hassanien AE (eds) Applications of computational intelligence in biology: current trends and open problems, studies in computational intelligence, vol 122. Springer, Berlin/Heidelberg, pp 209–229

  • Manel S, Williams HC, Ormerod SJ (2001) Evaluating presence–absence models in ecology: the need to account for prevalence. J Appl Ecol 38:921–931

  • Manly BF, McDonald L, Thomas DL, McDonald TL, Erickson WP (2002) Resource selection by animals: statistical design and analysis for field studies. Springer, Dordrecht

  • McCullagh P, Nelder J (1989) Generalized linear models. Chapman and Hall, London

  • Mi C, Huettmann F, Guo Y, Han X, Wen L (2017) Why to choose random forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence. PeerJ. https://doi.org/10.7717/peerj.2849

  • Miller K, Huettmann F, Norcross B, Lorenz M (2014) Multivariate random forest models of estuarine-associated fish and invertebrate communities. Mar Ecol Prog Ser 500:159–174

  • Miller K, Huettmann F, Norcross B (2015) Efficient spatial models for predicting the occurrence of subarctic estuarine-associated fishes: implications for management. Fish Manag Ecol 22:501–517

  • Mogie M (2004) In support of null hypothesis significance testing. Proc R Soc Lond B 271:S82–S84

  • Mullet TC, Gage SH, Morton JM, Huettmann F (2016) Temporal and spatial variation of a winter soundscape in Alaska. Landsc Ecol 31:1117–1137

  • Murphy AH, Winkler RL (1992) Diagnostic verification of probability forecasts. Int J Forecast 7:435–455

  • Murphy MA, Evans JS, Storfer A (2010) Quantifying Bufo boreas connectivity in Yellowstone National Park with landscape genetics. Ecology 91:252–261

  • Murphy K, Huettmann F, Fresco N, Morton JM (2012a) Connecting Alaska landscapes into the future. U.S. Fish and Wildlife Service, and the University of Alaska. Prepared by the Scenarios Network for Arctic Planning (SNAP). www.snap.uaf.edu/attachments/SNAP-connectivity-2010-complete.pdf

  • Murphy K, Reynolds J, Whitten E, Fresco N, Lindgren M, Huettmann F (2012b) Predicting future potential climate-biomes for the Yukon, Northwest Territories, and Alaska: a climate-linked cluster analysis approach to analyzing possible ecological refugia and areas of greatest change. Prepared by the Scenarios Network for Arctic Planning (SNAP) and the EWHALE lab, University of Alaska-Fairbanks on behalf of The Nature Conservancy Canada, Government Northwest Territories. www.snap.uaf.edu/attachments/Cliomes-FINAL.pdf

  • Næss A (1997) Ecology, community and lifestyle: outline of an ecosophy (trans: D. Rothenberg). Cambridge University Press, Cambridge

  • Ohse B, Huettmann F, Ickert-Bond S, Juday G (2009) Modeling the distribution of white spruce (Picea glauca) for Alaska with high accuracy: an open access role-model for predicting tree species in last remaining wilderness areas. Pol Biol 32:1717–1724

  • Olden JD, Lawler JJ, Poff NL (2008) Machine learning without tears: a practical primer for ecologists. Q Rev Biol 83:171–193

  • Oppel S, Huettmann F (2010) Chapter 8: Using a random forests model and public data to predict the distribution of prey for marine wildlife management. In: Cushman S, Huettmann F (eds) Spatial complexity, informatics and wildlife conservation. Springer, Tokyo, pp 151–164

  • Oppel S, Pain DJ, Lindsell J, Lachmann L, Diop I, Tegetmeyer C, Donald PF, Anderson G, Bowden CGR, Tanneberger F, Flade M (2011) High variation reduces the value of feather stable isotope ratios in identifying new wintering areas for aquatic warblers in West Africa. J Avian Biol 42:342–354

  • Oppel S, Strobl C, Huettmann F (2009a) Alternative methods to quantify variable importance in ecology. Technical Report Number 65, Department of Statistics, University of Munich, Germany

  • Oppel S, Powell AN, Dickson DL (2009b) Using an algorithmic model to reveal individually variable movement decisions in a wintering sea duck. J Anim Ecol 78:524–531

  • Oppel S, Meirinho A, Ramírez I, Gardner B, O’Connell A, Miller PI, Louzao M (2012) Comparison of five modelling techniques to predict the spatial distribution and abundance of seabirds. Biol Conserv 156:94–104

  • Oppel S et al (2017) Landscape factors affecting territory occupancy and breeding success of Egyptian Vultures on the Balkan Peninsula. J Ornithol 158:443–457

  • Ott R (2005) Sound truth & corporate myth: the legacy of the Exxon Valdez oil spill. Dragonfly Sisters Press, Cordova

  • Pearce J, Ferrier S (2000) Evaluating the predictive performance of habitat models developed using logistic regression. Ecol Model 133:225–245

  • Phillips SJ, Dudík M, Schapire RE (2004) A maximum entropy approach to species distribution modeling. In: Proceedings of the 21st international conference on machine learning. ACM Press, New York, pp 655–662

  • Phillips SJ, Anderson RP, Schapire RE (2006) Maximum entropy modeling of species geographic distributions. Ecol Model 190:231–259

  • Phillips SJ, Elith J (2013) On estimating probability of presence from use–availability or presence–background data. Ecology 94:1409–1419

  • Pittmann S, Huettmann F (2006) Chapter 4: Seabird distribution and diversity. An ecological characterization of the Stellwagen Bank national marine sanctuary region: oceanographic, biogeographic, and contaminants assessment. In: Battista T, Clark R, Pittmann S (eds) Prepared by NCCOS’s Biogeography Team in cooperation with the National Marine Sanctuary Program. Silver Spring, MD. NOAA Technical Memorandum NCCOS 45

  • Popp J, Neubauer D, Huettmann F (2007) Using TreeNet for identifying management thresholds of mantled howling monkeys’ habitat preferences on Ometepe Island, Nicaragua, on a tree and home range scale. J Med Biol Sci 1(2):1–14 www.scientificjournals.org/journals2007/articles/1096.pdf

  • Prasad A, Iverson L, Matthews S, Peters M (2009) Atlases of tree and bird species habitats for current and future climates. Ecol Restor 27:260–263

  • Quinn G, Keough M (2004) Experimental design and data analysis for biologists. Cambridge University Press, Cambridge

  • R Core Team (2016) R: a language and environment for statistical computing. R Foundation for Statistical Computing. www.r-project.org

  • Recknagel F (2001) Applications of machine learning to ecological modelling. Ecol Model 146:303–310

  • Reich Y, Barai SV (1999) Evaluating machine learning models for engineering problems. Artif Intell Eng 13:257–272

  • Ribeiro PJ Jr, Diggle PJ (2013) Package ‘geoR’. www.leg.ufpr.br/geoR

  • Ritter J (2007) Species distribution models for Denali national park and preserve, Alaska. Unpublished M.Sc. thesis, University of Alaska-Fairbanks (UAF), Alaska

  • Rosten E, Drummond T (2006) Machine learning for high-speed corner detection. In: European conference on computer vision. Springer, pp 430–443

  • Royle JA, Chandler RB, Yackulic C, Nichols JD (2012) Likelihood analysis of species occurrence probability from presence-only data for modelling species distributions. Methods Ecol Evol 3:545–554

  • Schaub M, Kery M (2012) Combining information in hierarchical models improves inferences in population ecology and demographic population analyses. Anim Conserv 15:125–126. https://doi.org/10.1111/j.1469-1795.2012.00531.x

  • Schmitt S, Pouteau R, Justeau D, Boissieu F, Birnbaum P (2017) ssdm: An r package to predict distribution of species richness and composition based on stacked species distribution models. Methods Ecol Evol 8:1795–1803. https://doi.org/10.1111/2041-210X.12841

  • Shipp MA, Ross KN, Tamayo P, Weng AP, Kutok JL, Aguiar RC, Gaasenbeek M, Angelo M, Reich M, Pinkus GS et al (2002) Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat Med 8:68–74

  • de Smith MJ, Goodchild MF, Longley PA (2007) Geospatial analysis: a comprehensive guide to principles, techniques, and software tools. Troubadour Publishing, Ltd., Leicester

  • Stephens PA, Buskirk SW, Hayward GD, Martinez del Rio C (2007) A call for statistical pluralism answered. J Appl Ecol 44:461–463. https://doi.org/10.1111/j.1365-2664.2007.01302.x

  • Strobl C, Boulesteix A-L, Zeileis A, Hothorn T (2007) Bias in random forests variable importance measures: illustrations, sources and a solution. Research Report Series/Department of Statistics and Mathematics, 40. Department of Statistics and Mathematics, WU Vienna University of Economics and Business, Vienna

  • Strobl C, Boulesteix A-L, Kneib T, Augustin T, Zeileis A (2008) Conditional variable importance for random forests. BMC Bioinform 9:307. https://doi.org/10.1186/1471-2105-9-307

  • Strobl C, Malley J, Tutz G (2009) An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychol Methods 14:323–348. https://doi.org/10.1037/a0016973

  • Strogatz SH (2001) Exploring complex networks. Nature 410:268–276

  • Thuiller W, Lafourcade B, Engler R, Araújo MB (2009) BIOMOD: a platform for ensemble forecasting of species distributions. Ecography 32:369–373. https://doi.org/10.1111/j.1600-0587.2008.05742.x

  • Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer, New York

  • Wei C et al (15 co-authors) (2011) A global analysis of marine benthos biomass using random forests. PLoS ONE 5:e15323

  • Weinstein BG (2018) A computer vision for animal ecology. J Anim Ecol 87:533–545. https://doi.org/10.1111/1365-2656.12780

  • Wickert C, Wallschlaeger D, Huettmann F (2010) Spatially predictive habitat modeling of a white stork (Ciconia ciconia) population in former East Prussia in 1939. Open Ornithol 3:1–12

  • Wilson EO (1998) Consilience: the unity of knowledge. Alfred A Knopf, Inc., New York

  • Wisz MS, Hijmans RJ, Peterson AT, Graham CH, Guisan A, NCEAS Predicting Species Distributions Working Group (2008) Effects of sample size on the performance of species distribution models. Divers Distrib 14:763–773

  • Whittingham MJ, Stephens PA, Bradbury RB, Freckleton RB (2006) Why do we still use stepwise modelling in ecology and behaviour? J Anim Ecol 75:1182–1189

  • Yackulic CB, Chandler R, Zipkin EF, Royle JA, Nichols JD, Campbell Grant EH, Veran S (2012) Presence-only modeling using MAXENT: when can we trust the inferences? Methods Ecol Evol 4:236–243

  • Yen P, Huettmann F, Cooke F (2004) Modeling abundance and distribution of Marbled Murrelets (Brachyramphus marmoratus) using GIS, marine data and advanced multivariate statistics. Ecol Model 171:395–413

  • Young B (2012) Diversity in the boreal forest of Alaska: distribution and impacts on ecosystem services. Unpublished PhD thesis. University of Alaska-Fairbanks (UAF), Fairbanks

  • Zar JH (2009) Biostatistical analysis, 5th edn. Prentice Hall, Upper Saddle River

  • Zuckerberg B, Huettmann F, Frair J (2011) Data management as a scientific foundation for reliable predictive modeling. In: Drew A, Wiersma Y, Huettmann F (eds) Predictive modeling in landscape ecology. Springer, New York

Acknowledgements

This is a shared manuscript summarizing more than two decades of work on international projects. FH is grateful to all individuals who were open-minded enough to develop, try and support machine learning algorithms. The late R. O’Connor and A.W. Diamond are thanked for introducing us to CARTs early on. J. Liu kindly helped to start a co-authored modeling session on these subjects at IALE-U.S. in 2007, which was published with Springer. Salford Systems Ltd., D. Steinberg and his great team are specifically thanked for the long collaboration, for their ideas, and for their helpful support with their software in many ways. Most EWHALE students heroically supported machine learning projects, either by helping to evaluate the paradigms of statistics or by putting themselves out there for the debate and advancement of conservation science and management with machine learning, finding new knowledge and information along the way. FH is further grateful to S. Linke, L. Strecker, and to the ArcOD project (B. Bluhm), Alaska GAP project (T. Gotthard), SNAP (N. Fresco et al.), Antarctic Biogeography Atlas project (B. Danis, C. Broyer, Philiippi et al.), Red Panda project (G. Regmi, K. Kamal, MS et al.), the Chinese Crane and Bustard projects (G. Yumin and students like H. Juang, M. Chunrong, P. Guopanlian), J. Morton, S. Cushman, J. Evans, T. Hegel, J. Ritter, D. Watts, A. Drew, Y. Wiersma, W. Thogmartin, T. Gottschalk, B. Raymond, B. Walther, I. Presse and H. Berrios for general support, publications, replies, and advice regarding machine learning implementations and applications. This is EWHALE publication #125.

Author information

Corresponding author

Correspondence to Falk Huettmann.

Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Huettmann, F. et al. (2018). Use of Machine Learning (ML) for Predicting and Analyzing Ecological and ‘Presence Only’ Data: An Overview of Applications and a Good Outlook. In: Humphries, G., Magness, D., Huettmann, F. (eds) Machine Learning for Ecology and Sustainable Natural Resource Management. Springer, Cham. https://doi.org/10.1007/978-3-319-96978-7_2
