Abstract
Although inference is a critical component of ecological modeling, the ultimate goal of ecological studies is a balance between accurate prediction and inference (Peters 1991; De’ath 2007). Practical applications of ecology in conservation planning, ecosystem assessment, and biodiversity depend heavily on accurate spatial predictions of ecological processes and patterns (Millar et al. 2007). However, the complex nature of ecological systems hinders our ability to generate accurate models using the traditional frequentist data model (Breiman 2001a; Austin 2007). Well-defined issues in ecological modeling, such as complex non-linear interactions, spatial autocorrelation, high dimensionality, non-stationarity, historic signal, anisotropy, and scale, contribute to problems that the frequentist data model has difficulty addressing (Olden et al. 2008). When one critically evaluates the data used in ecological models, they rarely meet the assumptions of independence, homoscedasticity, and multivariate normality (Breiman 2001a). This has prompted continual reevaluation of modeling approaches and of the effects of recurring issues such as spatial autocorrelation.
References
Allouche O, Steinitz O, Rotem D, Rosenfeld A, Kadmon R (2008) Incorporating distance constraints into species distribution models. J Appl Ecol 45:599–609.
Austin M (2007) Species distribution models and ecological theory: a critical assessment and some possible new approaches. Ecol Modell 200:1–19.
Breiman L, Friedman JH, Olshen RA, Stone CJ (1983) Classification and regression trees. Wadsworth, London.
Breiman L (1996) Bagging predictors. Mach Learn 24:123–140.
Breiman L (2001a) Statistical modeling: the two cultures. Stat Sci 16:199–231.
Breiman L (2001b) Random forests. Mach Learn 45:5–32.
Bunn AG, Graumlich LJ, Urban DL (2005) Trends in twentieth-century tree growth at high elevations in the Sierra Nevada and White Mountains, USA. Holocene 15:481–488.
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority oversampling technique. J Artif Intell Res 16:321–357.
Chefaoui RM, Lobo JM (2007) Assessing the conservation status of an Iberian moth using pseudo-absences. J Wildl Manage 71:2507–2516.
Chen C, Liaw A, Breiman L (2004) Using random forest to learn imbalanced data. Technical Report 666. Statistics Department, University of California, Berkeley.
Chesson PL (1981) Models for spatially distributed populations: the effect of within-patch variability. Theor Popul Biol 19:288–325.
Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20:37–46.
Cohen J (1968) Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull 70:213–220.
Cook TD, Campbell DT (1979) Quasi-experimentation: design and analysis issues for field settings. Houghton Mifflin, Boston.
Costa GC, Wolfe C, Shepard DB, Caldwell JP, Vitt LJ (2008) Detecting the influence of climate variables on species distribution: a test using GIS niche-based models along a steep longitudinal environmental gradient. J Biogeogr 35:637–646.
Cox TF, Cox MAA (1994) Multidimensional scaling. Chapman and Hall, Boca Raton.
Cressie N (1996) Change of support and the modifiable areal unit problem. Geogr Syst 3:159–180.
Cressie N, Calder CA, Clarke JS, Ver Hoef JM, Wikle CK (2009) Accounting for uncertainty in ecological analysis: the strengths and limitations of hierarchical statistical modeling. Ecol Appl 19:553–570.
Crookston NL, Finley AO (2008) yaImpute: an R package for kNN imputation. J Stat Softw 23:1–16.
Curtis JT, McIntosh RP (1951) An upland forest continuum in the prairie-forest border region of Wisconsin. Ecology 32:476–496.
Cushman SA, McKelvey K, Flather C, McGarigal K (2008) Do forest community types provide a sufficient basis to evaluate biological diversity? Front Ecol Environ 6:13–17.
Cutler DR, Edwards TC Jr, Beard KH, Cutler A, Hess KT, Gibson J, Lawler J (2007) Random forests for classification in ecology. Ecology 88:2783–2792.
De’ath G (2007) Boosted trees for ecological modeling and prediction. Ecology 88:243–251.
De’ath G, Fabricius KE (2000) Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology 81:3178–3192.
Díaz-Uriarte R, Alvarez de Andrés SA (2006) Gene selection and classification of microarray data using random forest. BMC Bioinformatics 7:3.
Dungan JL, Perry JN, Dale MRT, Legendre P, Citron-Pousty S, Fortin MJ, Jakomulska A, Miriti M, Rosenberg MS (2002) A balanced view of scale in spatial statistical analysis. Ecography 25:626–640.
Evans JS, Cushman SA (2009) Gradient modeling of conifer species using random forests. Landsc Ecol 24:673–683.
Falkowski MJ, Evans JS, Martinuzzi S, Gessler PE, Hudak AT (2009) Characterizing forest succession with lidar data: an evaluation for the inland Northwest, USA. Remote Sens Environ 113:946–956.
Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27:861–874.
Finegan B (1984) Forest succession. Nature 312:109–114.
Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In: Saitta L (ed) Machine learning: proceedings of the thirteenth international conference. Morgan Kaufmann, San Francisco.
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232.
Fu P, Rich PM (1999) Design and implementation of the Solar Analyst: an ArcView extension for modeling solar radiation at landscape scales. In: Proceedings of the 19th annual ESRI User Conference, San Diego.
Gleason HA (1926) The individualistic concept of the plant association. Bull Torrey Bot Club 53:7–26.
Glenn RH, Collins SL (1992) Effects of scale and disturbance on rates of immigration and extinction of species in prairies. Oikos 63:273–280.
Guisan A, Zimmermann NE (2000) Predictive habitat distribution models in ecology. Ecol Modell 135:147–186.
Hall P, Wolff RCL, Yao Q (1999) Methods for estimating a conditional distribution function. J Am Stat Assoc 94:154–163.
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction. 2nd edition. Springer, New York.
Hutchinson GE (1957) Concluding remarks. Cold Spring Harb Symp Quant Biol 22:415–427.
Iverson LR, Prasad AM, Matthews SN, Peters M (2008) Estimating potential habitat for 134 eastern US tree species under six climate scenarios. For Ecol Manage 254:390–406.
Jiménez-Valverde A, Lobo JM (2006) The ghost of unbalanced species distribution data in geographic model predictions. Divers Distrib 12:521–524.
Kubat M, Holte RC, Matwin S (1998) Machine learning for the detection of oil spills in satellite radar images. Mach Learn 30:195–215.
Lawrence RL, Wood SD, Sheley RL (2006) Mapping invasive plants using hyperspectral imagery and Breiman Cutler classifications (randomForest). Remote Sens Environ 100:356–362.
Legendre P, Legendre L (1998) Numerical ecology. Elsevier, Amsterdam.
Lele SR, Dennis B (2009) Bayesian methods for hierarchical models: are ecologists making a Faustian bargain? Ecol Appl 19:581–584.
Liaw A, Wiener M (2002) Classification and regression by Random Forest. R News 2:18–22.
Manel S, Williams HC, Ormerod SJ (2001) Evaluating presence-absence models in ecology: the need to account for prevalence. J Appl Ecol 38:921–931.
McGarigal K, Cushman SA (2005) The gradient concept of landscape structure. In: Wiens J, Moss M (eds) Issues and perspectives in landscape ecology. Cambridge University Press, Cambridge.
McGarigal K, Tagil S, Cushman SA (2009) Surface metrics: an alternative to patch metrics for the quantification of landscape structure. Landsc Ecol 24:433–450.
McGuffie K, Henderson-Sellers A (1997) A climate modelling primer. John Wiley & Sons, Chichester.
McKenney DW, Pedlar JH, Lawrence K, Campbell K, Hutchinson MF (2007) Potential impacts of climate change on the distribution of North American trees. BioScience 57:939–948.
Millar CI, Stephenson NL, Stephens SL (2007) Climate change and forests of the future: managing in the face of uncertainty. Ecol Appl 17:2145–2151.
Monserud RA, Leemans R (1992) Comparing global vegetation maps with the Kappa statistic. Ecol Modell 62:275–293.
Moore ID, Gessler P, Nielsen GA, Peterson GA (1993) Terrain attributes: estimation and scale effects. In: Jakeman AJ, Beck MB, McAleer M (eds) Modelling change in environmental systems. John Wiley & Sons, Chichester.
Morrison D (2002) Multivariate statistical methods. 4th edition. McGraw-Hill series in probability & statistics. McGraw-Hill, New York.
Mouer MH, Riemann R (1999) Preserving spatial and attribute correlation in the interpolation of forest inventory data. In: Lowell K, Jaton A (eds) Spatial accuracy assessment: land information uncertainty in natural resources. Ann Arbor Press, Chelsea.
Murphy MA, Evans JS, Storfer AS (2010) Quantifying Bufo boreas connectivity in Yellowstone National Park with landscape genetics. Ecology 91:252–261.
Olden JD, Lawler JJ, Poff NL (2008) Machine learning methods without tears: a primer for ecologists. Q Rev Biol 83:171–193.
Park YS, Chon TS (2007) Biologically inspired machine learning implemented to ecological informatics. Ecol Modell 203:1–7.
Peters RH (1991) A critique for ecology. Cambridge University Press, Cambridge.
Peterson AT, Papes M, Soberón J (2008) Rethinking receiver operating characteristic analysis applications in ecological modelling. Ecol Modell 213:63–72.
Prasad AM, Iverson LR, Liaw A (2006) Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9:181–199.
R Development Core Team (2009) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.
Randin CF, Engler R, Normand S, Zappa M, Zimmermann N, Pearman PB, Vittoz P, Thuiller W, Guisan A (2009) Climate change and plant distribution: local models predict high-elevation persistence. Glob Chang Biol 15:1557–1569.
Rehfeldt GE, Crookston NL, Warwell MV, Evans JS (2006) Empirical analyses of plant-climate relationships for the western United States. Int J Plant Sci 167:1123–1150.
Risser PG (1987) Landscape ecology: state of the art. In: Turner MG (ed) Landscape heterogeneity and disturbance. Springer-Verlag, New York.
Robinson WS (1950) Ecological correlations and the behavior of individuals. Am Sociol Rev 15:351–357.
Rogan J, Franklin J, Stow D, Miller J, Woodcock C, Roberts D (2008) Mapping land-cover modification over large areas: a comparison of machine learning algorithms. Remote Sens Environ 112:2272–2283.
Runkle JR (1985) Disturbance regimes in temperate forests. In: Pickett STA, White PS (eds) The ecology of natural disturbance and patch dynamics. Academic Press, New York.
Simonoff JS (1998) Smoothing methods in statistics. Springer-Verlag, New York.
Stage A (1976) An expression for the effect of aspect, slope and habitat type on tree growth. For Sci 22:457–460.
Sutton CD (2005) Classification and regression trees, bagging, and boosting. In: Rao CR, Wegman EJ, Solka JL (eds) Handbook of statistics: data mining and data visualization, Volume 24. Elsevier, Amsterdam.
ter Braak CJF, Prentice IC (2004) A theory of gradient analysis. Adv Ecol Res 34:235–282.
Tilman D (1982) Resource competition and community structure. Princeton University Press, Princeton.
Whittaker RH (1967) Gradient analysis of vegetation. Biol Rev 42:207–264.
Whittaker RH, Niering WA (1975) Vegetation of the Santa Catalina mountains, Arizona. V. biomass, production and diversity along the elevation gradient. Ecology 56:771–790.
Wiens JA (1989) Spatial scaling in ecology. Funct Ecol 3:385–397.
Willis KJ, Bhagwat SA (2009) Biodiversity and climate change. Science 326:806–807.
Acknowledgments
Funding for this research was provided by the USDA Forest Service, Rocky Mountain Research Station and The Nature Conservancy. The authors would like to thank G. Rehfeldt, A. Hudak, N. Crookston, L. Iverson, and A. Cutler for valuable discussion on Random Forest and species distribution modeling, and A. Prasad, J. Kiesecker, and two anonymous reviewers for comments that strengthened this chapter. Additionally, we would like to thank the editors for their patience and perseverance in seeing this book published.
Copyright information
© 2011 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Evans, J.S., Murphy, M.A., Holden, Z.A., Cushman, S.A. (2011). Modeling Species Distribution and Change Using Random Forest. In: Drew, C., Wiersma, Y., Huettmann, F. (eds) Predictive Species and Habitat Modeling in Landscape Ecology. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7390-0_8
DOI: https://doi.org/10.1007/978-1-4419-7390-0_8
Published:
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-7389-4
Online ISBN: 978-1-4419-7390-0
eBook Packages: Biomedical and Life Sciences (R0)