Abstract
This chapter considers a range of numerical techniques that lie outside both the familiar statistical methods (linear regression, analysis of variance, and generalised linear models) and the familiar data-analytical techniques (ordination, clustering, and partitioning). The techniques outlined have developed as a result of the spectacular increase in computing power since the 1980s. They make fewer distributional assumptions than classical statistical methods and can be applied to more complicated estimators and to very large data-sets. They are part of the ever-increasing array of ‘statistical learning’ techniques (sensu Hastie et al. 2011) that try to make sense of the data at hand, to detect major patterns and trends, to understand ‘what the data say’, and thus to learn from the data.
A range of tree-based and network-based techniques is presented: classification and regression trees, multivariate regression trees, bagged trees, random forests, boosted trees, multivariate adaptive regression splines, artificial neural networks, self-organising maps, Bayesian networks, and genetic algorithms. Principal curves and surfaces are also discussed because of their close relationship to unsupervised self-organising maps. The chapter concludes with a discussion of current developments in shrinkage methods and variable selection in statistical modelling, which can help in model selection and can reduce the problems caused by collinearity among predictors. These include principal components regression, ridge regression, the lasso, and the elastic net.
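The claim that shrinkage helps with collinearity can be made concrete with a minimal sketch. The Python (NumPy) code below is illustrative only and is not taken from the chapter: the function name `ridge`, the simulated data, and the penalty value lam = 1 are all assumptions. It computes the ridge estimate b = (X'X + λI)^(-1) X'y on centred data, where λ = 0 gives ordinary least squares.

```python
import numpy as np


def ridge(X, y, lam):
    """Ridge regression on centred data (no intercept).

    Solves (X'X + lam * I) b = X'y; lam = 0 gives ordinary least squares.
    Hypothetical helper for illustration only.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


# Hypothetical simulated data: x2 is almost a copy of x1 (severe collinearity).
rng = np.random.default_rng(42)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + x3 + rng.normal(scale=0.5, size=n)

# Centre predictors and response so that no intercept term is needed.
X = X - X.mean(axis=0)
y = y - y.mean()

# OLS: how the shared effect is divided between x1 and x2 is poorly determined.
b_ols = ridge(X, y, lam=0.0)
# Ridge: the penalty shrinks the coefficients and pulls x1 and x2 towards an even split.
b_ridge = ridge(X, y, lam=1.0)

print("OLS:  ", np.round(b_ols, 3))
print("Ridge:", np.round(b_ridge, 3))
```

With two nearly identical predictors, the least-squares coefficients for x1 and x2 are individually unstable, whereas the ridge penalty shrinks the whole coefficient vector. The lasso and elastic net replace or supplement this squared (L2) penalty with an absolute-value (L1) penalty, which has no closed-form solution and is commonly fitted by coordinate descent (Friedman et al. 2010).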
Notes
1. Pressure gradient between Los Angeles airport (LAX) and Daggett in mmHg.
References
Aalders I (2008) Modeling land-use decision behavior with Bayesian belief networks. Ecol Soc 13:16
Aho K, Weaver T, Regele S (2011) Identification and siting of native vegetation types on disturbed land: demonstration of statistical methods. Appl Veg Sci 14:277–290
Amsinck SL, Strzelczak A, Bjerring R, Landkildehus F, Lauridsen TL, Christoffersen K, Jeppesen E (2006) Lake depth rather than fish planktivory determines cladoceran community structure in Faroese lakes – evidence from contemporary data and sediments. Freshw Biol 51:2124–2142
Anderson DR (2008) Model based inference in the life sciences: a primer on evidence. Springer, New York
Anderson RP, Lew D, Peterson AT (2003) Evaluating predictive models of species’ distributions: criteria for selecting optimal models. Ecol Model 162:211–232
Baker FA (1993) Classification and regression tree analysis for assessing hazard of pine mortality caused by Heterobasidion annosum. Plant Dis 77:136–139
Balshi MS, McGuire AD, Duffy P, Flannigan M, Walsh J, Melillo J (2009) Assessing the response of area burned to changing climate in western boreal North America using a Multivariate Adaptive Regression Splines (MARS) approach. Global Change Biol 15:578–600
Banfield JD, Raftery AE (1992) Ice floe identification in satellite images using mathematical morphology and clustering about principal curves. J Am Stat Assoc 87:7–16
Barrows TT, Juggins S (2005) Sea-surface temperatures around the Australian margin and Indian Ocean during the Last Glacial Maximum. Quat Sci Rev 24:1017–1047
Barton AM, Nurse AM, Michaud K, Hardy SW (2011) Use of CART analysis to differentiate pollen of red pine (Pinus resinosa) and jack pine (P. banksiana) in New England. Quat Res 75:18–23
Belgrano A, Malmgren BA, Lindahl O (2001) Application of artificial neural networks (ANN) to primary production time-series data. J Plankton Res 23:651–658
Benito Garzón M, Blazek R, Neteler M, Sánchez de Dios R, Sainz Ollero H, Furlanello C (2006) Predicting habitat suitability with machine learning models: the potential area of Pinus sylvestris L. in the Iberian Peninsula. Ecol Model 197:383–393
Benito Garzón M, Sánchez de Dios R, Sainz Ollero H (2007) Predictive modelling of tree species distributions on the Iberian Peninsula during the Last Glacial Maximum and Mid-Holocene. Ecography 30:120–134
Benito Garzón M, Sánchez de Dios R, Sainz Ollero H (2008) Effects of climate change on the distribution of Iberian tree species. Appl Veg Sci 11:169–178
Birks HH, Mathewes RW (1978) Studies in the vegetational history of Scotland. V. Late Devensian and early Flandrian pollen and macrofossil stratigraphy at Abernethy Forest, Inverness-shire. New Phytol 80:455–484
Birks HJB (1995) Quantitative palaeoenvironmental reconstructions. In: Maddy D, Brew J (eds) Statistical modelling of Quaternary science data, Technical guide 5. Quaternary Research Association, Cambridge, pp 161–254
Birks HJB (2012a) Chapter 2 Overview of numerical methods in palaeolimnology. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht
Birks HJB (2012b) Chapter 11 Stratigraphical data analysis. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht
Birks HJB, Gordon AD (1985) Numerical methods in Quaternary pollen analysis. Academic, London
Birks HJB, Jones VJ (2012) Chapter 3 Data-sets. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht
Birks HJB, Line JM, Juggins S, Stevenson AC, ter Braak CJF (1990) Diatoms and pH reconstruction. Philos Trans R Soc B 327:263–278
Bishop CM (1995) Neural networks for pattern recognition. Clarendon, Oxford
Bishop CM (2007) Pattern recognition and machine learning. Springer, Dordrecht
Bjerring R, Becares E, Declerck S et al (2009) Subfossil Cladocera in relation to contemporary environmental variables in 54 pan-European lakes. Freshw Biol 54:2401–2417
Blaauw M, Heegaard E (2012) Chapter 12 Estimation of age-depth relationships. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht
Borggaard C, Thodberg HH (1992) Optimal minimal neural interpretation of spectra. Anal Chem 64:545–551
Bourg NA, McShea WJ, Gill DE (2005) Putting a CART before the search: successful habitat prediction for a rare forest herb. Ecology 86:2793–2804
Breiman L (1996) Bagging predictors. Mach Learn 24:123–140
Breiman L (2001) Random forests. Mach Learn 45:5–32
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, Belmont
Brosse S, Guégan J-F, Tourenq J-N, Lek S (1999) The use of artificial neural networks to assess fish abundance and spatial occupancy in the littoral zone of a mesotrophic lake. Ecol Model 120:299–311
Brunelle A, Rehfeldt GE, Bentz B, Munson AS (2008) Holocene records of Dendroctonus bark beetles in high elevation pine forests of Idaho and Montana, USA. Forest Ecol Manage 255:836–846
Burman P, Chow E, Nolan D (1994) A cross-validatory method for dependent data. Biometrika 81:351–358
Burnham KP, Anderson DR (2002) Model selection and multimodel inference: a practical information-theoretic approach, 2nd edn. Springer, New York
Cairns DM (2001) A comparison of methods for predicting vegetation type. Plant Ecol 156:3–18
Caley P, Kuhnert PM (2006) Application and evaluation of classification trees for screening unwanted plants. Austral Ecol 31:647–655
Carlisle DM, Wolock DM, Meador MR (2011) Alteration of streamflow magnitudes and potential ecological consequences: a multiregional assessment. Front Ecol Environ 9:264–270
Castelletti A, Soncini-Sessa R (2007a) Bayesian Networks and participatory modelling in water resource management. Environ Model Softw 22:1075–1088
Castelletti A, Soncini-Sessa R (2007b) Coupling real-time control and socio-economic issues in participatory river basin planning. Environ Model Softw 22:1114–1128
Céréghino R, Giraudel JL, Compin A (2001) Spatial analysis of stream invertebrates distribution in the Adour-Garonne drainage basin (France), using Kohonen self-organizing maps. Ecol Model 146:167–180
Černá L, Chytrý M (2005) Supervised classification of plant communities with artificial neural networks. J Veg Sci 16:407–414
Chapman DS (2010) Weak climatic associations among British plant distributions. Global Ecol Biogeogr 19:831–841
Chapman DS, Purse BV (2011) Community versus single-species distribution models for British plants. J Biogeogr 38:1524–1535
Chapman DS, Bonn A, Kunin WE, Cornell SJ (2010) Random Forest characterization of upland vegetation and management burning from aerial imagery. J Biogeogr 37:37–46
Chatfield C (1993) Neural networks: forecasting breakthrough or passing fad? Int J Forecast 9:1–3
Chon T-S (2011) Self-organising maps applied to ecological sciences. Ecol Inform 6:50–61
Chytrý M, Jarošík V, Pyšek P, Hájek O, Knollová I, Tichý L, Danihelka J (2008) Separating habitat invasibility by alien plants from the actual level of invasion. Ecology 89:1541–1553
Copas JB (1983) Regression, prediction and shrinkage. J R Stat Soc Ser B 45:311–354
Cutler A, Stevens JR (2006) Random forests for microarrays. Methods Enzymol 411:422–432
Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ (2007) Random forests for classification in ecology. Ecology 88:2783–2792
Dahlgren JP (2010) Alternative regression methods are not considered in Murtaugh (2009) or by ecologists in general. Ecol Lett 13:E7–E9
Davidson TA, Sayer CD, Perrow M, Bramm M, Jeppesen E (2010a) The simultaneous inference of zooplanktivorous fish and macrophyte density from sub-fossil cladoceran assemblages: a multivariate regression tree approach. Freshw Biol 55:546–564
Davidson TA, Sayer CD, Langdon PG, Burgess A, Jackson MJ (2010b) Inferring past zooplanktivorous fish and macrophyte density in a shallow lake: application of a new regression tree model. Freshw Biol 55:584–599
De’ath G (1999) Principal curves: a new technique for indirect and direct gradient analysis. Ecology 80:2237–2253
De’ath G (2002) Multivariate regression trees: a new technique for modeling species-environment relationships. Ecology 83:1108–1117
De’ath G (2007) Boosted trees for ecological modeling and prediction. Ecology 88:243–251
De’ath G, Fabricius KE (2000) Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology 81:3178–3192
De’ath G, Fabricius KE (2010) Water quality as a regional driver of coral biodiversity and macroalgae on the Great Barrier Reef. Ecol Appl 20:840–850
DeFries RS, Rudel T, Uriarte M, Hansen M (2010) Deforestation driven by urban population growth and agricultural trade in the twenty-first century. Nat Geosci 3:178–181
Despagne F, Massart D-L (1998) Variable selection for neural networks in multivariate calibration. Chemometrics Intell Lab Syst 40:145–163
D’heygere T, Goethals PLM, de Pauw N (2003) Use of genetic algorithms to select input variables in decision tree models for the prediction of benthic macroinvertebrates. Ecol Model 160:291–300
Dobrowski SZ, Thorne JH, Greenberg JA, Safford HD, Mynsberge AR, Crimins SM, Swanson AK (2011) Modeling plant ranges over 75 years of climate change in California, USA: temporal transferability and species traits. Ecol Monogr 81:241–257
Dutilleul P, Cumming BF, Lontoc-Roy M (2012) Chapter 16 Autocorrelogram and periodogram analyses of palaeolimnological temporal series from lakes in central and western North America to assess shifts in drought conditions. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht
Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Stat 7:1–26
Efron B, Tibshirani R (1991) Statistical data analysis in the computer age. Science 253:390–395
Efron B, Tibshirani R (1993) An introduction to the bootstrap. Chapman & Hall, London
Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32:407–499
Elith J, Burgman M (2002) Predictions and their validation: rare plants in the Central Highlands, Victoria, Australia. In: Scott JM, Heglund P, Morrison ML, Raven PH (eds) Predicting species occurrences: issues of accuracy and scale. Island Press, Washington, DC
Elith J, Leathwick JR (2007) Predicting species distributions from museum and herbarium records using multiresponse models fitted with multivariate adaptive regression splines. Divers Distrib 13:265–275
Elith J, Graham CH, Anderson RP, Dudík M, Ferrier S, Guisan A et al (2006) Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29:129–151
Elith J, Leathwick JR, Hastie T (2008) A working guide to boosted regression trees. J Anim Ecol 77:802–813
Fielding AH (2007) Cluster and classification techniques for the biosciences. Cambridge University Press, Cambridge
Franklin J (1998) Predicting the distribution of shrub species in southern California from climate and terrain-derived variables. J Veg Sci 9:733–748
Franklin J (2010) Mapping species distributions: spatial inference and prediction. Cambridge University Press, Cambridge
Freund Y (1995) Boosting a weak learning algorithm by majority. Inf Comput 121:256–285
Friedman JH (1991) Multivariate adaptive regression splines. Ann Stat 19:1–67
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232
Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38:367–378
Friedman JH, Meulman JJ (2003) Multiple additive regression trees with application in epidemiology. Stat Med 22:1365–1381
Friedman JH, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting. Ann Stat 28:337–407
Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J Stat Software 33:1–22
Furlanello C, Neteler M, Merler S, Menegon S, Fontanari S, Donini A, Rizzoli A, Chemini C (2003) GIS and the random forests predictor: integration in R for tick-borne disease risk. In: Hornik K, Leisch F, Zeileis A (eds) Proceedings of the third international workshop on distributed statistical computing, pp 1–11
Gevrey M, Dimopoulos I, Lek S (2003) Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecol Model 160:249–264
Giraudel JL, Lek S (2001) A comparison of self-organising map algorithm and some conventional statistical methods for ecological community ordination. Ecol Model 146:329–339
Gordon AD (1973) Classifications in the presence of constraints. Biometrics 29:821–827
Gordon AD, Birks HJB (1972) Numerical methods in Quaternary palaeoecology. I. Zonation of pollen diagrams. New Phytol 71:961–979
Gordon AD, Birks HJB (1974) Numerical methods in Quaternary palaeoecology. II. Comparison of pollen diagrams. New Phytol 73:221–249
Goring S, Lacourse T, Pellatt MG, Walker IR, Mathewes RW (2010) Are pollen-based climate models improved by combining surface samples from soil and lacustrine substrates? Rev Palaeobot Palynol 162:203–212
Grieger B (2002) Interpolating paleovegetation data with an artificial neural network approach. Global Planet Change 34:199–208
Guégan J-F, Lek S, Oberdorff T (1998) Energy availability and habitat heterogeneity predict global riverine fish diversity. Nature 391:382–384
Hastie T, Stuetzle W (1989) Principal curves. J Am Stat Assoc 84:502–516
Hastie T, Tibshirani R, Friedman J (2011) The elements of statistical learning, 2nd edn. Springer, New York
Haykin S (1999) Neural networks, 2nd edn. Prentice-Hall, Upper Saddle River
Hejda M, Pyšek P, Jarošík V (2009) Impact of invasive plants on the species richness, diversity and composition of invaded communities. J Ecol 97:393–403
Herzschuh U, Birks HJB (2010) Evaluating the indicator value of Tibetan pollen taxa for modern vegetation and climate. Rev Palaeobot Palynol 160:197–208
Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12:55–67
Holmqvist BH (2005) Classification of large pollen datasets using neural networks with application to mapping and modelling pollen data. LUNDQUA report 39, Lund University
Horsak M, Chytrý M, Pokryszko BM, Danihelka J, Ermakov N, Hajek M, Hajkova P, Kintrova K, Koci M, Kubesova S, Lustyk P, Otypkova Z, Pelánková B, Valachovic M (2010) Habitats of relict terrestrial snails in southern Siberia: lessons for the reconstruction of palaeoenvironments of full-glacial Europe. J Biogeogr 37:1450–1462
Iverson LR, Prasad AM (1998) Predicting abundance of 80 tree species following climate change in the eastern United States. Ecol Monogr 68:465–485
Iverson LR, Prasad AM (2001) Potential changes in tree species richness and forest community types following climate change. Ecosystems 4:186–199
Iverson LR, Prasad AM, Schwartz MW (1999) Modeling potential future individual tree-species distributions in the eastern United States under a climate change scenario: a case study with Pinus virginiana. Ecol Model 115:77–93
Iverson LR, Prasad AM, Matthews SN, Peters M (2008) Estimating potential habitat for 134 eastern US tree species under six climate scenarios. Forest Ecol Manage 254:390–406
Jacob G, Marriott FHC, Robbins PA (1997) Fitting curves to human respiratory data. Appl Stat 46:235–243
Jensen FV, Nielsen TD (2007) Bayesian networks and decision graphs, 2nd edn. Springer, New York
Jeschke JM, Strayer DL (2008) Usefulness of bioclimatic models for studying climate change and invasive species. Ann NY Acad Sci 1134:1–24
Jolliffe IT (2002) Principal component analysis, 2nd edn. Springer, New York
Juggins S, Birks HJB (2012) Chapter 14 Quantitative environmental reconstructions from biological data. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht
Juggins S, Telford RJ (2012) Chapter 5 Exploratory data analysis and data display. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht
Kallimanis AS, Ragia V, Sgardelis SP, Pantis JD (2007) Using regression trees to predict alpha diversity based upon geographical and habitat characteristics. Biodivers Conserv 16:3863–3876
Keith RP, Veblen TT, Schoennagel TL, Sherriff RL (2010) Understory vegetation indicates historic fire regimes in ponderosa pine-dominated ecosystems in the Colorado Front Range. J Veg Sci 21:488–499
Kohonen T (2001) Self-organising maps, 3rd edn. Springer, Berlin
Korb KB, Nicholson AE (2004) Bayesian artificial intelligence. Chapman & Hall, Boca Raton
Kragt ME, Newham LTH, Jakeman AJ (2009) A Bayesian network approach to integrating economic and biophysical modelling. In: Anderssen RS, Braddock RD, Newham LTH (eds) 18th World IMACS Congress and MODSIM09 International Congress on Modelling and Simulation. Modelling and Simulation Society of Australia and New Zealand and International Association for Mathematics and Computers in Simulation. pp 2377–2383
Kucera M, Weinelt M, Kiefer T, Pflaumann U, Hayes A, Chen MT, Mix AC, Barrows TT, Cortijo E, Duprat J, Juggins S, Waelbroeck C (2005) Reconstruction of sea-surface temperatures from assemblages of planktonic foraminifera: multi-technique approach based on geographically constrained calibration data sets and its application to glacial Atlantic and Pacific Oceans. Quat Sci Rev 24:951–998
Larsen DR, Speckman PL (2004) Multivariate regression trees for analysis of abundance data. Biometrics 60:543–549
Lawler JJ, White D, Neilson RP, Blaustein AR (2006) Predicting climate-induced range shifts: model differences and model reliability. Global Change Biol 12:1568–1584
Leathwick JR, Rowe D, Richardson J, Elith J, Hastie T (2005) Using multivariate adaptive regression splines to predict the distributions of New Zealand’s freshwater diadromous fish. Freshw Biol 50:2034–2052
Leathwick JR, Elith J, Hastie T (2006) Comparative performance of generalized additive models and multivariate adaptive regression splines for statistical modelling of species distributions. Ecol Model 199:188–196
Legendre P, Birks HJB (2012a) Chapter 7 Clustering and partitioning. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht
Legendre P, Birks HJB (2012b) Chapter 8 From classical to canonical ordination. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht
Lek S, Guégan J-F (1999) Artificial neural networks as a tool in ecological modelling, an introduction. Ecol Model 120:65–73
Lek S, Guégan J-F (2000) Artificial neuronal networks: application to ecology and evolution. Springer, Berlin
Lek S, Delacoste M, Baran P, Dimopoulos I, Lauga J, Aulagnier S (1996a) Application of neural networks to modelling nonlinear relationships in ecology. Ecol Model 90:39–52
Lek S, Dimopoulos I, Fabre A (1996b) Predicting phosphorus concentration and phosphorus load from watershed characteristics using backpropagation neural networks. Acta Oecol 17:43–53
Lindbladh M, O’Connor R, Jacobson GL Jr (2002) Morphometric analysis of pollen grains for palaeoecological studies: classification of Picea from eastern North America. Am J Bot 89:1459–1467
Lindbladh M, Jacobson GL Jr, Schauffler M (2003) The postglacial history of three Picea species in New England, USA. Quat Res 59:61–69
Lindström J, Kokko H, Ranta E, Lindén H (1998) Predicting population fluctuations with artificial neural networks. Wildl Biol 4:47–53
Lotter AF, Anderson NJ (2012) Chapter 18 Limnological responses to environmental changes at inter-annual to decadal time-scales. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht
Malmgren BA, Nordlund U (1997) Application of artificial neural networks to paleoceanographic data. Palaeogeogr Palaeoclim Palaeoecol 136:359–373
Malmgren BA, Winter A (1999) Climate zonation in Puerto Rico based on principal component analysis and an artificial neural network. J Climate 12:977–985
Malmgren BA, Kucera M, Nyberg J, Waelbroeck C (2001) Comparison of statistical and artificial neural network techniques for estimating past sea surface temperatures from planktonic foraminifer census data. Paleoceanography 16:520–530
Manel S, Dias JM, Buckton ST, Ormerod SJ (1999a) Alternative methods for predicting species distribution: an illustration with Himalayan river birds. J Appl Ecol 36:734–747
Manel S, Dias JM, Ormerod SJ (1999b) Comparing discriminant analysis, neural networks and logistic regression for predicting species distributions: a case study with a Himalayan river bird. Ecol Model 120:337–347
Marcot BG, Holthausen RS, Raphael MG, Rowland MG, Wisdom MJ (2001) Using Bayesian belief networks to evaluate fish and wildlife population viability under land management alternatives from an environmental impact statement. Forest Ecol Manage 153:29–42
Martens H, Næs T (1989) Multivariate calibration. Wiley, Chichester
Maslow AH (1996) The psychology of science: a reconnaissance. Maurice Bassett Publishing
Melssen W, Wehrens R, Buydens L (2006) Supervised Kohonen networks for classification problems. Chemometrics Intell Lab Syst 83:99–113
Melssen W, Bulent U, Buydens L (2007) SOMPLS: a supervised self-organising map-partial least squares algorithm for multivariate regression problems. Chemometrics Intell Lab Syst 86:102–120
Michaelsen J, Schimel DS, Friedl MA, Davis FW, Dubayah RC (1994) Regression tree analysis of satellite and terrain data to guide vegetation sampling and surveys. J Veg Sci 5:673–686
Milborrow S (2011) earth. R package version 3.2-0. http://cran.r-project.org/package=earth
Miller AJ (2002) Subset selection in regression, 2nd edn. Chapman & Hall/CRC, Boca Raton
Miller J, Franklin J (2002) Modeling the distribution of four vegetation alliances using generalized linear models and classification trees with spatial dependence. Ecol Model 157:227–247
Moisen GG, Frescino TS (2002) Comparing five modelling techniques for predicting forest characteristics. Ecol Model 157:209–225
Morgan JN, Sonquist JA (1963) Problems in the analysis of survey data, and a proposal. J Am Stat Assoc 58:415–434
Mundry R, Nunn CL (2009) Stepwise model fitting and statistical inference: turning noise into signal pollution. Am Nat 173:119–123
Murphy B, Jansen C, Murray J, de Barro P (2010) Risk analysis on the Australian release of Aedes aegypti (L.) (Diptera: Culicidae) Containing Wolbachia. CSIRO
Murtaugh PA (2009) Performance of several variable-selection methods applied to real ecological data. Ecol Lett 12:1061–1068
Nakagawa S, Freckleton RP (2008) Missing inaction: the danger of ignoring missing data. Trends Ecol Evol 23:592–596
Newton AC, Marshall E, Schreckenberg K, Golicher D, te Velde DW, Edouard F, Arancibia E (2006) Use of a Bayesian belief network to predict the impacts of commercializing non-timber forest products on livelihoods. Ecol Soc 11:24
Newton AC, Stewart GB, Diaz A, Golicher D, Pullin AS (2007) Bayesian belief networks as a tool for evidence-based conservation management. J Nat Conserv 15:144–160
Nyberg J, Malmgren BA, Kuijpers A, Winter A (2002) A centennial-scale variability of tropical North Atlantic surface hydrology during the late Holocene. Palaeogeogr Palaeoclim Palaeoecol 183:25–41
Næs T, Kvaal K, Isaksson T, Miller C (1993) Artificial neural networks in multivariate calibration. J Near IR Spectrosc 1:1–11
Næs T, Isaksson T, Fearn T, Davies T (2002) A user-friendly guide to multivariate calibration and classification. NIR Publications, Chichester
Olden JD (2000) An artificial neural network approach for studying phytoplankton succession. Hydrobiologia 436:131–143
Olden JD, Jackson DA (2002) Illuminating the ‘black box’: a randomization approach for understanding variable contributions in artificial neural networks. Ecol Model 154:135–150
Olden JD, Joy MK, Death RG (2004) An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data. Ecol Model 178:389–397
Olden JD, Lawler JJ, Poff NL (2008) Machine learning methods without tears: a primer for ecologists. Quart Rev Biol 83:171–193
Özesmi SL, Tan CO, Özesmi U (2006) Methodological issues in building, training, and testing artificial neural networks in ecological applications. Ecol Model 195:83–93
Pakeman RJ, Torvell L (2008) Identifying suitable restoration sites for a scarce subarctic willow (Salix arbuscula) using different information sources and methods. Plant Ecol Divers 1:105–114
Park MY, Hastie T (2007) L1-regularization path algorithm for generalised linear models. J R Stat Soc Ser B 69:659–677
Pearson RG, Thuiller W, Araújo MB, Martinez-Meyer E, Brotons L, McClean C, Miles L, Segurado P, Dawson TP, Lees DC (2006) Model-based uncertainty in species range prediction. J Biogeogr 33:1704–1711
Pelánková B, Kuneš P, Chytrý M, Jankovská V, Ermakov N, Svobodová-Svitavská H (2008) The relationships of modern pollen spectra to vegetation and climate along a steppe-forest-tundra transition in southern Siberia, explored by decision trees. Holocene 18:1259–1271
Peters J, De Baets B, Verhoest NEC, Samson R, Degroeve S, de Becker P, Huybrechts W (2007) Random forests as a tool for predictive ecohydrological modelling. Ecol Model 207:304–318
Peyron O, Guiot J, Cheddadi R, Tarasov P, Reille M, de Beaulieu J-L, Bottema S, Andrieu V (1998) Climatic reconstruction of Europe for 18,000 yr BP from pollen data. Quat Res 49:183–196
Peyron O, Jolly D, Bonnefille R, Vincens A, Guiot J (2000) Climate of East Africa 6000 14C yr BP as inferred from pollen data. Quat Res 54:90–101
Peyron O, Bégeot C, Brewer S, Heiri O, Magny M, Millet L, Ruffaldi P, van Campo E, Yu G (2005) Lateglacial climatic changes in Eastern France (Lake Lautrey) from pollen, lake-levels, and chironomids. Quat Res 64:197–211
Ploner A, Brandenburg C (2003) Modelling visitor attendance levels subject to day of the week and weather: a comparison between linear regression models and regression trees. J Nat Conserv 11:297–308
Pourret O, Naïm P, Marcot B (eds) (2008) Bayesian networks. A practical guide to applications. Wiley, Chichester
Prasad AM, Iverson LR, Liaw A (2006) Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9:181–199
Quinlan J (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Mateo
R Development Core Team (2011) R: a language and environment for statistical computing. R foundation for statistical computing. Vienna, Austria. http://www.r-project.org
Racca JMJ, Philibert A, Racca R, Prairie YT (2001) A comparison between diatom-pH-inference models using artificial neural networks (ANN), weighted averaging (WA) and weighted averaging partial least square (WA-PLS) regressions. J Paleolimnol 26:411–422
Racca JMJ, Wild M, Birks HJB, Prairie YT (2003) Separating wheat from chaff: diatom taxon selection using an artificial neural network pruning algorithm. J Paleolimnol 29:123–133
Racca JMJ, Gregory-Eaves I, Pienitz R, Prairie YT (2004) Tailoring palaeolimnological diatom-based transfer functions. Can J Fish Aquat Sci 61:2440–2454
Ramakrishnan N, Grama A (2001) Mining scientific data. Adv Comput 55:119–169
Raymond B, Watts DJ, Burton H, Bonnice J (2005) Data mining and scientific data. Arct Antarct Alp Res 37:348–357
Recknagel F, French M, Harkonen P, Yabunaka K-I (1997) Artificial neural network approach for modelling and prediction of algal blooms. Ecol Model 96:11–28
Rehfeldt GE, Crookston NL, Warwell MV, Evans JS (2006) Empirical analyses of plant-climate relationships for the western United States. Int J Plant Sci 167:1123–1150
Rejwan C, Collins NC, Brunner LJ, Shuter BJ, Ridgway MS (1999) Tree regression analysis on the nesting habitat of smallmouth bass. Ecology 80:341–348
Ridgeway G (2007) Generalized boosted models: a guide to the gbm package. http://cran.r-project.org/web/packages/gbm/vignettes/gbm.pdf. Accessed 20 July 2011
Ridgeway G (2010) gbm. R package version 1.6-3.1. http://cran.r-project.org/web/packages/gbm/
Rieman B, Peterson JT, Clayton J, Howell P, Thurow R, Thompson W, Lee D (2001) Evaluation of potential effects of federal land management alternatives on trends of salmonids and their habitats in the interior Columbia River basin. Forest Ecol Manage 153:43–62
Ripley BD (2008) Pattern recognition and neural networks. Cambridge University Press, Cambridge
Roberts DR, Hamann A (2011) Predicting potential climate change impacts with bioclimate envelope models: a palaeoecological perspective. Global Ecol Biogeogr. doi:10.1111/j.1466-8238.2011.00657.x
Rose NL (2001) Fly-ash particles. In: Last WM, Smol JP (eds) Tracking environmental change using lake sediments, vol 2, Physical and geochemical methods. Kluwer Academic Publishers, Dordrecht, pp 319–349
Rose NL, Juggins S, Watt J, Battarbee RW (1994) Fuel-type characterization of spheroidal carbonaceous particles using surface chemistry. Ambio 23:296–299
Schapire RE (1990) The strength of weak learnability. Mach Learn 5:197–227
Scull P, Franklin J, Chadwick OA (2005) The application of classification tree analysis to soil type prediction in a desert landscape. Ecol Model 181:1–15
Simpson GL (2012) Chapter 15 Modern analogue techniques. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht
Spadavecchia L, Williams M, Bell R, Stoy PC, Huntley B, van Wijk MT (2008) Topographic controls on the leaf area index and plant functional type of a tundra ecosystem. J Ecol 96:1238–1251
Spitz F, Lek S (1999) Environmental impact prediction using neural network modelling. An example in wildlife damage. J Appl Ecol 36:317–326
Steiner D, Pauling A, Nussbaumer SU, Nesje A, Luterbacher J, Wanner H, Zumbühl HJ (2008) Sensitivity of European glaciers to precipitation and temperature – two case studies. Clim Chang 90:413–441
Stewart-Koster B, Bunn SE, Mackay SJ, Poff NL, Naiman RJ, Lake PS (2010) The use of Bayesian networks to guide investments in flow and catchment restoration for impaired river ecosystems. Freshw Biol 55:243–260
Stockwell DRB, Noble IR (1992) Induction of sets of rules from animal distribution data: a robust and informative method of data analysis. Math Comput Simul 33:385–390
Stockwell DRB, Peters D (1999) The GARP modelling system: problems and solutions to automated spatial prediction. Int J Geogr Inf Sci 13:143–158
Stockwell DRB, Peterson AT (2002) Effects of sample size on accuracy of species distribution models. Ecol Model 148:1–13
Tarasov P, Peyron O, Guiot J, Brewer S, Volkova VS, Bezusko LG, Dorofeyuk NI, Kvavadze EV, Osipova IM, Panova NK (1999a) Last glacial maximum climate of the former Soviet Union and Mongolia reconstructed from pollen and plant macrofossil data. Clim Dyn 15:227–240
Tarasov P, Guiot J, Cheddadi R, Andreev AA, Bezusko LG, Blyakharchuk TA, Dorofeyuk NI, Filimonova LV, Volkova VS, Zernitskaya VP (1999b) Climate in northern Eurasia 6000 years ago reconstructed from pollen data. Earth Planet Sci Lett 171:635–645
Telford RJ, Birks HJB (2009) Design and evaluation of transfer functions in spatially structured environments. Quat Sci Rev 28:1309–1316
ter Braak CJF (2009) Regression by L1 regularization of smart contrasts and sums (ROSCAS) beats PLS and elastic net in latent variable model. J Chemometrics 23:217–228
Therneau TM, Atkinson B [R port by Ripley B] (2011) rpart: recursive partitioning. R package version 3.1-50. http://cran.r-project.org/package/rpart
Thuiller W, Araújo MB, Lavorel S (2003) Generalized models vs. classification tree analysis: predicting spatial distributions of plant species at different scales. J Veg Sci 14:669–680
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288
Ticehurst JL, Curtis A, Merritt WS (2011) Using Bayesian networks to complement conventional analyses to explore landholder management of native vegetation. Environ Model Softw 26:52–65
Tsoar A, Allouche O, Steinitz O, Rotem D, Kadmon R (2007) A comparative evaluation of presence-only methods for modelling species distribution. Divers Distrib 13:397–405
van Dijk ADJ, ter Braak CJF, Immink RG, Angenent GC, van Ham RCHJ (2008) Predicting and understanding transcription factor interactions based on sequence level determinants of combinatorial control. Bioinformatics 24:26–33
Vayssieres MP, Plant RE, Allen-Diaz BH (2000) Classification trees: an alternative non-parametric approach for predicting species distributions. J Veg Sci 11:679–694
Vincenzi S, Zucchetta M, Franzoi P, Pellizzato M, Pranovi F, de Leo GA, Torricelli P (2011) Application of a Random Forest algorithm to predict spatial distribution of the potential yield of Ruditapes philippinarum in the Venice lagoon, Italy. Ecol Model 222:1471–1478
Warner B, Misra M (1996) Understanding neural networks as statistical tools. Am Stat 50:284–293
Wehrens R (2011) Chemometrics with R: multivariate analysis in the natural sciences and life sciences. Springer, New York
Wehrens R, Buydens LMC (2007) Self- and super-organising maps in R: the kohonen package. J Stat Softw 21:1–19
Weller AF, Harris AJ, Ware JA (2006) Artificial neural networks as potential classification tools for dinoflagellate cyst images: a case using the self-organizing map clustering algorithm. Rev Palaeobot Palynol 141:287–302
Whittingham MJ, Stephens PA, Bradbury RB, Freckleton RP (2006) Why do we still use step-wise modelling in ecology and behaviour? J Anim Ecol 75:1182–1189
Williams JN, Seo C, Thorne J, Nelson JK, Erwin S, O’Brien JM, Schwartz MW (2009) Using species distribution models to predict new occurrences for rare plants. Divers Distrib 15:565–576
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufmann/Elsevier, Amsterdam
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B 67:301–320
Acknowledgements
We are indebted to Richard Telford, Steve Juggins, and John Smol for helpful comments and/or discussion. Whilst writing this chapter, GLS was supported by the European Union Seventh Framework Programme projects REFRESH (Contract No. 244121) and BioFresh (Contract No. 226874), and by the UK Natural Environment Research Council (grant NE/G020027/1). We are particularly grateful to Cathy Jenks for her editorial help. This is publication A359 from the Bjerknes Centre for Climate Research.
Copyright information
© 2012 Springer Science+Business Media B.V.
About this chapter
Cite this chapter
Simpson, G.L., Birks, H.J.B. (2012). Statistical Learning in Palaeolimnology. In: Birks, H., Lotter, A., Juggins, S., Smol, J. (eds) Tracking Environmental Change Using Lake Sediments. Developments in Paleoenvironmental Research, vol 5. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-2745-8_9
DOI: https://doi.org/10.1007/978-94-007-2745-8_9
Published:
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-2744-1
Online ISBN: 978-94-007-2745-8
eBook Packages: Earth and Environmental Science; Earth and Environmental Science (R0)