
Statistical Learning in Palaeolimnology

Chapter in Tracking Environmental Change Using Lake Sediments

Part of the book series: Developments in Paleoenvironmental Research (DPER, volume 5)

Abstract

This chapter considers a range of numerical techniques that lie outside the familiar statistical methods of linear regression, analysis of variance, and generalised linear models, and outside data-analytical techniques such as ordination, clustering, and partitioning. The techniques outlined have developed as a result of the spectacular increase in computing power since the 1980s. They make fewer distributional assumptions than classical statistical methods and can be applied to more complicated estimators and to huge data-sets. They are part of the ever-increasing array of ‘statistical learning’ techniques (sensu Hastie et al. 2011) that try to make sense of the data at hand, to detect major patterns and trends, to understand ‘what the data say’, and thus to learn from the data.
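One classic example of a computer-intensive method that makes few distributional assumptions and extends easily to complicated estimators is the nonparametric bootstrap (Efron 1979; Efron and Tibshirani 1993, both in the reference list). The Python sketch below is an illustration only, not code from the chapter: `bootstrap_se` is a hypothetical helper name, the data are simulated, and the number of resamples (2000) and the random seeds are arbitrary choices.

```python
import random
import statistics


def bootstrap_se(data, estimator, n_boot=2000, seed=0):
    """Nonparametric bootstrap standard error of any estimator.

    Draws n_boot resamples of the data with replacement, applies the
    estimator to each resample, and returns the standard deviation of
    those replicate values. No distributional form is assumed for the data.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Resample n observations with replacement from the original data
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(estimator(resample))
    return statistics.stdev(replicates)


# Simulated, strongly skewed (log-normal) sample. The bootstrap estimates the
# standard error of its median directly, without assuming any distribution.
rng = random.Random(1)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(60)]
print("median:", round(statistics.median(sample), 3))
print("bootstrap SE of median:", round(bootstrap_se(sample, statistics.median), 3))
```

The same function accepts any estimator passed in as a callable (a trimmed mean, a ratio, a prediction error), which illustrates why such resampling methods became practical only once computing power was cheap.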

A range of tree-based and network-based techniques are presented. These are classification and regression trees, multivariate regression trees, bagged trees, random forests, boosted trees, multivariate adaptive regression splines, artificial neural networks, self-organising maps, Bayesian networks, and genetic algorithms. Principal curves and surfaces are also discussed as they relate to unsupervised self-organising maps. The chapter concludes with a discussion of current developments in shrinkage methods and variable selection in statistical modelling that can help in model selection and can minimise collinearity problems. These include principal components regression, ridge regression, the lasso, and the elastic net.
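The shrinkage methods named at the end of the abstract can be illustrated with a minimal sketch of the elastic net fitted by cyclic coordinate descent, the approach described by Friedman et al. (2010) in the reference list. This is a plain-Python illustration under stated assumptions, not the chapter's own code: the function names are hypothetical, the predictors and response are assumed to be centred (so no intercept is fitted), and the data are simulated. Setting `alpha=1` gives the lasso, `alpha=0` gives ridge regression, and `lam=0` reduces to ordinary least squares; principal components regression is not covered here.

```python
import random


def soft_threshold(z, gamma):
    """Lasso shrinkage operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=200):
    """Elastic-net coefficients by cyclic coordinate descent (no intercept).

    Minimises (1/2n)*||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2).
    X is a list of rows; X and y are assumed to be centred beforehand.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # current residuals y - Xb (b starts at zero)
    col_ms = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of predictor j with the partial residual (b_j added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ms[j] * b[j]
            new_bj = soft_threshold(rho, lam * alpha) / (col_ms[j] + lam * (1 - alpha))
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep residuals in sync with b
                b[j] = new_bj
    return b


def centre_columns(X):
    """Subtract each column mean so that no intercept term is needed."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


# Simulated data: x2 is strongly correlated with x1 (collinearity), x3 is noise.
rng = random.Random(42)
X_raw, y_raw = [], []
for _ in range(200):
    x1 = rng.gauss(0, 1)
    x2 = x1 + 0.5 * rng.gauss(0, 1)
    x3 = rng.gauss(0, 1)
    X_raw.append([x1, x2, x3])
    y_raw.append(2.0 * x1 + rng.gauss(0, 0.5))

X = centre_columns(X_raw)
y_mean = sum(y_raw) / len(y_raw)
y = [v - y_mean for v in y_raw]

ols = elastic_net(X, y, lam=0.0, alpha=0.0)
ridge = elastic_net(X, y, lam=0.5, alpha=0.0)
lasso = elastic_net(X, y, lam=0.2, alpha=1.0)
enet = elastic_net(X, y, lam=0.2, alpha=0.5)
for name, coefs in [("OLS", ols), ("ridge", ridge), ("lasso", lasso), ("elastic net", enet)]:
    print(name, [round(c, 3) for c in coefs])
```

Ridge shrinks all coefficients towards zero, whereas the lasso penalty can set some coefficients exactly to zero and so also performs variable selection; the elastic net mixes the two. In practice `lam` (and often `alpha`) would be chosen by cross-validation rather than fixed as in this sketch.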


Notes

  1. Pressure gradient between Los Angeles airport (LAX) and Daggett in mmHg.

References

  • Aalders I (2008) Modeling land-use decision behavior with Bayesian belief networks. Ecol Soc 13:16

  • Aho K, Weaver T, Regele S (2011) Identification and siting of native vegetation types on disturbed land: demonstration of statistical methods. Appl Veg Sci 14:277–290

  • Amsinck SL, Strzelczak A, Bjerring R, Landkildehus F, Lauridsen TL, Christoffersen K, Jeppesen E (2006) Lake depth rather than fish planktivory determines cladoceran community structure in Faroese lakes – evidence from contemporary data and sediments. Freshw Biol 51:2124–2142

  • Anderson DR (2008) Model based inference in the life sciences: a primer on evidence. Springer, New York

  • Anderson RP, Lew D, Peterson AT (2003) Evaluating predictive models of species’ distributions: criteria for selecting optimal models. Ecol Model 162:211–232

  • Baker FA (1993) Classification and regression tree analysis for assessing hazard of pine mortality caused by Heterobasidion annosum. Plant Dis 77:136–139

  • Balshi MS, McGuire AD, Duffy P, Flannigan M, Walsh J, Melillo J (2009) Assessing the response of area burned to changing climate in western boreal North America using a Multivariate Adaptive Regression Splines (MARS) approach. Global Change Biol 15:578–600

  • Banfield JD, Raftery AE (1992) Ice floe identification in satellite images using mathematical morphology and clustering about principal curves. J Am Stat Assoc 87:7–16

  • Barrows TT, Juggins S (2005) Sea-surface temperatures around the Australian margin and Indian Ocean during the Last Glacial Maximum. Quat Sci Rev 24:1017–1047

  • Barton AM, Nurse AM, Michaud K, Hardy SW (2011) Use of CART analysis to differentiate pollen of red pine (Pinus resinosa) and jack pine (P. banksiana) in New England. Quat Res 75:18–23

  • Belgrano A, Malmgren BA, Lindahl O (2001) Application of artificial neural networks (ANN) to primary production time-series data. J Plankton Res 23:651–658

  • Benito Garzón M, Blazek R, Neteler M, Sánchez de Dios R, Sainz Ollero H, Furlanello C (2006) Predicting habitat suitability with machine learning models: the potential area of Pinus sylvestris L. in the Iberian Peninsula. Ecol Model 197:383–393

  • Benito Garzón M, Sánchez de Dios R, Sainz Ollero H (2007) Predictive modelling of tree species distributions on the Iberian Peninsula during the Last Glacial Maximum and Mid-Holocene. Ecography 30:120–134

  • Benito Garzón M, Sánchez de Dios R, Sainz Ollero H (2008) Effects of climate change on the distribution of Iberian tree species. Appl Veg Sci 11:169–178

  • Birks HH, Mathewes RW (1978) Studies in the vegetational history of Scotland. V. Late Devensian and early Flandrian pollen and macrofossil stratigraphy at Abernethy Forest, Inverness-shire. New Phytol 80:455–484

  • Birks HJB (1995) Quantitative palaeoenvironmental reconstructions. In: Maddy D, Brew J (eds) Statistical modelling of quaternary science data, vol 5, Technical guide. Quaternary Research Association, Cambridge, pp 161–254

  • Birks HJB (2012a) Chapter 2 Overview of numerical methods in palaeolimnology. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht

  • Birks HJB (2012b) Chapter 11 Stratigraphical data analysis. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht

  • Birks HJB, Gordon AD (1985) Numerical methods in Quaternary pollen analysis. Academic, London

  • Birks HJB, Jones VJ (2012) Chapter 3 Data-sets. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht

  • Birks HJB, Line JM, Juggins S, Stevenson AC, ter Braak CJF (1990) Diatoms and pH reconstruction. Philos Trans R Soc B 327:263–278

  • Bishop CM (1995) Neural networks for pattern recognition. Clarendon, Oxford

  • Bishop CM (2007) Pattern recognition and machine learning. Springer, Dordrecht

  • Bjerring R, Becares E, Declerck S et al (2009) Subfossil Cladocera in relation to contemporary environmental variables in 54 pan-European lakes. Freshw Biol 54:2401–2417

  • Blaauw M, Heegaard E (2012) Chapter 12 Estimation of age-depth relationships. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht

  • Borggaard C, Thodberg HH (1992) Optimal minimal neural interpretation of spectra. Anal Chem 64:545–551

  • Bourg NA, McShea WJ, Gill DE (2005) Putting a CART before the search: successful habitat prediction for a rare forest herb. Ecology 86:2793–2804

  • Breiman L (1996) Bagging predictors. Mach Learn 24:123–140

  • Breiman L (2001) Random forests. Mach Learn 45:5–32

  • Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, Belmont

  • Brosse S, Guégan J-F, Tourenq J-N, Lek S (1999) The use of artificial neural networks to assess fish abundance and spatial occupancy in the littoral zone of a mesotrophic lake. Ecol Model 120:299–311

  • Brunelle A, Rehfeldt GE, Bentz B, Munson AS (2008) Holocene records of Dendroctonus bark beetles in high elevation pine forests of Idaho and Montana, USA. Forest Ecol Manage 255:836–846

  • Burman P, Chow E, Nolan D (1994) A cross-validatory method for dependent data. Biometrika 81:351–358

  • Burnham KP, Anderson DR (2002) Model selection and multimodel inference: a practical information-theoretic approach, 2nd edn. Springer, New York

  • Cairns DM (2001) A comparison of methods for predicting vegetation type. Plant Ecol 156:3–18

  • Caley P, Kuhnert PM (2006) Application and evaluation of classification trees for screening unwanted plants. Austral Ecol 31:647–655

  • Carlisle DM, Wolock DM, Meador MR (2011) Alteration of streamflow magnitudes and potential ecological consequences: a multiregional assessment. Front Ecol Environ 9:264–270

  • Castelletti A, Soncini-Sessa R (2007a) Bayesian networks and participatory modelling in water resource management. Environ Model Softw 22:1075–1088

  • Castelletti A, Soncini-Sessa R (2007b) Coupling real-time control and socio-economic issues in participatory river basin planning. Environ Model Softw 22:1114–1128

  • Céréghino R, Giraudel JL, Compin A (2001) Spatial analysis of stream invertebrates distribution in the Adour-Garonne drainage basin (France), using Kohonen self-organizing maps. Ecol Model 146:167–180

  • Černá L, Chytrý M (2005) Supervised classification of plant communities with artificial neural networks. J Veg Sci 16:407–414

  • Chapman DS (2010) Weak climatic associations among British plant distributions. Global Ecol Biogeogr 19:831–841

  • Chapman DS, Purse BV (2011) Community versus single-species distribution models for British plants. J Biogeogr 38:1524–1535

  • Chapman DS, Bonn A, Kunin WE, Cornell SJ (2010) Random Forest characterization of upland vegetation and management burning from aerial imagery. J Biogeogr 37:37–46

  • Chatfield C (1993) Neural networks: forecasting breakthrough or passing fad? Int J Forecast 9:1–3

  • Chon T-S (2011) Self-organising maps applied to ecological sciences. Ecol Inform 6:50–61

  • Chytrý M, Jarošík V, Pyšek P, Hájek O, Knollová I, Tichý L, Danihelka J (2008) Separating habitat invasibility by alien plants from the actual level of invasion. Ecology 89:1541–1553

  • Copas JB (1983) Regression, prediction and shrinkage. J R Stat Soc Ser B 45:311–354

  • Cutler A, Stevens JR (2006) Random forests for microarrays. Methods Enzymol 411:422–432

  • Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ (2007) Random forests for classification in ecology. Ecology 88:2783–2792

  • Dahlgren JP (2010) Alternative regression methods are not considered in Murtaugh (2009) or by ecologists in general. Ecol Lett 13:E7–E9

  • Davidson TA, Sayer CD, Perrow M, Bramm M, Jeppesen E (2010a) The simultaneous inference of zooplanktivorous fish and macrophyte density from sub-fossil cladoceran assemblages: a multivariate regression tree approach. Freshw Biol 55:546–564

  • Davidson TA, Sayer CD, Langdon PG, Burgess A, Jackson MJ (2010b) Inferring past zooplanktivorous fish and macrophyte density in a shallow lake: application of a new regression tree model. Freshw Biol 55:584–599

  • De’ath G (1999) Principal curves: a new technique for indirect and direct gradient analysis. Ecology 80:2237–2253

  • De’ath G (2002) Multivariate regression trees: a new technique for modeling species-environment relationships. Ecology 83:1108–1117

  • De’ath G (2007) Boosted trees for ecological modeling and prediction. Ecology 88:243–251

  • De’ath G, Fabricius KE (2000) Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology 81:3178–3192

  • De’ath G, Fabricius KE (2010) Water quality as a regional driver of coral biodiversity and macroalgae on the Great Barrier Reef. Ecol Appl 20:840–850

  • DeFries RS, Rudel T, Uriarte M, Hansen M (2010) Deforestation driven by urban population growth and agricultural trade in the twenty-first century. Nat Geosci 3:178–181

  • Despagne F, Massart D-L (1998) Variable selection for neural networks in multivariate calibration. Chemometrics Intell Lab Syst 40:145–163

  • D’heygere T, Goethals PLM, de Pauw N (2003) Use of genetic algorithms to select input variables in decision tree models for the prediction of benthic macroinvertebrates. Ecol Model 160:291–300

  • Dobrowski SZ, Thorne JH, Greenberg JA, Safford HD, Mynsberge AR, Crimins SM, Swanson AK (2011) Modeling plant ranges over 75 years of climate change in California, USA: temporal transferability and species traits. Ecol Monogr 81:241–257

  • Dutilleul P, Cumming BF, Lontoc-Roy M (2012) Chapter 16 Autocorrelogram and periodogram analyses of palaeolimnological temporal series from lakes in central and western North America to assess shifts in drought conditions. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht

  • Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Stat 7:1–26

  • Efron B, Tibshirani R (1991) Statistical data analysis in the computer age. Science 253:390–395

  • Efron B, Tibshirani R (1993) An introduction to the bootstrap. Chapman & Hall, London

  • Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32:407–499

  • Elith J, Burgman M (2002) Predictions and their validation: rare plants in the Central Highlands, Victoria, Australia. In: Scott JM, Heglund P, Morrison ML, Raven PH (eds) Predicting species occurrences: issues of accuracy and scale. Island Press, Washington, DC

  • Elith J, Leathwick JR (2007) Predicting species distributions from museum and herbarium records using multiresponse models fitted with multivariate adaptive regression splines. Divers Distrib 13:265–275

  • Elith J, Graham CH, Anderson RP, Dudík M, Ferrier S, Guisan A et al (2006) Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29:129–151

  • Elith J, Leathwick JR, Hastie T (2008) A working guide to boosted regression trees. J Anim Ecol 77:802–813

  • Fielding AH (2007) Cluster and classification techniques for the biosciences. Cambridge University Press, Cambridge

  • Franklin J (1998) Predicting the distribution of shrub species in southern California from climate and terrain-derived variables. J Veg Sci 9:733–748

  • Franklin J (2010) Mapping species distributions: spatial inference and prediction. Cambridge University Press, Cambridge

  • Freund Y (1995) Boosting a weak learning algorithm by majority. Inf Comput 121:256–285

  • Friedman JH (1991) Multivariate adaptive regression splines. Ann Stat 19:1–67

  • Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232

  • Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38:367–378

  • Friedman JH, Meulman JJ (2003) Multiple additive regression trees with application in epidemiology. Stat Med 22:1365–1381

  • Friedman JH, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting. Ann Stat 28:337–407

  • Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J Stat Software 33:1–22

  • Furlanello C, Neteler M, Merler S, Menegon S, Fontanari S, Donini A, Rizzoli A, Chemini C (2003) GIS and the random forests predictor: integration in R for tick-borne disease risk. In: Hornik K, Leisch F, Zeileis A (eds) Proceedings of the third international workshop on distributed statistical computing, pp 1–11

  • Gevrey M, Dimopoulos I, Lek S (2003) Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecol Model 160:249–264

  • Giraudel JL, Lek S (2001) A comparison of self-organising map algorithm and some conventional statistical methods for ecological community ordination. Ecol Model 146:329–339

  • Gordon AD (1973) Classifications in the presence of constraints. Biometrics 29:821–827

  • Gordon AD, Birks HJB (1972) Numerical methods in Quaternary palaeoecology. I. Zonation of pollen diagrams. New Phytol 71:961–979

  • Gordon AD, Birks HJB (1974) Numerical methods in Quaternary palaeoecology. II. Comparison of pollen diagrams. New Phytol 73:221–249

  • Goring S, Lacourse T, Pellatt MG, Walker IR, Mathewes RW (2010) Are pollen-based climate models improved by combining surface samples from soil and lacustrine substrates? Rev Palaeobot Palynol 162:203–212

  • Grieger B (2002) Interpolating paleovegetation data with an artificial neural network approach. Global Planet Change 34:199–208

  • Guégan J-F, Lek S, Oberdorff T (1998) Energy availability and habitat heterogeneity predict global riverine fish diversity. Nature 391:382–384

  • Hastie T, Stuetzle W (1989) Principal curves. J Am Stat Assoc 84:502–516

  • Hastie T, Tibshirani R, Friedman J (2011) The elements of statistical learning, 2nd edn. Springer, New York

  • Haykin S (1999) Neural networks, 2nd edn. Prentice-Hall, Upper Saddle River

  • Hejda M, Pyšek P, Jarošík V (2009) Impact of invasive plants on the species richness, diversity and composition of invaded communities. J Ecol 97:393–403

  • Herzschuh U, Birks HJB (2010) Evaluating the indicator value of Tibetan pollen taxa for modern vegetation and climate. Rev Palaeobot Palynol 160:197–208

  • Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12:55–67

  • Holmqvist BH (2005) Classification of large pollen datasets using neural networks with application to mapping and modelling pollen data. LUNDQUA report 39, Lund University

  • Horsak M, Chytrý M, Pokryszko BM, Danihelka J, Ermakov N, Hajek M, Hajkova P, Kintrova K, Koci M, Kubesova S, Lustyk P, Otypkova Z, Pelánková B, Valachovic M (2010) Habitats of relict terrestrial snails in southern Siberia: lessons for the reconstruction of palaeoenvironments of full-glacial Europe. J Biogeogr 37:1450–1462

  • Iverson LR, Prasad AM (1998) Predicting abundance of 80 tree species following climate change in the eastern United States. Ecol Monogr 68:465–485

  • Iverson LR, Prasad AM (2001) Potential changes in tree species richness and forest community types following climate change. Ecosystems 4:186–199

  • Iverson LR, Prasad AM, Schwartz MW (1999) Modeling potential future individual tree-species distributions in the eastern United States under a climate change scenario: a case study with Pinus virginiana. Ecol Model 115:77–93

  • Iverson LR, Prasad AM, Matthews SN, Peters M (2008) Estimating potential habitat for 134 eastern US tree species under six climate scenarios. Forest Ecol Manage 254:390–406

  • Jacob G, Marriott FHC, Robbins PA (1997) Fitting curves to human respiratory data. Appl Stat 46:235–243

  • Jensen FV, Nielsen TD (2007) Bayesian networks and decision graphs, 2nd edn. Springer, New York

  • Jeschke JM, Strayer DL (2008) Usefulness of bioclimatic models for studying climate change and invasive species. Ann NY Acad Sci 1134:1–24

  • Jolliffe IT (2002) Principal component analysis, 2nd edn. Springer, New York

  • Juggins S, Birks HJB (2012) Chapter 14 Quantitative environmental reconstructions from biological data. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht

  • Juggins S, Telford RJ (2012) Chapter 5 Exploratory data analysis and data display. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht

  • Kallimanis AS, Ragia V, Sgardelis SP, Pantis JD (2007) Using regression trees to predict alpha diversity based upon geographical and habitat characteristics. Biodivers Conserv 16:3863–3876

  • Keith RP, Veblen TT, Schoennagel TL, Sherriff RL (2010) Understory vegetation indicates historic fire regimes in ponderosa pine-dominated ecosystems in the Colorado Front Range. J Veg Sci 21:488–499

  • Kohonen T (2001) Self-organising maps, 3rd edn. Springer, Berlin

  • Korb KB, Nicholson AE (2004) Bayesian artificial intelligence. Chapman & Hall, Boca Raton

  • Kragt ME, Newham LTH, Jakeman AJ (2009) A Bayesian network approach to integrating economic and biophysical modelling. In: Anderssen RS, Braddock RD, Newham LTH (eds) 18th World IMACS Congress and MODSIM09 International Congress on Modelling and Simulation. Modelling and Simulation Society of Australia and New Zealand and International Association for Mathematics and Computers in Simulation, pp 2377–2383

  • Kucera M, Weinelt M, Kiefer T, Pflaumann U, Hayes A, Chen MT, Mix AC, Barrows TT, Cortijo E, Duprat J, Juggins S, Waelbroeck C (2005) Reconstruction of sea-surface temperatures from assemblages of planktonic foraminifera: multi-technique approach based on geographically constrained calibration data sets and its application to glacial Atlantic and Pacific Oceans. Quat Sci Rev 24:951–998

  • Larsen DR, Speckman PL (2004) Multivariate regression trees for analysis of abundance data. Biometrics 60:543–549

  • Lawler JJ, White D, Neilson RP, Blaustein AR (2006) Predicting climate-induced range shifts: model differences and model reliability. Global Change Biol 12:1568–1584

  • Leathwick JR, Rowe D, Richardson J, Elith J, Hastie T (2005) Using multivariate adaptive regression splines to predict the distributions of New Zealand’s freshwater diadromous fish. Freshw Biol 50:2034–2052

  • Leathwick JR, Elith J, Hastie T (2006) Comparative performance of generalized additive models and multivariate adaptive regression splines for statistical modelling of species distributions. Ecol Model 199:188–196

  • Legendre P, Birks HJB (2012a) Chapter 7 Clustering and partitioning. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht

  • Legendre P, Birks HJB (2012b) Chapter 8 From classical to canonical ordination. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht

  • Lek S, Guégan J-F (1999) Artificial neural networks as a tool in ecological modelling, an introduction. Ecol Model 120:65–73

  • Lek S, Guégan J-F (2000) Artificial neuronal networks: application to ecology and evolution. Springer, Berlin

  • Lek S, Delacoste M, Baran P, Dimopoulos I, Lauga J, Aulagnier S (1996a) Application of neural networks to modelling nonlinear relationships in ecology. Ecol Model 90:39–52

  • Lek S, Dimopoulos I, Fabre A (1996b) Predicting phosphorus concentration and phosphorus load from watershed characteristics using backpropagation neural networks. Acta Oecol 17:43–53

  • Lindbladh M, O’Connor R, Jacobson GL Jr (2002) Morphometric analysis of pollen grains for palaeoecological studies: classification of Picea from eastern North America. Am J Bot 89:1459–1467

  • Lindbladh M, Jacobson GL Jr, Schauffler M (2003) The postglacial history of three Picea species in New England, USA. Quat Res 59:61–69

  • Lindström J, Kokko H, Ranta E, Lindén H (1998) Predicting population fluctuations with artificial neural networks. Wildl Biol 4:47–53

  • Lotter AF, Anderson NJ (2012) Chapter 18 Limnological responses to environmental changes at inter-annual to decadal time-scales. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht

  • Malmgren BA, Nordlund U (1997) Application of artificial neural networks to paleoceanographic data. Palaeogeogr Palaeoclim Palaeoecol 136:359–373

  • Malmgren BA, Winter A (1999) Climate zonation in Puerto Rico based on principal component analysis and an artificial neural network. J Climate 12:977–985

  • Malmgren BA, Kucera M, Nyberg J, Waelbroeck C (2001) Comparison of statistical and artificial neural network techniques for estimating past sea surface temperatures from planktonic foraminifer census data. Paleoceanography 16:520–530

  • Manel S, Dias JM, Buckton ST, Ormerod SJ (1999a) Alternative methods for predicting species distribution: an illustration with Himalayan river birds. J Appl Ecol 36:734–747

  • Manel S, Dias JM, Ormerod SJ (1999b) Comparing discriminant analysis, neural networks and logistic regression for predicting species distributions: a case study with a Himalayan river bird. Ecol Model 120:337–347

  • Marcot BG, Holthausen RS, Raphael MG, Rowland MM, Wisdom MJ (2001) Using Bayesian belief networks to evaluate fish and wildlife population viability under land management alternatives from an environmental impact statement. Forest Ecol Manage 153:29–42

  • Martens H, Næs T (1989) Multivariate calibration. Wiley, Chichester

  • Maslow AH (1996) The psychology of science: a reconnaissance. Maurice Bassett Publishing

  • Melssen W, Wehrens R, Buydens L (2006) Supervised Kohonen networks for classification problems. Chemometrics Intell Lab Syst 83:99–113

  • Melssen W, Üstün B, Buydens L (2007) SOMPLS: a supervised self-organising map-partial least squares algorithm for multivariate regression problems. Chemometrics Intell Lab Syst 86:102–120

  • Michaelsen J, Schimel DS, Friedl MA, Davis FW, Dubayah RC (1994) Regression tree analysis of satellite and terrain data to guide vegetation sampling and surveys. J Veg Sci 5:673–686

  • Milborrow S (2011) earth. R package version 3.2-0. http://cran.r-project.org/packages=earth

  • Miller AJ (2002) Subset selection in regression, 2nd edn. Chapman & Hall/CRC, Boca Raton

  • Miller J, Franklin J (2002) Modeling the distribution of four vegetation alliances using generalized linear models and classification trees with spatial dependence. Ecol Model 157:227–247

  • Moisen GG, Frescino TS (2002) Comparing five modelling techniques for predicting forest characteristics. Ecol Model 157:209–225

  • Morgan JN, Sonquist JA (1963) Problems in the analysis of survey data, and a proposal. J Am Stat Assoc 58:415–434

  • Mundry R, Nunn CL (2009) Stepwise model fitting and statistical inference: turning noise into signal pollution. Am Nat 173:119–123

  • Murphy B, Jansen C, Murray J, de Barro P (2010) Risk analysis on the Australian release of Aedes aegypti (L.) (Diptera: Culicidae) containing Wolbachia. CSIRO

  • Murtaugh PA (2009) Performance of several variable-selection methods applied to real ecological data. Ecol Lett 12:1061–1068

  • Nakagawa S, Freckleton RP (2008) Missing inaction: the danger of ignoring missing data. Trends Ecol Evol 23:592–596

    Google Scholar 

  • Newton AC, Marshall E, Schreckenberg K, Golicher D, te Velde DW, Edouard F, Arancibia E (2006) Use of a Bayesian belief network to predict the impacts of commercializing non-timber forest products on livelihoods. Ecol Soc 11:24

    Google Scholar 

  • Newton AC, Stewart GB, Diaz A, Golicher D, Pullin AS (2007) Bayesian belief networks as a tool for evidence-based conservation management. J Nat Conserv 15:144–160

    Google Scholar 

  • Nyberg H, Malmgren BA, Kuijpers A, Winter A (2002) A centennial-scale variability of tropical North Atlantic surface hydrology during the late Holocene. Palaeogeogr Palaeoclim Palaeoecol 183:25–41

    Google Scholar 

  • Næs T, Kvaal K, Isaksson T, Miller C (1993) Artificial neural networks in multivariate calibration. J Near IR Spectrosc 1:1–11

    Google Scholar 

  • Næs T, Isaksson T, Fearn T, Davies T (2002) A user-friendly guide to multivariate calibration and classification. NIR Publications, Chichester

    Google Scholar 

  • Olden JD (2000) An artificial neural network approach for studying phytoplankton succession. Hydrobiologia 436:131–143

    CAS  Google Scholar 

  • Olden JD, Jackson DA (2002) Illuminating the ‘black box’: a randomization approach for understanding variable contributions in artificial neural networks. Ecol Model 154:135–150

    Google Scholar 

  • Olden JD, Joy MK, Death RG (2004) An accurate comparison on methods for quantifying variable importance in artificial neural networks using simulated data. Ecol Model 178:389–397

    Google Scholar 

  • Olden JD, Lawler JJ, Poff NL (2008) Machine learning methods without tears: a paper for ecologists. Quart Rev Biol 83:171–193

    Google Scholar 

  • Ôzesmi SL, Tan CO, Özesmi U (2006) Methodological issues in building, training, and testing artificial neural networks in ecological applications. Ecol Model 195:83–93

    Google Scholar 

  • Pakeman RJ, Torvell L (2008) Identifying suitable restoration sites for a scarce subarctic willow (Salix arbuscula) using different information sources and methods. Plant Ecol Divers 1:105–114

  • Park MY, Hastie T (2007) L1-regularization path algorithm for generalised linear models. J R Stat Soc Ser B 69:659–677

  • Pearson RG, Thuiller W, Araújo MB, Martinez-Meyer E, Brotons L, McClean C, Miles L, Segurado P, Dawson TP, Lees DC (2006) Model-based uncertainty in species range prediction. J Biogeogr 33:1704–1711

  • Pelánková B, Kuneš P, Chytrý M, Jankovská V, Ermakov N, Svobodová-Svitavská H (2008) The relationships of modern pollen spectra to vegetation and climate along a steppe-forest-tundra transition in southern Siberia, explored by decision trees. Holocene 18:1259–1271

  • Peters J, De Baets B, Verhoest NEC, Samson R, Degroeve S, de Becker P, Huybrechts W (2007) Random forests as a tool for predictive ecohydrological modelling. Ecol Model 207:304–318

  • Peyron O, Guiot J, Cheddadi R, Tarasov P, Reille M, de Beaulieu J-L, Bottema S, Andrieu V (1998) Climatic reconstruction of Europe for 18,000 yr BP from pollen data. Quat Res 49:183–196

  • Peyron O, Jolly D, Bonnefille R, Vincens A, Guiot J (2000) Climate of East Africa 6000 ¹⁴C yr BP as inferred from pollen data. Quat Res 54:90–101

  • Peyron O, Bégeot C, Brewer S, Heiri O, Magny M, Millet L, Ruffaldi P, van Campo E, Yu G (2005) Lateglacial climatic changes in Eastern France (Lake Lautrey) from pollen, lake-levels, and chironomids. Quat Res 64:197–211

  • Ploner A, Brandenburg C (2003) Modelling visitor attendance levels subject to day of the week and weather: a comparison between linear regression models and regression trees. J Nat Conserv 11:297–308

  • Pourret O, Naïm P, Marcot B (eds) (2008) Bayesian networks. A practical guide to applications. Wiley, Chichester

  • Prasad AM, Iverson LR, Liaw A (2006) Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9:181–199

  • Quinlan J (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Mateo

  • R Development Core Team (2011) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.r-project.org

  • Racca JMJ, Philibert A, Racca R, Prairie YT (2001) A comparison between diatom-pH-inference models using artificial neural networks (ANN), weighted averaging (WA) and weighted averaging partial least square (WA-PLS) regressions. J Paleolimnol 26:411–422

  • Racca JMJ, Wild M, Birks HJB, Prairie YT (2003) Separating wheat from chaff: diatom taxon selection using an artificial neural network pruning algorithm. J Paleolimnol 29:123–133

  • Racca JMJ, Gregory-Eaves I, Pienitz R, Prairie YT (2004) Tailoring palaeolimnological diatom-based transfer functions. Can J Fish Aquat Sci 61:2440–2454

  • Ramakrishnan N, Grama A (2001) Mining scientific data. Adv Comput 55:119–169

  • Raymond B, Watts DJ, Burton H, Bonnice J (2005) Data mining and scientific data. Arct Antarct Alp Res 37:348–357

  • Recknagel F, French M, Harkonen P, Yabunaka K-I (1997) Artificial neural network approach for modelling and prediction of algal blooms. Ecol Model 96:11–28

  • Rehfeldt GE, Crookston NL, Warwell MV, Evans JS (2006) Empirical analyses of plant-climate relationships for the western United States. Int J Plant Sci 167:1123–1150

  • Rejwan C, Collins NC, Brunner LJ, Shuter BJ, Ridgway MS (1999) Tree regression analysis on the nesting habitat of smallmouth bass. Ecology 80:341–348

  • Ridgeway G (2007) Generalized boosted models: a guide to the gbm package. http://cran.r-project.org/web/packages/gbm/vignettes/gbm.pdf. Accessed 20 July 2011

  • Ridgeway G (2010) gbm. R package version 1.6-3.1. http://cran.r-project.org/web/packages/gbm/

  • Rieman B, Peterson JT, Clayton J, Howell P, Thurow R, Thompson W, Lee D (2001) Evaluation of potential effects of federal land management alternatives on trends of salmonids and their habitats in the interior Columbia River basin. Forest Ecol Manage 153:43–62

  • Ripley BD (2008) Pattern recognition and neural networks. Cambridge University Press, Cambridge

  • Roberts DR, Hamann A (2011) Predicting potential climate change impacts with bioclimate envelope models: a palaeoecological perspective. Global Ecol Biogeogr. doi:10.1111/j.1466-8238.2011.00657.x

  • Rose NL (2001) Fly-ash particles. In: Last WM, Smol JP (eds) Tracking environmental change using lake sediments, vol 2, Physical and geochemical methods. Kluwer Academic Publishers, Dordrecht, pp 319–349

  • Rose NL, Juggins S, Watt J, Battarbee RW (1994) Fuel-type characterization of spheroidal carbonaceous particles using surface chemistry. Ambio 23:296–299

  • Schapire RE (1990) The strength of weak learnability. Mach Learn 5:197–227

  • Scull P, Franklin J, Chadwick OA (2005) The application of classification tree analysis to soil type prediction in a desert landscape. Ecol Model 181:1–15

  • Simpson GL (2012) Chapter 15 Modern analogue techniques. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht

  • Spadavecchia L, Williams M, Bell R, Stoy PC, Huntley B, van Wijk MT (2008) Topographic controls on the leaf area index and plant functional type of a tundra ecosystem. J Ecol 96:1238–1251

  • Spitz F, Lek S (1999) Environmental impact prediction using neural network modelling. An example in wildlife damage. J Appl Ecol 36:317–326

  • Steiner D, Pauling A, Nussbaumer SU, Nesje A, Luterbacher J, Wanner H, Zumbühl HJ (2008) Sensitivity of European glaciers to precipitation and temperature – two case studies. Clim Change 90:413–441

  • Stewart-Koster B, Bunn SE, Mackay SJ, Poff NL, Naiman RJ, Lake PS (2010) The use of Bayesian networks to guide investments in flow and catchment restoration for impaired river ecosystems. Freshw Biol 55:243–260

  • Stockwell DRB, Noble IR (1992) Induction of sets of rules from animal distribution data: a robust and informative method of data analysis. Math Comput Simul 33:385–390

  • Stockwell DRB, Peters D (1999) The GARP modelling system: problems and solutions to automated spatial prediction. Int J Geogr Inf Sci 13:143–158

  • Stockwell DRB, Peterson AT (2002) Effects of sample size on accuracy of species distribution models. Ecol Model 148:1–13

  • Tarasov P, Peyron O, Guiot J, Brewer S, Volkova VS, Bezusko LG, Dorofeyuk NI, Kvavadze EV, Osipova IM, Panova NK (1999a) Last glacial maximum climate of the former Soviet Union and Mongolia reconstructed from pollen and plant macrofossil data. Clim Dyn 15:227–240

  • Tarasov P, Guiot J, Cheddadi R, Andreev AA, Bezusko LG, Blyakharchuk TA, Dorofeyuk NI, Filimonova LV, Volkova VS, Zernitskaya VP (1999b) Climate in northern Eurasia 6000 years ago reconstructed from pollen data. Earth Planet Sci Lett 171:635–645

  • Telford RJ, Birks HJB (2009) Design and evaluation of transfer functions in spatially structured environments. Quat Sci Rev 28:1309–1316

  • ter Braak CJF (2009) Regression by L1 regularization of smart contrasts and sums (ROSCAS) beats PLS and elastic net in latent variable model. J Chemometrics 23:217–228

  • Therneau TM, Atkinson B [R port by Ripley B] (2011) rpart: recursive partitioning. R package version 3.1-50. http://cran.r-project.org/package/rpart

  • Thuiller W, Araújo MB, Lavorel S (2003) Generalized models vs. classification tree analysis: predicting spatial distributions of plant species at different scales. J Veg Sci 14:669–680

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288

  • Ticehurst JL, Curtis A, Merritt WS (2011) Using Bayesian networks to complement conventional analyses to explore landholder management of native vegetation. Environ Model Softw 26:52–65

  • Tsoar A, Allouche O, Steinitz O, Rotem D, Kadmon R (2007) A comparative evaluation of presence-only methods for modelling species distribution. Divers Distrib 13:397–405

  • van Dijk ADJ, ter Braak CJF, Immink RG, Angenent GC, van Ham RCHJ (2008) Predicting and understanding transcription factor interactions based on sequence level determinants of combinatorial control. Bioinformatics 24:26–33

  • Vayssieres MP, Plant RE, Allen-Diaz BH (2000) Classification trees: an alternative non-parametric approach for predicting species distributions. J Veg Sci 11:679–694

  • Vincenzi S, Zucchetta M, Franzoi P, Pellizzato M, Pranovi F, de Leo GA, Torricelli P (2011) Application of a Random Forest algorithm to predict spatial distribution of the potential yield of Ruditapes philippinarum in the Venice lagoon, Italy. Ecol Model 222:1471–1478

  • Warner B, Misra M (1996) Understanding neural networks as statistical tools. Am Stat 50:284–293

  • Wehrens R (2011) Chemometrics with R: multivariate analysis in the natural sciences and life sciences. Springer, New York

  • Wehrens R, Buydens LMC (2007) Self- and super-organising maps in R: the kohonen package. J Stat Softw 21:1–19

  • Weller AF, Harris AJ, Ware JA (2006) Artificial neural networks as potential classification tools for dinoflagellate cyst images: a case using the self-organizing map clustering algorithm. Rev Palaeobot Palynol 141:287–302

  • Whittingham MJ, Stephens PA, Bradbury RB, Freckleton RP (2006) Why do we still use step-wise modelling in ecology and behaviour? J Anim Ecol 75:1182–1189

  • Williams JN, Seo C, Thorne J, Nelson JK, Erwin S, O’Brien JM, Schwartz MW (2009) Using species distribution models to predict new occurrences for rare plants. Divers Distrib 15:565–576

  • Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufmann/Elsevier, Amsterdam

  • Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B 67:301–320

Acknowledgements

We are indebted to Richard Telford, Steve Juggins, and John Smol for helpful comments and/or discussion. Whilst writing this chapter, GLS was supported by the European Union Seventh Framework Programme projects REFRESH (Contract No. 244121) and BioFresh (Contract No. 226874), and by the UK Natural Environment Research Council (grant NE/G020027/1). We are particularly grateful to Cathy Jenks for her editorial help. This is publication A359 from the Bjerknes Centre for Climate Research.

Author information

Correspondence to Gavin L. Simpson.

Copyright information

© 2012 Springer Science+Business Media B.V.

About this chapter

Cite this chapter

Simpson, G.L., Birks, H.J.B. (2012). Statistical Learning in Palaeolimnology. In: Birks, H., Lotter, A., Juggins, S., Smol, J. (eds) Tracking Environmental Change Using Lake Sediments. Developments in Paleoenvironmental Research, vol 5. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-2745-8_9
