Statistical Learning in Palaeolimnology

Simpson, Gavin L.; Birks, H. John B.

doi:10.1007/978-94-007-2745-8_9

Gavin L. Simpson & H. John B. Birks

Part of the book series: Developments in Paleoenvironmental Research (DPER, volume 5)

Abstract

This chapter considers a range of numerical techniques that lie outside the familiar statistical methods of linear regression, analysis of variance, and generalised linear models or data-analytical techniques such as ordination, clustering, and partitioning. The techniques outlined have developed as a result of the spectacular increase in computing power since the 1980s. The methods make fewer distributional assumptions than classical statistical methods and can be applied to more complicated estimators and to huge data-sets. They are part of the ever-increasing array of ‘statistical learning’ techniques (sensu Hastie et al. 2011) that try to make sense of the data at hand, to detect major patterns and trends, to understand ‘what the data say’, and thus to learn from the data.

A range of tree-based and network-based techniques are presented. These are classification and regression trees, multivariate regression trees, bagged trees, random forests, boosted trees, multivariate adaptive regression splines, artificial neural networks, self-organising maps, Bayesian networks, and genetic algorithms. Principal curves and surfaces are also discussed as they relate to unsupervised self-organising maps. The chapter concludes with a discussion of current developments in shrinkage methods and variable selection in statistical modelling that can help in model selection and can minimise collinearity problems. These include principal components regression, ridge regression, the lasso, and the elastic net.
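
As a rough illustration of the shrinkage idea mentioned above (not taken from the chapter itself), the following minimal sketch fits ridge regression to two nearly collinear predictors in plain Python. The data values and the function names `centre` and `ridge_two_predictors` are hypothetical, invented here for illustration; setting `lam` to zero recovers ordinary least squares.

```python
def centre(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def ridge_two_predictors(x1, x2, y, lam):
    """Ridge estimate for two predictors: solve (X'X + lam*I) b = X'y.

    lam = 0 gives ordinary least squares; lam > 0 shrinks the coefficients.
    """
    x1, x2, y = centre(x1), centre(x2), centre(y)
    # Entries of the 2 x 2 matrix X'X + lam*I and of the vector X'y
    a11 = sum(u * u for u in x1) + lam
    a22 = sum(v * v for v in x2) + lam
    a12 = sum(u * v for u, v in zip(x1, x2))
    g1 = sum(u * t for u, t in zip(x1, y))
    g2 = sum(v * t for v, t in zip(x2, y))
    det = a11 * a22 - a12 * a12
    # Closed-form inverse of a symmetric 2 x 2 matrix
    b1 = (a22 * g1 - a12 * g2) / det
    b2 = (a11 * g2 - a12 * g1) / det
    return b1, b2


# Hypothetical, nearly collinear predictors (invented for illustration only)
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 5.9]
y = [2.3, 3.8, 6.4, 7.7, 10.3, 11.8]

ols = ridge_two_predictors(x1, x2, y, lam=0.0)
ridge = ridge_two_predictors(x1, x2, y, lam=1.0)
print("least squares:", ols)
print("ridge (lam=1):", ridge)
```

For any positive `lam`, the penalised coefficients have a smaller sum of squares than the least-squares ones, which is the stabilising effect that shrinkage methods trade against a small amount of bias.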

Notes

1. Pressure gradient between Los Angeles airport (LAX) and Daggert in mmHg.

References

Aalders I (2008) Modeling land-use decision behavior with Bayesian belief networks. Ecol Soc 13:16
Aho K, Weaver T, Regele S (2011) Identification and siting of native vegetation types on disturbed land: demonstration of statistical methods. Appl Veg Sci 14:277–290
Amsinck SL, Strzelczak A, Bjerring R, Landkildehus F, Lauridsen TL, Christoffersen K, Jeppesen E (2006) Lake depth rather than fish planktivory determines cladoceran community structure in Faroese lakes – evidence from contemporary data and sediments. Freshw Biol 51:2124–2142
Anderson DR (2008) Model based inference in the life sciences: a primer on evidence. Springer, New York
Anderson RP, Lew D, Peterson AT (2003) Evaluating predictive models of species’ distributions: criteria for selecting optimal models. Ecol Model 162:211–232
Baker FA (1993) Classification and regression tree analysis for assessing hazard of pine mortality caused by Heterobasidion annosum. Plant Dis 77:136–139
Balshi MS, McGuire AD, Duffy P, Flannigan M, Walsh J, Melillo J (2009) Assessing the response of area burned to changing climate in western boreal North America using a Multivariate Adaptive Regression Splines (MARS) approach. Global Change Biol 15:578–600
Banfield JD, Raftery AE (1992) Ice floe identification in satellite images using mathematical morphology and clustering about principal curves. J Am Stat Assoc 87:7–16
Barrows TT, Juggins S (2005) Sea-surface temperatures around the Australian margin and Indian Ocean during the last Glacial Maximum. Quat Sci Rev 24:1017–1047
Barton AM, Nurse AM, Michaud K, Hardy SW (2011) Use of CART analysis to differentiate pollen of red pine (Pinus resinosa) and jack pine (P. banksiana) in New England. Quat Res 75:18–23
Belgrano A, Malmgren BA, Lindahl O (2001) Application of artificial neural networks (ANN) to primary production time-series data. J Plankton Res 23:651–658
Benito Garzón M, Blazek R, Neteler M, Sánchez de Dios R, Sainz Ollero H, Furlanello C (2006) Predicting habitat suitability with machine learning models: the potential area of Pinus sylvestris L. in the Iberian Peninsula. Ecol Model 197:383–393
Benito Garzón M, Sánchez de Dios R, Sainz Ollero H (2007) Predictive modelling of tree species distributions on the Iberian Peninsula during the Last Glacial Maximum and Mid-Holocene. Ecography 30:120–134
Benito Garzón M, Sánchez de Dios R, Sainz Ollero H (2008) Effects of climate change on the distribution of Iberian tree species. Appl Veg Sci 11:169–178
Birks HH, Mathewes RW (1978) Studies in the vegetational history of Scotland. V. Late Devensian and early Flandrian pollen and macrofossil stratigraphy at Abernethy Forest, Inverness-shire. New Phytol 80:455–484
Birks HJB (1995) Quantitative palaeoenvironmental reconstructions. In: Maddy D, Brew J (eds) Statistical modelling of quaternary science data, vol 5, Technical guide. Quaternary Research Association, Cambridge, pp 161–254
Birks HJB (2012a) Chapter 2 Overview of numerical methods in palaeolimnology. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht
Birks HJB (2012b) Chapter 11 Stratigraphical data analysis. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht
Birks HJB, Gordon AD (1985) Numerical methods in Quaternary pollen analysis. Academic, London
Birks HJB, Jones VJ (2012) Chapter 3 Data-sets. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht
Birks HJB, Line JM, Juggins S, Stevenson AC, ter Braak CJF (1990) Diatoms and pH reconstruction. Philos Trans R Soc B 327:263–278
Bishop CM (1995) Neural networks for pattern recognition. Clarendon, Oxford
Bishop CM (2007) Pattern recognition and machine learning. Springer, Dordrecht
Bjerring R, Becares E, Declerck S et al. (2009) Subfossil Cladocera in relation to contemporary environmental variables in 54 pan-European lakes. Freshw Biol 54:2401–2417
Blaauw M, Heegaard E (2012) Chapter 12 Estimation of age-depth relationships. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht
Borggaard C, Thodberg HH (1992) Optimal minimal neural interpretation of spectra. Anal Chem 64:545–551
Bourg NA, McShea WJ, Gill DE (2005) Putting a CART before the search: successful habitat prediction for a rare forest herb. Ecology 86:2793–2804
Breiman L (1996) Bagging predictors. Mach Learn 24:123–140
Breiman L (2001) Random forests. Mach Learn 45:5–32
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, Belmont
Brosse S, Guégan J-F, Tourenq J-N, Lek S (1999) The use of artificial neural networks to assess fish abundance and spatial occupancy in the littoral zone of a mesotrophic lake. Ecol Model 120:299–311
Brunelle A, Rehfeldt GE, Bentz B, Munson AS (2008) Holocene records of Dendroctonus bark beetles in high elevation pine forests of Idaho and Montana, USA. Forest Ecol Manage 255:836–846
Burman P, Chow E, Nolan D (1994) A cross-validatory method for dependent data. Biometrika 81:351–358
Burnham KP, Anderson DR (2002) Model selection and multimodel inference: a practical information-theoretic approach, 2nd edn. Springer, New York
Cairns DM (2001) A comparison of methods for predicting vegetation type. Plant Ecol 156:3–18
Caley P, Kuhnert PM (2006) Application and evaluation of classification trees for screening unwanted plants. Austral Ecol 31:647–655
Carlisle DM, Wolock DM, Meador MR (2011) Alteration of streamflow magnitudes and potential ecological consequences: a multiregional assessment. Front Ecol Environ 9:264–270
Castelletti A, Soncini-Sessa R (2007a) Bayesian Networks and participatory modelling in water resource management. Environ Model Softw 22:1075–1088
Castelletti A, Soncini-Sessa R (2007b) Coupling real-time control and socio-economic issues in participatory river basin planning. Environ Model Softw 22:1114–1128
Céréghino R, Giraudel JL, Compin A (2001) Spatial analysis of stream invertebrates distribution in the Adour-Garonne drainage basin (France), using Kohonen self-organizing maps. Ecol Model 146:167–180
Černá L, Chytrý M (2005) Supervised classification of plant communities with artificial neural networks. J Veg Sci 16:407–414
Chapman DS (2010) Weak climatic associations among British plant distributions. Global Ecol Biogeogr 19:831–841
Chapman DS, Purse BV (2011) Community versus single-species distribution models for British plants. J Biogeogr 38:1524–1535
Chapman DS, Bonn A, Kunin WE, Cornell SJ (2010) Random Forest characterization of upland vegetation and management burning from aerial imagery. J Biogeogr 37:37–46
Chatfield C (1993) Neural networks: forecasting breakthrough or passing fad? Int J Forecast 9:1–3
Chon T-S (2011) Self-organising maps applied to ecological sciences. Ecol Inform 6:50–61
Chytrý M, Jarošik V, Pyšek P, Hájek O, Knollová I, Tichý L, Danihelka J (2008) Separating habitat invasibility by alien plants from the actual level of invasion. Ecology 89:1541–1553
Copas JB (1983) Regression, prediction and shrinkage. J R Stat Soc Ser B 45:311–354
Cutler A, Stevens JR (2006) Random forests for microarrays. Methods Enzymol 411:422–432
Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ (2007) Random forests for classification in ecology. Ecology 88:2783–2792
Dahlgren JP (2010) Alternative regression methods are not considered in Murtaugh (2009) or by ecologists in general. Ecol Lett 13:E7–E9
Davidson TA, Sayer CD, Perrow M, Bramm M, Jeppesen E (2010a) The simultaneous inference of zooplanktivorous fish and macrophyte density from sub-fossil cladoceran assemblages: a multivariate regression tree approach. Freshw Biol 55:546–564
Davidson TA, Sayer CD, Langdon PG, Burgess A, Jackson MJ (2010b) Inferring past zooplanktivorous fish and macrophyte density in a shallow lake: application of a new regression tree model. Freshw Biol 55:584–599
De’ath G (1999) Principal curves: a new technique for indirect and direct gradient analysis. Ecology 80:2237–2253
De’ath G (2002) Multivariate regression trees: a new technique for modeling species-environment relationships. Ecology 83:1108–1117
De’ath G (2007) Boosted trees for ecological modeling and prediction. Ecology 88:243–251
De’ath G, Fabricius KE (2000) Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology 81:3178–3192
De’ath G, Fabricius KE (2010) Water quality as a regional driver of coral biodiversity and macroalgae on the Great Barrier Reef. Ecol Appl 20:840–850
DeFries RS, Rudel T, Uriarte M, Hansen M (2010) Deforestation driven by urban population growth and agricultural trade in the twenty-first century. Nat Geosci 3:178–181
Despagne F, Massart D-L (1998) Variable selection for neural networks in multivariate calibration. Chemometrics Intell Lab Syst 40:145–163
D’heygere T, Goethals PLM, de Pauw N (2003) Use of genetic algorithms to select input variables in decision tree models for the prediction of benthic macroinvertebrates. Ecol Model 160:291–300
Dobrowski SZ, Thorne JH, Greenberg JA, Safford HD, Mynsberge AR, Crimins SM, Swanson AK (2011) Modeling plant ranges over 75 years of climate change in California, USA: temporal transferability and species traits. Ecol Monogr 81:241–257
Dutilleul P, Cumming BF, Lontoc-Roy M (2012) Chapter 16 Autocorrelogram and periodogram analyses of palaeolimnological temporal series from lakes in central and western North America to assess shifts in drought conditions. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht
Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Stat 7:1–26
Efron B, Tibshirani R (1991) Statistical data analysis in the computer age. Science 253:390–395
Efron B, Tibshirani R (1993) An introduction to the bootstrap. Chapman & Hall, London
Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32:407–499
Elith J, Burgman M (2002) Predictions and their validation: rare plants in the Central Highlands, Victoria, Australia. In: Scott JM, Heglund P, Morrison ML, Raven PH (eds) Predicting species occurrences: issues of accuracy and scale. Island Press, Washington, DC
Elith J, Leathwick JR (2007) Predicting species distributions from museum and herbarium records using multiresponse models fitted with multivariate adaptive regression splines. Divers Distrib 13:265–275
Elith J, Graham CH, Anderson RP, Dudík M, Ferrier S, Guisan A et al. (2006) Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29:129–151
Elith J, Leathwick JR, Hastie T (2008) A working guide to boosted regression trees. J Anim Ecol 77:802–813
Fielding AH (2007) Cluster and classification techniques for the biosciences. Cambridge University Press, Cambridge
Franklin J (1998) Predicting the distribution of shrub species in southern California from climate and terrain-derived variables. J Veg Sci 9:733–748
Franklin J (2010) Mapping species distributions — spatial inference and prediction. Cambridge University Press, Cambridge
Freund Y (1995) Boosting a weak learning algorithm by majority. Inf Comput 121:256–285
Friedman JH (1991) Multivariate adaptive regression splines. Ann Stat 19:1–67
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232
Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38:367–378
Friedman JH, Meulman JJ (2003) Multiple additive regression trees with application in epidemiology. Stat Med 22:1365–1381
Friedman JH, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting. Ann Stat 28:337–407
Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J Stat Software 33:1–22
Furlanello C, Neteler M, Merler S, Menegon S, Fontanari S, Donini A, Rizzoli A, Chemini C (2003) GIS and the random forests predictor: integration in R for tick-borne disease risk. In: Hornik K, Leisch F, Zeileis A (eds) Proceedings of the third international workshop on distributed statistical computing, pp 1–11
Gevrey M, Dimopoulos I, Lek S (2003) Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecol Model 160:249–264
Giraudel JL, Lek S (2001) A comparison of self-organising map algorithm and some conventional statistical methods for ecological community ordination. Ecol Model 146:329–339
Gordon AD (1973) Classifications in the presence of constraints. Biometrics 29:821–827
Gordon AD, Birks HJB (1972) Numerical methods in Quaternary palaeoecology. I. Zonation of pollen diagrams. New Phytol 71:961–979
Gordon AD, Birks HJB (1974) Numerical methods in Quaternary palaeoecology. II. Comparison of pollen diagrams. New Phytol 73:221–249
Goring S, Lacourse T, Pellatt MG, Walker IR, Mathewes RW (2010) Are pollen-based climate models improved by combining surface samples from soil and lacustrine substrates? Rev Palaeobot Palynol 162:203–212
Grieger B (2002) Interpolating paleovegetation data with an artificial neural network approach. Global Planet Change 34:199–208
Guégan J-F, Lek S, Oberdorff T (1998) Energy availability and habitat heterogeneity predict global riverine fish diversity. Nature 391:382–384
Hastie T, Stuetzle W (1989) Principal curves. J Am Stat Assoc 84:502–516
Hastie T, Tibshirani R, Friedman J (2011) The elements of statistical learning, 2nd edn. Springer, New York
Haykin S (1999) Neural networks, 2nd edn. Prentice-Hall, Upper Saddle River
Hejda M, Pyšek P, Jarošik V (2009) Impact of invasive plants on the species richness, diversity and composition of invaded communities. J Ecol 97:393–403
Herzschuh U, Birks HJB (2010) Evaluating the indicator value of Tibetan pollen taxa for modern vegetation and climate. Rev Palaeobot Palynol 160:197–208
Hoerl AE, Kennard R (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12:55–67
Holmqvist BH (2005) Classification of large pollen datasets using neural networks with application to mapping and modelling pollen data. LUNDQUA report 39, Lund University
Horsak M, Chytrý M, Pokryszko BM, Danihelka J, Ermakov N, Hajek M, Hajkova P, Kintrova K, Koci M, Kubesova S, Lustyk P, Otypkova Z, Pelánková B, Valachovic M (2010) Habitats of relict terrestrial snails in southern Siberia: lessons for the reconstruction of palaeoenvironments of full-glacial Europe. J Biogeogr 37:1450–1462
Iverson LR, Prasad AM (1998) Predicting abundance of 80 tree species following climate change in the eastern United States. Ecol Monogr 68:465–485
Iverson LR, Prasad AM (2001) Potential changes in tree species richness and forest community types following climate change. Ecosystems 4:186–199
Iverson LR, Prasad AM, Schwartz MW (1999) Modeling potential future individual tree-species distributions in the eastern United States under a climate change scenario: a case study with Pinus virginiana. Ecol Model 115:77–93
Iverson LR, Prasad AM, Matthews SN, Peters M (2008) Estimating potential habitat for 134 eastern US tree species under six climate scenarios. Forest Ecol Manage 254:390–406
Jacob G, Marriott FHC, Robbins PA (1997) Fitting curves to human respiratory data. Appl Stat 46:235–243
Jensen FV, Nielsen TD (2007) Bayesian networks and decision graphs, 2nd edn. Springer, New York
Jeschke JM, Strayer DL (2008) Usefulness of bioclimatic models for studying climate change and invasive species. Ann NY Acad Sci 1134:1–24
Jolliffe IT (2002) Principal component analysis, 2nd edn. Springer, New York
Juggins S, Birks HJB (2012) Chapter 14 Quantitative environmental reconstructions from biological data. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht
Juggins S, Telford RJ (2012) Chapter 5 Exploratory data analysis and data display. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht
Kallimanis AS, Ragia V, Sgardelis SP, Pantis JD (2007) Using regression trees to predict alpha diversity based upon geographical and habitat characteristics. Biodivers Conserv 16:3863–3876
Keith RP, Veblen TT, Schoennagel TL, Sherriff RL (2010) Understory vegetation indicates historic fire regimes in ponderosa pine-dominated ecosystems in the Colorado Front Range. J Veg Sci 21:488–499
Kohonen T (2001) Self-organising maps, 3rd edn. Springer, Berlin
Korb KB, Nicholson AE (2004) Bayesian artificial intelligence. Chapman & Hall, Boca Raton
Kragt ME, Newham LTH, Jakeman AJ (2009) A Bayesian network approach to integrating economic and biophysical modelling. In: Anderssen RS, Braddock RD, Newham LTH (eds) 18th World IMACS Congress and MODSIM09 International Congress on Modelling and Simulation. Modelling and Simulation Society of Australia and New Zealand and International Association for Mathematics and Computers in Simulation. pp 2377–2383
Kucera M, Weinelt M, Kiefer T, Pflaumann U, Hayes A, Chen MT, Mix AC, Barrows TT, Cortijo E, Duprat J, Juggins S, Waelbroeck C (2005) Reconstruction of sea-surface temperatures from assemblages of planktonic foraminifera: multi-technique approach based on geographically constrained calibration data sets and its application to glacial Atlantic and Pacific Oceans. Quat Sci Rev 24:951–998
Larsen DR, Speckman PL (2004) Multivariate regression trees for analysis of abundance data. Biometrics 60:543–549
Lawler JJ, White D, Neilson RP, Blaustein AR (2006) Predicting climate-induced range shifts: model differences and model reliability. Global Change Biol 12:1568–1584
Leathwick JR, Rowe D, Richardson J, Elith J, Hastie T (2005) Using multivariate adaptive regression splines to predict the distributions of New Zealand’s freshwater diadromous fish. Freshw Biol 50:2034–2052
Leathwick JR, Elith J, Hastie T (2006) Comparative performance of generalized additive models and multivariate adaptive regression splines for statistical modelling of species distributions. Ecol Model 199:188–196
Legendre P, Birks HJB (2012a) Chapter 7 Clustering and partitioning. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht
Legendre P, Birks HJB (2012b) Chapter 8 From classical to canonical ordination. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht
Lek S, Guégan JF (1999) Artificial neural networks as a tool in ecological modelling, an introduction. Ecol Model 120:65–73
Lek S, Guégan J-F (2000) Artificial neuronal networks: application to ecology and evolution. Springer, Berlin
Lek S, Delacoste M, Baran P, Dimopoulos I, Lauga J, Aulagnier S (1996a) Application of neural networks to modelling nonlinear relationships in ecology. Ecol Model 90:39–52
Lek S, Dimopoulos I, Fabre A (1996b) Predicting phosphorus concentration and phosphorus load from watershed characteristics using backpropagation neural networks. Acta Oecol 17:43–53
Lindbladh M, O’Connor R, Jacobson GL Jr (2002) Morphometric analysis of pollen grains for palaeoecological studies: classification of Picea from eastern North America. Am J Bot 89:1459–1467
Lindbladh M, Jacobson GL Jr, Schauffler M (2003) The postglacial history of three Picea species in New England, USA. Quat Res 59:61–69
Lindström J, Kokko H, Ranta E, Lindén H (1998) Predicting population fluctuations with artificial neural networks. Wildl Biol 4:47–53
Lotter AF, Anderson NJ (2012) Chapter 18 Limnological responses to environmental changes at inter-annual to decadal time-scales. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht
Malmgren BA, Nordlund U (1997) Application of artificial neural networks to paleoceanographic data. Palaeogeogr Palaeoclim Palaeoecol 136:359–373
Malmgren BA, Winter A (1999) Climate zonation in Puerto Rico based on principal component analysis and an artificial neural network. J Climate 12:977–985
Malmgren BA, Kucera M, Nyberg J, Waelbroeck C (2001) Comparison of statistical and artificial neural network techniques for estimating past sea surface temperatures from planktonic foraminifer census data. Paleoceanography 16:520–530
Manel S, Dias JM, Buckton ST, Ormerod SJ (1999a) Alternative methods for predicting species distribution: an illustration with Himalayan river birds. J Appl Ecol 36:734–747
Manel S, Dias JM, Ormerod SJ (1999b) Comparing discriminant analysis, neural networks and logistic regression for predicting species distributions: a case study with a Himalayan river bird. Ecol Model 120:337–347
Marcot BG, Holthausen RS, Raphael MG, Rowland MG, Wisdom MJ (2001) Using Bayesian belief networks to evaluate fish and wildlife population viability under land management alternatives from an environmental impact statement. Forest Ecol Manage 153:29–42
Martens H, Næs T (1989) Multivariate calibration. Wiley, Chichester
Maslow AH (1996) The psychology of science: a reconnaissance. Maurice Bassett Publishing
Melssen W, Wehrens R, Buydens L (2006) Supervised Kohonen networks for classification problems. Chemometrics Intell Lab Syst 83:99–113
Melssen W, Bulent U, Buydens L (2007) SOMPLS: a supervised self-organising map-partial least squares algorithm for multivariate regression problems. Chemometrics Intell Lab Syst 86:102–120
Michaelson J, Schimel DS, Friedl MA, Davis FW, Dubayah RC (1994) Regression tree analysis of satellite and terrain data to guide vegetation sampling and surveys. J Veg Sci 5:673–686
Milborrow S (2011) Earth. R package version 3.2-0. http://cran.r-project.org/package=earth
Miller AJ (2002) Subset selection in regression, 2nd edn. Chapman & Hall/CRC, Boca Raton
Miller J, Franklin J (2002) Modeling the distribution of four vegetation alliances using generalized linear models and classification trees with spatial dependence. Ecol Model 157:227–247
Moisen GG, Frescino TS (2002) Comparing five modelling techniques for predicting forest characteristics. Ecol Model 157:209–225
Morgan JN, Sonquist JA (1963) Problems in the analysis of survey data, and a proposal. J Am Stat Assoc 58:415–434
Mundry R, Nunn CL (2009) Stepwise model fitting and statistical inference: turning noise into signal pollution. Am Nat 173:119–123
Murphy B, Jansen C, Murray J, de Barro P (2010) Risk analysis on the Australian release of Aedes aegypti (L.) (Diptera: Culicidae) Containing Wolbachia. CSIRO
Murtaugh PA (2009) Performance of several variable-selection methods applied to real ecological data. Ecol Lett 12:1061–1068
Nakagawa S, Freckleton RP (2008) Missing inaction: the danger of ignoring missing data. Trends Ecol Evol 23:592–596
Newton AC, Marshall E, Schreckenberg K, Golicher D, te Velde DW, Edouard F, Arancibia E (2006) Use of a Bayesian belief network to predict the impacts of commercializing non-timber forest products on livelihoods. Ecol Soc 11:24
Newton AC, Stewart GB, Diaz A, Golicher D, Pullin AS (2007) Bayesian belief networks as a tool for evidence-based conservation management. J Nat Conserv 15:144–160
Nyberg J, Malmgren BA, Kuijpers A, Winter A (2002) A centennial-scale variability of tropical North Atlantic surface hydrology during the late Holocene. Palaeogeogr Palaeoclim Palaeoecol 183:25–41
Næs T, Kvaal K, Isaksson T, Miller C (1993) Artificial neural networks in multivariate calibration. J Near IR Spectrosc 1:1–11
Næs T, Isaksson T, Fearn T, Davies T (2002) A user-friendly guide to multivariate calibration and classification. NIR Publications, Chichester
Olden JD (2000) An artificial neural network approach for studying phytoplankton succession. Hydrobiologia 436:131–143
Olden JD, Jackson DA (2002) Illuminating the ‘black box’: a randomization approach for understanding variable contributions in artificial neural networks. Ecol Model 154:135–150
Olden JD, Joy MK, Death RG (2004) An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data. Ecol Model 178:389–397
Olden JD, Lawler JJ, Poff NL (2008) Machine learning methods without tears: a primer for ecologists. Quart Rev Biol 83:171–193
Özesmi SL, Tan CO, Özesmi U (2006) Methodological issues in building, training, and testing artificial neural networks in ecological applications. Ecol Model 195:83–93
Pakeman RJ, Torvell L (2008) Identifying suitable restoration sites for a scarce subarctic willow (Salix arbuscula) using different information sources and methods. Plant Ecol Divers 1:105–114
Park MY, Hastie T (2007) l1-regularization path algorithm for generalised linear models. J R Stat Soc Ser B 69:659–677
Pearson RG, Thuiller W, Araújo MB, Martinez-Meyer E, Brotons L, McClean C, Miles L, Segurado P, Dawson TP, Lees DC (2006) Model-based uncertainty in species range prediction. J Biogeogr 33:1704–1711
Pelánková B, Kuneš P, Chytrý M, Jankovská V, Ermakov N, Svobodová-Svitavaská H (2008) The relationships of modern pollen spectra to vegetation and climate along a steppe-forest-tundra transition in southern Siberia, explored by decision trees. Holocene 18:1259–1271
Peters J, De Baets B, Verhoest NEC, Samson R, Degroeve S, de Becker P, Huybrechts W (2007) Random forests as a tool for predictive ecohydrological modelling. Ecol Model 207:304–318
Peyron O, Guiot J, Cheddadi R, Tarasov P, Reille M, de Beaulieu J-L, Bottema S, Andrieu V (1998) Climatic reconstruction of Europe for 18,000 yr BP from pollen data. Quat Res 49:183–196
Peyron O, Jolly D, Bonnefille R, Vincens A, Guiot J (2000) Climate of East Africa 6000 ¹⁴C yr BP as inferred from pollen data. Quat Res 54:90–101
Peyron O, Bégeot C, Brewer S, Heiri O, Magny M, Millet L, Ruffaldi P, van Campo E, Yu G (2005) Lateglacial climatic changes in Eastern France (Lake Lautrey) from pollen, lake-levels, and chironomids. Quat Res 64:197–211
Ploner A, Brandenburg C (2003) Modelling visitor attendance levels subject to day of the week and weather: a comparison between linear regression models and regression trees. J Nat Conserv 11:297–308
Pourret O, Naïm P, Marcot B (eds) (2008) Bayesian networks. A practical guide to applications. Wiley, Chichester
Prasad AM, Iverson LR, Liaw A (2006) Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9:181–199
Quinlan J (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Mateo
R Development Core Team (2011) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.r-project.org
Racca JMJ, Philibert A, Racca R, Prairie YT (2001) A comparison between diatom-pH-inference models using artificial neural networks (ANN), weighted averaging (WA) and weighted averaging partial least square (WA-PLS) regressions. J Paleolimnol 26:411–422
Racca JMJ, Wild M, Birks HJB, Prairie YT (2003) Separating wheat from chaff: diatom taxon selection using an artificial neural network pruning algorithm. J Paleolimnol 29:123–133
Racca JMJ, Gregory-Eaves I, Pienitz R, Prairie YT (2004) Tailoring palaeolimnological diatom-based transfer functions. Can J Fish Aquat Sci 61:2440–2454
Ramakrishnan N, Grama A (2001) Mining scientific data. Adv Comput 55:119–169
Raymond B, Watts DJ, Burton H, Bonnice J (2005) Data mining and scientific data. Arct Antarct Alp Res 37:348–357
Recknagel F, French M, Harkonen P, Yabunaka K-I (1997) Artificial neural network approach for modelling and prediction of algal blooms. Ecol Model 96:11–28
Rehfeldt GE, Crookston NL, Warwell MV, Evans JS (2006) Empirical analyses of plant-climate relationships for the western United States. Int J Plant Sci 167:1123–1150
Rejwan C, Collins NC, Brunner LJ, Shuter BJ, Ridgway MS (1999) Tree regression analysis on the nesting habitat of smallmouth bass. Ecology 80:341–348
Ridgeway G (2007) Generalized boosted models: a guide to the gbm package. http://cran.r-project.org/web/packages/gbm/vignettes/gbm.pdf. Accessed 20 July 2011
Ridgeway G (2010) gbm. R package version 1.6-3.1. http://cran.r-project.org/web/packages/gbm/
Rieman B, Peterson JT, Clayton J, Howell P, Thurow R, Thompson W, Lee D (2001) Evaluation of potential effects of federal land management alternatives on trends of salmonids and their habitats in the interior Columbia River basin. Forest Ecol Manage 153:43–62
Ripley BD (2008) Pattern recognition and neural networks. Cambridge University Press, Cambridge
Roberts DR, Hamann A (2011) Predicting potential climate change impacts with bioclimate envelope models: a palaeoecological perspective. Global Ecol Biogeogr. doi:10.1111/j.1466-8238.2011.00657.x
Rose NL (2001) Fly-ash particles. In: Last WM, Smol JP (eds) Tracking environmental change using lake sediments, vol 2, Physical and geochemical methods. Kluwer Academic Publishers, Dordrecht, pp 319–349
Rose NL, Juggins S, Watt J, Battarbee RW (1994) Fuel-type characterization of spheroidal carbonaceous particles using surface chemistry. Ambio 23:296–299
Schapire RE (1990) The strength of weak learnability. Mach Learn 5:197–227
Scull P, Franklin J, Chadwick OA (2005) The application of classification tree analysis to soil type prediction in a desert landscape. Ecol Model 181:1–15
Simpson GL (2012) Chapter 15 Modern analogue techniques. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds) Tracking environmental change using lake sediments. Data handling and numerical techniques, vol 5. Springer, Dordrecht
Spadavecchia L, Williams M, Bell R, Stoy PC, Huntley B, van Wijk MT (2008) Topographic controls on the leaf area index and plant functional type of a tundra ecosystem. J Ecol 96:1238–1251
Spitz F, Lek S (1999) Environmental impact prediction using neural network modelling. An example in wildlife damage. J Appl Ecol 36:317–326
Steiner D, Pauling A, Nussbaumer SU, Nesje A, Luterbacher J, Wanner H, Zumbühl HJ (2008) Sensitivity of European glaciers to precipitation and temperature – two case studies. Clim Chang 90:413–441
Stewart-Koster B, Bunn SE, Mackay SJ, Poff NL, Naiman RJ, Lake PS (2010) The use of Bayesian networks to guide investments in flow and catchment restoration for impaired river ecosystems. Freshw Biol 55:243–260
Stockwell DRB, Noble IR (1992) Induction of sets of rules from animal distribution data: a robust and informative method of data analysis. Math Comput Simul 33:385–390
Stockwell DRB, Peters D (1999) The GARP modelling system: problems and solutions to automated spatial prediction. Int J Geogr Info Sci 13:143–158
Stockwell DRB, Peterson AT (2002) Effects of sample size on accuracy of species distribution models. Ecol Model 148:1–13
Tarasov P, Peyron O, Guiot J, Brewer S, Volkova VS, Bezusko LG, Dorofeyuk NI, Kvavadze EV, Osipova IM, Panova NK (1999a) Last glacial maximum climate of the former Soviet Union and Mongolia reconstructed from pollen and plant macrofossil data. Clim Dyn 15:227–240
Tarasov P, Guiot J, Cheddadi R, Andreev AA, Bezusko LG, Blyakharchuk TA, Dorofeyuk NI, Filimonova LV, Volkova VS, Zernitskaya VP (1999b) Climate in northern Eurasia 6000 years ago reconstructed from pollen data. Earth Planet Sci Lett 171:635–645
Telford RJ, Birks HJB (2009) Design and evaluation of transfer functions in spatially structured environments. Quat Sci Rev 28:1309–1316
ter Braak CJF (2009) Regression by L₁ regularization of smart contrasts and sums (ROSCAS) beats PLS and elastic net in latent variable model. J Chemometrics 23:217–228
Therneau TM, Atkinson B [R port by Ripley B] (2011) rpart: recursive partitioning. R package version 3.1-50. http://cran.r-project.org/package/rpart
Thuiller W, Araújo MB, Lavorel S (2003) Generalized models vs. classification tree analysis: predicting spatial distributions of plant species at different scales. J Veg Sci 14:669–680
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288
Ticehurst JL, Curtis A, Merritt WS (2011) Using Bayesian networks to complement conventional analyses to explore landholder management of native vegetation. Environ Model Softw 26:52–65
Tsoar A, Allouche O, Steinitz O, Rotem D, Kadmon R (2007) A comparative evaluation of presence-only methods for modelling species distribution. Divers Distrib 13:397–405
van Dijk ADJ, ter Braak CJF, Immink RG, Angenent GC, van Ham RCHJ (2008) Predicting and understanding transcription factor interactions based on sequence level determinants of combinatorial control. Bioinformatics 24:26–33
Vayssieres MP, Plant RE, Allen-Diaz BH (2000) Classification trees: an alternative non-parametric approach for predicting species distributions. J Veg Sci 11:679–694
Vincenzi S, Zucchetta M, Franzoi P, Pellizzato M, Pranovi F, de Leo GA, Torricelli P (2011) Application of a Random Forest algorithm to predict spatial distribution of the potential yield of Ruditapes philippinarum in the Venice lagoon, Italy. Ecol Model 222:1471–1478
Warner B, Misra M (1996) Understanding neural networks as statistical tools. Am Stat 50:284–293
Wehrens R (2011) Chemometrics with R: multivariate analysis in the natural sciences and life sciences. Springer, New York
Wehrens R, Buydens LMC (2007) Self- and super-organising maps in R: the kohonen package. J Stat Softw 21:1–19
Weller AF, Harris AJ, Ware JA (2006) Artificial neural networks as potential classification tools for dinoflagellate cyst images: a case using the self-organizing map clustering algorithm. Rev Palaeobot Palynol 141:287–302
Whittingham MJ, Stephens PA, Bradbury RB, Freckleton RP (2006) Why do we still use step-wise modelling in ecology and behaviour? J Anim Ecol 75:1182–1189
Williams JN, Seo C, Thorne J, Nelson JK, Erwin S, O’Brien JM, Schwartz MW (2009) Using species distribution models to predict new occurrences for rare plants. Divers Distrib 15:565–576
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufmann/Elsevier, Amsterdam
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B 67:301–320

Acknowledgements

We are indebted to Richard Telford, Steve Juggins, and John Smol for helpful comments and/or discussion. Whilst writing this chapter, GLS was supported by the European Union Seventh Framework Programme projects REFRESH (Contract No. 244121) and BioFresh (Contract No. 226874), and by the UK Natural Environment Research Council (grant NE/G020027/1). We are particularly grateful to Cathy Jenks for her editorial help. This is publication A359 from the Bjerknes Centre for Climate Research.

Author information

Authors and Affiliations

Environmental Change Research Centre, University College London, Pearson Building, Gower Street, London, WC1E 6BT, UK
Gavin L. Simpson & H. John B. Birks
Department of Biology and Bjerknes Centre for Climate Research, University of Bergen, 7803, Bergen, N-5020, Norway
H. John B. Birks
School of Geography and the Environment, University of Oxford, Oxford, OX1 3QY, UK
H. John B. Birks

Authors

Gavin L. Simpson
H. John B. Birks

Corresponding author

Correspondence to Gavin L. Simpson.

Editor information

Editors and Affiliations

Allegaten 41, Bergen, 5007, Norway
H. John B. Birks
Dept. Geobiology, Lab. Palaeobotany and Palynology, University Utrecht, Budapestlaan 4, Utrecht, 3584 CD, Netherlands
André F. Lotter
School of Geography, Newcastle University, Daysh Building 14b, Newcastle-upon-Tyne, NE1 7RU, United Kingdom
Steve Juggins
Dept. Biology, Paleoecological Environmental, Queen's University, Kingston, K7L 3N6, Ontario, Canada
John P. Smol

About this chapter

Cite this chapter

Simpson, G.L., Birks, H.J.B. (2012). Statistical Learning in Palaeolimnology. In: Birks, H., Lotter, A., Juggins, S., Smol, J. (eds) Tracking Environmental Change Using Lake Sediments. Developments in Paleoenvironmental Research, vol 5. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-2745-8_9

DOI: https://doi.org/10.1007/978-94-007-2745-8_9
Published: 08 February 2012
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-2744-1
Online ISBN: 978-94-007-2745-8
eBook Packages: Earth and Environmental Science (R0)