Proper Data Management as a Scientific Foundation for Reliable Species Distribution Modeling

Predictive Species and Habitat Modeling in Landscape Ecology

Abstract

Data management, storage, curation, and dissemination are mainstays of computer modeling. Indeed, a traditional view of computer modeling has perpetuated the notion of “garbage in, garbage out” (GIGO), a constant reminder that, no matter how sophisticated the analysis, computers will “unquestioningly process” whatever data they are given, regardless of quality or suitability (Pearson 2007). In ecology, the datasets used in computer modeling are inherently complex and often characterized by missing values, dynamic environmental variables, and other factors that lead to numerous data anomalies (Michener et al. 1997; Michener and Brunt 2000). Ecologists have long recognized, however, that although data quality is undoubtedly important, using different types of data, even messy ones, can still prove informative and can facilitate new questions, methods, and synergies in science and society.

References

  • Aldridge CL, Boyce MS (2007) Linking occurrence and fitness to persistence: habitat-based approach for endangered Greater Sage-Grouse. Ecol Appl 17:508–526.

  • Anderson DR (2008) Model based inference in the life sciences: a primer on evidence. Springer, New York, NY.

  • Anderson DR, Burnham KP, Gould WR, Cherry S (2001) Concerns about finding effects that are actually spurious. Wildl Soc Bull 29:311–316.

  • Araújo MB, Guisan A (2006) Five (or so) challenges for species distribution modelling. J Biogeogr 33:1677–1688.

  • Araújo MB, Luoto M (2007) The importance of biotic interactions for modelling species distributions under climate change. Global Ecol Biogeogr 16:743–753.

  • Araújo MB, Williams PH, Fuller RJ (2002) Dynamics of extinction and the selection of nature reserves. Proc R Soc Lond Ser B 269:1971–1980.

  • Austin MP (2002) Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecol Model 157:101–118.

  • Austin M (2006) Species distribution models and ecological theory: a critical assessment and some possible new approaches. Ecol Model 200:1–19.

  • Barry S, Elith J (2006) Error and uncertainty in habitat models. J Appl Ecol 43:413–423.

  • Bibby CJ, Burgess ND, Hill DA, Mustoe S (2000) Bird census techniques. Academic Press, San Diego, CA.

  • Bishop JA, Myers WL (2005) Associations between avian functional guild response and regional landscape properties for conservation planning. Ecol Indic 5:33–48.

  • Braun CE (2005) Techniques for wildlife investigations and management. The Wildlife Society, Bethesda, MD.

  • Breiman L (2001a) Random forests. Mach Learn 45:5–32.

  • Breiman L (2001b) Statistical modeling: the two cultures. Stat Sci 16:199–231.

  • Brennan JM, Bender DJ, Contreras TA, Fahrig L (2002) Focal patch landscape studies for wildlife management: optimizing sampling effort across scales. In Liu J, Taylor WW (eds) Integrating landscape ecology into natural resource management. Cambridge University Press, NY.

  • Brotons L, Thuiller W, Araújo MB, Hirzel AH (2004) Presence–absence versus presence-only modelling methods for predicting bird habitat suitability. Ecography 27:437–448.

  • Buckland ST (2001) Introduction to distance sampling: estimating abundance of biological populations. Oxford University Press, Oxford, UK.

  • Burnham KP, Anderson DR (2002) Model selection and multimodel inference: a practical information-theoretic approach. Springer-Verlag, New York.

  • Coudun C, Gégout JC (2006) The derivation of species response curves with Gaussian logistic regression is sensitive to sampling intensity and curve characteristics. Ecol Model 199:164–175.

  • Craig E, Huettmann F (2009) Using “blackbox” algorithms such as TreeNET and Random Forests for data-mining and for finding meaningful patterns, relationships and outliers in complex ecological data: an overview, an example using golden eagle satellite data and an outlook for a promising future. In Wang HF (ed) Intelligent data analysis: developing new methodologies through pattern discovery and recovery. Information Science Reference, Hershey, PA.

  • D’Eon RG, Delparte D (2005) Effects of radio-collar position and orientation on GPS radio-collar performance, and the implications of PDOP in data screening. J Appl Ecol 42:383–388.

  • D’Eon RG, Serrouya R, Smith G, Kochanny C (2002) GPS radiotelemetry error and bias in mountainous terrain. Wildl Soc Bull 30:430–439.

  • Donald PF, Fuller RJ (1998) Ornithological atlas data: a review of uses and limitations. Bird Study 45:129–145.

  • Duke CS (2006) Data: share and share alike. Front Ecol Environ 4:395–395.

  • Duke CS (2007) Beyond data: reproducible research in ecology and environmental sciences – the author replies. Front Ecol Environ 5:67.

  • Edwards TC, Cutler DR, Zimmermann NE, Geiser L, Moisen GG (2006) Effects of sample survey design on the accuracy of classification tree models in species distribution models. Ecol Model 199:132–141.

  • Elith J, Graham CH, Anderson RP, Dudik M, Ferrier S, Guisan A, Hijmans RJ, Huettmann F, Leathwick JR, Lehmann A, Li J, Lohmann LG, Loiselle BA, Manion G, Moritz C, Nakamura M, Nakazawa Y, Overton JM, Peterson AT, Phillips SJ, Richardson K, Scachetti-Pereira R, Schapire RE, Soberón J, Williams S, Wisz MS, Zimmermann NE (2006) Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29:129–151.

  • Elzinga CL (2001) Monitoring plant and animal populations. Blackwell Science, Malden, MA.

  • Esanu JM, Uhlir PF (2003) The role of scientific and technical data and information in the public domain: proceedings of a symposium. National Academies Press, Washington, DC.

  • Ferrier S, Guisan A (2006) Spatial modelling of biodiversity at the community level. J Appl Ecol 43:393–404.

  • Fortin MJ, Dale MRT (2005) Spatial analysis: a guide for ecologists. Cambridge University Press, Cambridge, UK.

  • Frair JL, Nielsen SE, Merrill EH, Lele SR, Boyce MS, Munro RHM, Stenhouse GB, Beyer HL (2004) Removing GPS collar bias in habitat selection studies. J Appl Ecol 41:201–212.

  • Frair JL, Merrill EH, Allen JR, Boyce MS (2007) Know thy enemy: experience affects elk translocation success in risky landscapes. J Wildl Manag 71:541–554.

  • Gibbons DW, Donald PF, Bauer HG, Fornasari L, Dawson IK (2007) Mapping avian distributions: the evolution of bird atlases. Bird Study 54:324–334.

  • Graham CH, Ferrier S, Huettmann F, Moritz C, Peterson AT (2004) New developments in museum-based informatics and applications in biodiversity analysis. Trends Ecol Evol 19:497–503.

  • Guisan A, Zimmermann NE (2000) Predictive habitat distribution models in ecology. Ecol Model 135:147–186.

  • Guisan A, Lehmann A, Ferrier S, Austin M, Overton JMcC, Aspinall R, Hastie T (2006) Making better biogeographical predictions of species’ distributions. J Appl Ecol 43:386–392.

  • Guisan A, Graham CH, Elith J, Huettmann F, Dudik M, Ferrier S, Hijmans R, Lehmann A, Li J, Lohmann LG, Loiselle B, Manion G, Moritz C, Nakamura M, Nakazawa Y, Overton JMcC, Peterson AT, Phillips SJ, Richardson K, Scachetti-Pereira R, Schapire RE, Williams SE, Wisz MS, Zimmermann NE (2007) Sensitivity of predictive species distribution models to change in grain size. Divers Distrib 13:332–340.

  • Hames RS, Rosenberg KV, Lowe JD, Dhondt AA (2001) Site reoccupation in fragmented landscapes: testing predictions of metapopulation theory. J Anim Ecol 70:182–190.

  • Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning: data mining, inference, and prediction. Springer, New York.

  • Heikkinen RK, Luoto M, Virkkala R, Pearson RG, Körber JH (2007) Biotic interactions improve prediction of boreal bird distributions at macro-scales. Global Ecol Biogeogr 16:754–763.

  • Hernandez PA, Graham CH, Master LL, Albert DL (2006) The effect of sample size and species characteristics on performance of different species distribution modeling methods. Ecography 29:773–785.

  • Hirzel A, Guisan A (2002) Which is the optimal sampling strategy for habitat suitability modelling. Ecol Model 157:331–341.

  • Hochachka WM, Caruana R, Fink D, Munson A, Riedewald M, Sorokina D, Kelling S (2007) Data-mining discovery of pattern and process in ecological systems. J Wildl Manag 71:2427–2437.

  • Hollister JW, Walker HA (2007) Beyond data: reproducible research in ecology and environmental sciences. Front Ecol Environ 5:11–12.

  • Huettmann F (2005) Databases and science-based management in the context of wildlife and habitat: toward a certified ISO standard for objective decision-making for the global community by using the internet. J Wildl Manag 69:466–472.

  • Huettmann F (2007) The digital teaching legacy of the International Polar Year (IPY): details of a present to the global village for achieving sustainability. Proceedings 18th International Workshop on Database and Expert Systems Applications, DEXA: 673–677.

  • Huettmann F, Diamond AW (2006) Large-scale effects on the spatial distribution of seabirds in the Northwest Atlantic. Landsc Ecol 21:1089–1108.

  • Huettmann F (2009) The global need for, and appreciation of, high-quality metadata in biodiversity work. In Spehn E, Koerner C (eds) Data mining for global trends in mountain biodiversity. CRC Press, Taylor & Francis, pp 25–28.

  • Jan L (2006) Database model for taxonomic and observation data. In Sahni S (ed) Proceedings of the 2nd IASTED international conference on advances in computer science and technology. ACTA Press, Puerto Vallarta, Mexico.

  • Jochum K (2008) Benefits of using marginal opportunistic wildlife behavior data: constraints and applications across taxa – a dominance hierarchy example relevant for wildlife management. M.Sc. Thesis, University Hannover: Hannover, Germany.

  • Kadmon R, Farber O, Danin A (2004) Effect of roadside bias on the accuracy of predictive maps produced by bioclimatic models. Ecol Appl 14:401–413.

  • Karasti H, Baker KS (2008) Digital data practices and the long term ecological research program growing global. Int J Digit Curation 3:42–58.

  • Lutolf M, Kienast F, Guisan A (2006) The ghost of past species occurrence: improving species distribution models for presence-only data. J Appl Ecol 43:802–815.

  • MacKenzie DI (2005a) Was it there? Dealing with imperfect detection for species presence/absence data. Aust N-Z J Stat 47:65–74.

  • MacKenzie DI (2005b) What are the issues with presence–absence data for wildlife managers? J Wildl Manag 69:849–860.

  • MacKenzie DI, Nichols JD, Royle JA, Pollock KH, Bailey LL, Hines JE (2006) Occupancy estimation and modeling: inferring patterns and dynamics of species. Elsevier, Burlington, MA.

  • MacKenzie DI, Royle JA (2005) Designing occupancy studies: general advice and allocating survey effort. J Appl Ecol 42:1105–1114.

  • Magness DR, Huettmann F, Morton JM (2008) Using Random Forests to provide predicted species distribution maps as a metric for ecological inventory & monitoring programs. In Smolinski TG, Milanova MG, Hassanien A-E (eds) Applications of computational intelligence in biology: current trends and open problems. Studies in Computational Intelligence, vol 122. Springer-Verlag, Berlin Heidelberg, pp 209–229.

  • Manel S, Williams HC, Ormerod SJ (2001) Evaluating presence–absence models in ecology: the need to account for prevalence. J Appl Ecol 38:921–931.

  • Manly BFJ, McDonald LL, Thomas DL, McDonald TL, Erickson WP (2002) Resource selection by animals: statistical design and analysis for field studies. Kluwer Academic Publishers, Boston, MA.

  • Marzluff JM, Knick ST, Millspaugh JJ (2001) High-tech behavioral ecology: modeling the distribution of animal activities to better understand wildlife space use and resource selection. In Marzluff JM, Millspaugh JJ (eds) Radio-tracking and animal populations. Academic Press, San Diego, CA.

  • McGowan K, Zuckerberg B (2008) Summary of results. In McGowan K, Corwin K (eds) The second atlas of breeding birds in New York State. Cornell University Press, Ithaca, NY.

  • Meyer CB (2007) Does scale matter in predicting species distributions? Case study with the Marbled Murrelet. Ecol Appl 17:1474–1483.

  • Michener WK, Brunt JW (eds) (2000) Ecological data: design, management, and processing. Blackwell Science, Malden, MA.

  • Michener WK, Brunt JW, Helly JJ, Kirchner TB, Stafford SG (1997) Nongeospatial metadata for the ecological sciences. Ecol Appl 7:330–342.

  • Moen R, Pastor J, Cohen Y, Schwartz CC (1996) Effects of moose movement and habitat use on GPS collar performance. J Wildl Manag 60:659–668.

  • Moen R, Pastor J, Cohen Y (1997) Accuracy of GPS telemetry collar locations with differential correction. J Wildl Manag 61:530–539.

  • Nemitz D (2008) An assessment of sampling detectability for global biodiversity monitoring: results from sampling GRIDs in different climatic regions. Unpublished Masters thesis, MINK program, University of Goettingen, Germany.

  • Nielsen SE, Stenhouse GB, Boyce MS (2006) A habitat-based framework for grizzly bear conservation in Alberta. Biol Conserv 130:217–229.

  • Pearson RG (2007) Species’ distribution modeling for conservation educators and practitioners – synthesis. American Museum of Natural History. http://ncep.amnh.org. Accessed 7 May 2008.

  • Pearson RG, Raxworthy CJ, Nakamura M, Peterson AT (2007) Predicting species distributions from small numbers of occurrence records: a test case using cryptic geckos in Madagascar. J Biogeogr 34:102–117.

  • Peng RD, Dominici F, Zeger SL (2006) Reproducible epidemiologic research. Am J Epidemiol 163:783–789.

  • Pulliam HR (2000) On the relationship between niche and distribution. Ecol Lett 3:349–361.

  • Rempel RS, Rodgers AR (1997) Effects of differential correction on accuracy of a GPS animal location system. J Wildl Manag 61:525–530.

  • Rempel RS, Rodgers AR, Abraham KF (1995) Performance of a GPS animal location system under boreal forest canopy. J Wildl Manag 59:543–551.

  • Rodgers AR (2001) Recent telemetry technology. In Millspaugh JJ, Marzluff JM (eds) Radio-tracking and animal populations. Academic Press, San Diego, CA.

  • Royle JA, Dorazio RM (2008) Hierarchical modeling and inference in ecology: the analysis of data from populations, metapopulations, and communities. Academic Press, Boston, MA.

  • Saltz D (1994) Reporting error measures in radio location by triangulation – a review. J Wildl Manag 58:181–184.

  • Sauer JR, Hines JE, Fallon J (2007) The North American breeding bird survey, results and analysis 1966–2006. Version 10.13.2007. USGS Patuxent Wildlife Research Center, Laurel, MD.

  • Scott JM, Heglund PJ, Morrison ML (eds) (2002) Predicting species occurrences: issues of accuracy and scale. Island Press, Washington, DC.

  • Segurado P, Araújo MB (2004) An evaluation of methods for modelling species distributions. J Biogeogr 31:1555–1568.

  • Soberón JM, Llorente JB, Onate L (2000) The use of specimen-label databases for conservation purposes: an example using Mexican Papilionid and Pierid butterflies. Biodiv Conserv 9:1441–1466.

  • Stockwell DRB, Peterson AT (2002) Effects of sample size on accuracy of species distribution models. Ecol Model 148:1–13.

  • Sutherland WJ (2000) The conservation handbook: research, management and policy. Blackwell Science, Malden, MA.

  • Sutherland WJ (2006) Ecological census techniques: a handbook. Cambridge University Press, Cambridge, UK.

  • Thompson WL (2004) Sampling rare or elusive species: concepts, designs, and techniques for estimating population parameters. Island Press, Washington, DC.

  • Thompson WL, White GC, Gowan C (1998) Monitoring vertebrate populations. Academic Press, San Diego, CA.

  • Travaini A, Bustamante J, Rodríguez A, Zapata S, Procopio D, Pedrana J, Peck RM (2007) An integrated framework to map animal distributions in large and remote regions. Divers Distrib 13:289–298.

  • Trzcinski MK, Fahrig L, Merriam G (1999) Independent effects of forest cover and fragmentation on the distribution of forest breeding birds. Ecol Appl 9:586–593.

  • Turchin P (1998) Quantitative analysis of movement: measuring and modeling population redistribution in animals and plants. Sinauer Associates, Sunderland, MA.

  • Venier LA, McKenney DW, Wang Y, McKee J (1999) Models of large-scale breeding-bird distribution as a function of macro-climate in Ontario, Canada. J Biogeogr 26:315–328.

  • Venier LA, Pearce J, McKee JE, McKenney DW, Niemi GJ (2004) Climate and satellite-derived land cover for predicting breeding bird distribution in the Great Lakes Basin. J Biogeogr 31:315–331.

  • Vesely D, McComb BC, Vojta CD, Suring LH, Halaj J, Holthausen RS, Zuckerberg B, Manley PM (2006) Development of protocols to inventory or monitor wildlife, fish, or rare plants. General Technical Report WO-72. U.S. Department of Agriculture, Forest Service, Washington, DC.

  • White GC, Garrott RA (1990) Analysis of wildlife radio-tracking data. Academic Press, San Diego, CA.

  • Withey JC, Bloxton TD, Marzluff JM (2001) Effects of tagging and location error in wildlife radiotelemetry studies. In Millspaugh JJ, Marzluff JM (eds) Radio-tracking and animal populations. Academic Press, San Diego, CA.

Acknowledgments

Many colleagues have contributed to the ideas and concepts expressed in this chapter. This chapter benefited from the suggestions and reviews of B. McComb, W. Hochachka, M. Hooten, and an anonymous reviewer. We are very grateful for their input. We would also like to thank the editors for the invitation to contribute to this volume and we are grateful for their guidance.

Author information

Corresponding author

Correspondence to Benjamin Zuckerberg.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Zuckerberg, B., Huettmann, F., Frair, J. (2011). Proper Data Management as a Scientific Foundation for Reliable Species Distribution Modeling. In: Drew, C., Wiersma, Y., Huettmann, F. (eds) Predictive Species and Habitat Modeling in Landscape Ecology. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7390-0_4
