Proper Data Management as a Scientific Foundation for Reliable Species Distribution Modeling

  • Benjamin Zuckerberg
  • Falk Huettmann
  • Jacqueline Frair


Data management, storage, curation, and dissemination are mainstays of computer modeling. Indeed, a traditional view of computer modeling has perpetuated the notion of “garbage in, garbage out” (GIGO), which serves as a constant reminder that, no matter how sophisticated the analysis, computers will “unquestioningly process” whatever data are provided, regardless of their quality or suitability (Pearson 2007). In ecology, the datasets used in computer modeling are inherently complex and often characterized by missing values, dynamic environmental variables, and other factors leading to numerous data anomalies (Michener et al. 1997; Michener and Brunt 2000). Ecologists have long recognized, however, that although data quality is undoubtedly important, using different types of data, even messy ones, can still prove informative and can facilitate new questions, methods, and synergies in science and society.


Keywords (added by machine, not by the authors): Global Positioning System; Geographic Information System; Species Occurrence; Species Distribution Modeling; Structured Query Language



Many colleagues have contributed to the ideas and concepts expressed in this chapter. This chapter benefited from the suggestions and reviews of B. McComb, W. Hochachka, M. Hooten, and an anonymous reviewer. We are very grateful for their input. We would also like to thank the editors for the invitation to contribute to this volume and we are grateful for their guidance.


  1. Aldridge CL, Boyce MS (2007) Linking occurrence and fitness to persistence: habitat-based approach for endangered Greater Sage-Grouse. Ecol Appl 17:508–526.
  2. Anderson DR (2008) Model based inference in the life sciences: a primer on evidence. Springer, New York, NY.
  3. Anderson DR, Burnham KP, Gould WR, Cherry S (2001) Concerns about finding effects that are actually spurious. Wildl Soc Bull 29:311–316.
  4. Araújo MB, Guisan A (2006) Five (or so) challenges for species distribution modelling. J Biogeogr 33:1677–1688.
  5. Araújo MB, Luoto M (2007) The importance of biotic interactions for modelling species distributions under climate change. Global Ecol Biogeogr 16:743–753.
  6. Araújo MB, Williams PH, Fuller RJ (2002) Dynamics of extinction and the selection of nature reserves. Proc R Soc Lond Ser B 269:1971–1980.
  7. Austin MP (2002) Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecol Model 157:101–118.
  8. Austin M (2006) Species distribution models and ecological theory: a critical assessment and some possible new approaches. Ecol Model 200:1–19.
  9. Barry S, Elith J (2006) Error and uncertainty in habitat models. J Appl Ecol 43:413–423.
  10. Bibby CJ, Burgess ND, Hill DA, Mustoe S (2000) Bird census techniques. Academic Press, San Diego, CA.
  11. Bishop JA, Myers WL (2005) Associations between avian functional guild response and regional landscape properties for conservation planning. Ecol Indic 5:33–48.
  12. Braun CE (2005) Techniques for wildlife investigations and management. The Wildlife Society, Bethesda, MD.
  13. Breiman L (2001a) Random forests. Mach Learn 45:5–32.
  14. Breiman L (2001b) Statistical modeling: the two cultures. Stat Sci 16:199–231.
  15. Brennan JM, Bender DJ, Contreras TA, Fahrig L (2002) Focal patch landscape studies for wildlife management: optimizing sampling effort across scales. In Liu J, Taylor WW (eds) Integrating landscape ecology into natural resource management. Cambridge University Press, NY.
  16. Brotons L, Thuiller W, Araújo MB, Hirzel AH (2004) Presence–absence versus presence-only modelling methods for predicting bird habitat suitability. Ecography 27:437–448.
  17. Buckland ST (2001) Introduction to distance sampling: estimating abundance of biological populations. Oxford University Press, Oxford, UK.
  18. Burnham KP, Anderson DR (2002) Model selection and inference: a practical information-theoretic approach. Springer-Verlag, New York.
  19. Coudun C, Gégout JC (2006) The derivation of species response curves with Gaussian logistic regression is sensitive to sampling intensity and curve characteristics. Ecol Model 199:164–175.
  20. Craig E, Huettmann F (2009) Using “blackbox” algorithms such as TreeNET and Random Forests for data-mining and for finding meaningful patterns, relationships and outliers in complex ecological data: an overview, an example using golden eagle satellite data and an outlook for a promising future. In Wang HF (ed) Intelligent data analysis: developing new methodologies through pattern discovery and recovery. Information Science Reference, Hershey, PA.
  21. D’Eon RG, Delparte D (2005) Effects of radio-collar position and orientation on GPS radio-collar performance, and the implications of PDOP in data screening. J Appl Ecol 42:383–388.
  22. D’Eon RG, Serrouya R, Smith G, Kochanny C (2002) GPS radiotelemetry error and bias in mountainous terrain. Wildl Soc Bull 30:430–439.
  23. Donald PF, Fuller RJ (1998) Ornithological atlas: a review of uses and limitations. Bird Study 45:129–145.
  24. Duke CS (2006) Data: share and share alike. Front Ecol Environ 4:395.
  25. Duke CS (2007) Beyond data: reproducible research in ecology and environmental sciences – the author replies. Front Ecol Environ 5:67.
  26. Edwards TC, Cutler DR, Zimmermann NE, Geiser L, Moisen GG (2006) Effects of sample survey design on the accuracy of classification tree models in species distribution models. Ecol Model 199:132–141.
  27. Elith J, Graham CH, Anderson RP, Dudik M, Ferrier S, Guisan A, Hijmans RJ, Huettmann F, Leathwick JR, Lehmann A, Li J, Lohmann LG, Loiselle BA, Manion G, Moritz C, Nakamura M, Nakazawa Y, Overton JM, Peterson AT, Phillips SJ, Richardson K, Scachetti-Pereira R, Schapire RE, Soberón J, Williams S, Wisz MS, Zimmermann NE (2006) Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29:129–151.
  28. Elzinga CL (2001) Monitoring plant and animal populations. Blackwell Science, Malden, MA.
  29. Esanu JM, Uhlir PF (2003) The role of scientific and technical data and information in the public domain: proceedings of a symposium. National Academies Press, Washington, DC.
  30. Ferrier S, Guisan A (2006) Spatial modelling of biodiversity at the community level. J Appl Ecol 43:393–404.
  31. Fortin MJ, Dale MRT (2005) Spatial analysis: a guide for ecologists. Cambridge University Press, Cambridge, UK.
  32. Frair JL, Nielsen SE, Merrill EH, Lele SR, Boyce MS, Munro RHM, Stenhouse GB, Beyer HL (2004) Removing GPS collar bias in habitat selection studies. J Appl Ecol 41:201–212.
  33. Frair JL, Merrill EH, Allen JR, Boyce MS (2007) Know thy enemy: experience affects elk translocation success in risky landscapes. J Wildl Manag 71:541–554.
  34. Gibbons DW, Donald PF, Bauer HG, Fornasari L, Dawson IK (2007) Mapping avian distributions: the evolution of bird atlases. Bird Study 54:324–334.
  35. Graham CH, Ferrier S, Huettmann F, Moritz C, Peterson AT (2004) New developments in museum-based informatics and applications in biodiversity analysis. Trends Ecol Evol 19:497–503.
  36. Guisan A, Zimmermann NE (2000) Predictive habitat distribution models in ecology. Ecol Model 135:147–186.
  37. Guisan A, Lehmann A, Ferrier S, Austin M, Overton JMcC, Aspinall R, Hastie T (2006) Making better biogeographical predictions of species’ distributions. J Appl Ecol 43:386–392.
  38. Guisan A, Graham CH, Elith J, Huettmann F, Dudik M, Ferrier S, Hijmans R, Lehmann A, Li J, Lohmann LG, Loiselle B, Manion G, Moritz C, Nakamura M, Nakazawa Y, Overton JMcC, Peterson AT, Phillips SJ, Richardson K, Scachetti-Pereira R, Schapire RE, Williams SE, Wisz MS, Zimmermann NE (2007) Sensitivity of predictive species distribution models to change in grain size. Divers Distrib 13:332–340.
  39. Hames RS, Rosenberg KV, Lowe JD, Dhondt AA (2001) Site reoccupation in fragmented landscapes: testing predictions of metapopulation theory. J Anim Ecol 70:182–190.
  40. Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning: data mining, inference, and prediction. Springer, New York.
  41. Heikkinen RK, Luoto M, Virkkala R, Pearson RG, Körber JH (2007) Biotic interactions improve prediction of boreal bird distributions at macro-scales. Global Ecol Biogeogr 16:754–763.
  42. Hernandez PA, Graham CH, Master LL, Albert DL (2006) The effect of sample size and species characteristics on performance of different species distribution modeling methods. Ecography 29:773–785.
  43. Hirzel A, Guisan A (2002) Which is the optimal sampling strategy for habitat suitability modelling. Ecol Model 157:331–341.
  44. Hochachka WM, Caruana R, Fink D, Munson A, Riedewald M, Sorokina D, Kelling S (2007) Data-mining discovery of pattern and process in ecological systems. J Wildl Manag 71:2427–2437.
  45. Hollister JW, Walker HA (2007) Beyond data: reproducible research in ecology and environmental sciences. Front Ecol Environ 5:11–12.
  46. Huettmann F (2005) Databases and science-based management in the context of wildlife and habitat: toward a certified ISO standard for objective decision-making for the global community by using the internet. J Wildl Manag 69:466–472.
  47. Huettmann F (2007) The digital teaching legacy of the International Polar Year (IPY): details of a present to the global village for achieving sustainability. Proceedings 18th International Workshop on Database and Expert Systems Applications, DEXA: 673–677.
  48. Huettmann F, Diamond AW (2006) Large-scale effects on the spatial distribution of seabirds in the Northwest Atlantic. Landsc Ecol 21:1089–1108.
  49. Huettmann F (2009) The global need for, and appreciation of, high-quality metadata in biodiversity work. In Spehn E, Koerner C (eds) Data mining for global trends in mountain biodiversity. CRC Press, Taylor & Francis, pp 25–28.
  50. Jan L (2006) Database model for taxonomic and observation data. In Sahni S (ed) Proceedings of the 2nd IASTED international conference on advances in computer science and technology. ACTA Press, Puerto Vallarta, Mexico.
  51. Jochum K (2008) Benefits of using marginal opportunistic wildlife behavior data: constraints and applications across taxa – a dominance hierarchy example relevant for wildlife management. M.Sc. Thesis, University Hannover: Hannover, Germany.
  52. Kadmon R, Farber O, Danin A (2004) Effect of roadside bias on the accuracy of predictive maps produced by bioclimatic models. Ecol Appl 14:401–413.
  53. Karasti H, Baker KS (2008) Digital data practices and the long term ecological research program growing global. Int J Digit Curation 3:42–58.
  54. Lutolf M, Kienast F, Guisan A (2006) The ghost of past species occurrence: improving species distribution models for presence-only data. J Appl Ecol 43:802–815.
  55. MacKenzie DI (2005a) Was it there? Dealing with imperfect detection for species presence/absence data. Aust N-Z J Stat 47:65–74.
  56. MacKenzie DI (2005b) What are the issues with presence–absence data for wildlife managers? J Wildl Manag 69:849–860.
  57. MacKenzie DI, Nichols JD, Royle JA, Pollock KH, Bailey LL, Hines JE (2006) Occupancy estimation and modeling: inferring patterns and dynamics of species. Elsevier, Burlington, MA.
  58. MacKenzie DI, Royle JA (2005) Designing occupancy studies: general advice and allocating survey effort. J Appl Ecol 42:1105–1114.
  59. Magness DR, Huettmann F, Morton JM (2008) Using Random Forests to provide predicted species distribution maps as a metric for ecological inventory & monitoring programs. In Smolinski TG, Milanova MG, Hassanien A-E (eds) Applications of computational intelligence in biology: current trends and open problems. Studies in Computational Intelligence, vol 122. Springer-Verlag, Berlin Heidelberg, pp 209–229.
  60. Manel S, Williams HC, Ormerod SJ (2001) Evaluating presence–absence models in ecology: the need to account for prevalence. J Appl Ecol 38:921–931.
  61. Manly BFJ, McDonald LL, Thomas DL, McDonald TL, Erickson WP (2002) Resource selection by animals: statistical design and analysis for field studies. Kluwer Academic Publishers, Boston, MA.
  62. Marzluff JM, Knick ST, Millspaugh JJ (2001) High-tech behavioral ecology: modeling the distribution of animal activities to better understand wildlife space use and resource selection. In Marzluff JM, Millspaugh JJ (eds) Radio-tracking and animal populations. Academic Press, San Diego, CA.
  63. McGowan K, Zuckerberg B (2008) Summary of results. In McGowan K, Corwin K (eds) The second atlas of breeding birds in New York State. Cornell University Press, Ithaca, NY.
  64. Meyer CB (2007) Does scale matter in predicting species distributions? Case study with the Marbled Murrelet. Ecol Appl 17:1474–1483.
  65. Michener WK, Brunt JW (eds) (2000) Ecological data: design, management, and processing. Blackwell Science, Malden, MA.
  66. Michener WK, Brunt JW, Helly JJ, Kirchner TB, Stafford SG (1997) Nongeospatial metadata for the ecological sciences. Ecol Appl 7:330–342.
  67. Moen R, Pastor J, Cohen Y, Schwartz CC (1996) Effects of moose movement and habitat use on GPS collar performance. J Wildl Manag 60:659–668.
  68. Moen R, Pastor J, Cohen Y (1997) Accuracy of GPS telemetry collar locations with differential correction. J Wildl Manag 61:530–539.
  69. Nemitz D (2008) An assessment of sampling detectability for global biodiversity monitoring: results from sampling GRIDs in different climatic regions. Unpublished Masters thesis, MINK program, University of Goettingen, Germany.
  70. Nielsen SE, Stenhouse GB, Boyce MS (2006) A habitat-based framework for grizzly bear conservation in Alberta. Biol Conserv 130:217–229.
  71. Pearson RG (2007) Species’ distribution modeling for conservation educators and practitioners – synthesis. American Museum of Natural History. Accessed 7 May 2008.
  72. Pearson RG, Raxworthy CJ, Nakamura M, Peterson AT (2007) Predicting species distributions from small numbers of occurrence records: a test case using cryptic geckos in Madagascar. J Biogeogr 34:102–117.
  73. Peng RD, Dominici F, Zeger SL (2006) Reproducible epidemiologic research. Am J Epidemiol 163:783–789.
  74. Pulliam HR (2000) On the relationship between niche and distribution. Ecol Lett 3:349–361.
  75. Rempel RS, Rodgers AR (1997) Effects of differential correction on accuracy of a GPS animal location system. J Wildl Manag 61:525–530.
  76. Rempel RS, Rodgers AR, Abraham KF (1995) Performance of a GPS animal location system under boreal forest canopy. J Wildl Manag 59:543–551.
  77. Rodgers AR (2001) Recent telemetry technology. In Millspaugh JJ, Marzluff JM (eds) Radio-tracking and animal populations. Academic Press, San Diego, CA.
  78. Royle JA, Dorazio RM (2008) Hierarchical modeling and inference in ecology: the analysis of data from populations, metapopulations, and communities. Academic Press, Boston, MA.
  79. Saltz D (1994) Reporting error measures in radio location by triangulation – a review. J Wildl Manag 58:181–184.
  80. Sauer JR, Hines JE, Fallon J (2007) The North American breeding bird survey, results and analysis 1966–2006. Version 10.13.2007. USGS Patuxent Wildlife Research Center, Laurel, MD.
  81. Scott JM, Heglund PJ, Morrison ML (eds) (2002) Predicting species occurrences: issues of accuracy and scale. Island Press, Washington, DC.
  82. Segurado P, Araújo MB (2004) An evaluation of methods for modelling species distributions. J Biogeogr 31:1555–1568.
  83. Soberón JM, Llorente JB, Onate L (2000) The use of specimen-label databases for conservation purposes: an example using Mexican Papilionid and Pierid butterflies. Biodiv Conserv 9:1441–1466.
  84. Stockwell DRB, Peterson AT (2002) Effects of sample size on accuracy of species distribution models. Ecol Model 148:1–13.
  85. Sutherland WJ (2000) The conservation handbook: research, management and policy. Blackwell Science, Malden, MA.
  86. Sutherland WJ (2006) Ecological census techniques: a handbook. Cambridge University Press, Cambridge, UK.
  87. Thompson WL (2004) Sampling rare or elusive species: concepts, designs, and techniques for estimating population parameters. Island Press, Washington, DC.
  88. Thompson WL, White GC, Gowan C (1998) Monitoring vertebrate populations. Academic Press, San Diego, CA.
  89. Travaini A, Bustamante J, Rodríguez A, Zapata S, Procopio D, Pedrana J, Peck RM (2007) An integrated framework to map animal distributions in large and remote regions. Divers Distrib 13:289–298.
  90. Trzcinski MK, Fahrig L, Merriam G (1999) Independent effects of forest cover and fragmentation on the distribution of forest breeding birds. Ecol Appl 9:586–593.
  91. Turchin P (1998) Quantitative analysis of movement: measuring and modeling population redistribution in animals and plants. Sinauer Associates, Sunderland, MA.
  92. Venier LA, McKenney DW, Wang Y, McKee J (1999) Models of large-scale breeding-bird distribution as a function of macro-climate in Ontario, Canada. J Biogeogr 26:315–328.
  93. Venier LA, Pearce J, McKee JE, McKenney DW, Niemi GJ (2004) Climate and satellite-derived land cover for predicting breeding bird distribution in the Great Lakes Basin. J Biogeogr 31:315–331.
  94. Vesley D, McComb BC, Vojta CD, Suring LH, Halaj J, Holthausen RS, Zuckerberg B, Manley PM (2006) Development of protocols to inventory or monitor wildlife, fish, or rare plants. General Technical Report WO-72. U.S. Department of Agriculture, Forest Service, Washington, DC.
  95. White GC, Garrott RA (1990) Analysis of wildlife radio-tracking data. Academic Press, San Diego, CA.
  96. Withey JC, Bloxton TD, Marzluff JM (2001) Effects of tagging and location error in wildlife radiotelemetry studies. In Millspaugh JJ, Marzluff JM (eds) Radio-tracking and animal populations. Academic Press, San Diego, CA.

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Benjamin Zuckerberg (1)
  • Falk Huettmann
  • Jacqueline Frair
  1. Cornell Lab of Ornithology, Ithaca, USA
