Efficient Modelling of Presence-Only Species Data via Local Background Sampling

  • Jeffrey Daniel
  • Julie Horrocks
  • Gary J. Umphrey


In species distribution modelling, records of species presence are often modelled as a realization of a spatial point process whose intensity is a function of environmental covariates. One way to fit a spatial point process model is to apply logistic regression to an artificial case–control sample consisting of the observed presence records combined with a simulated pattern of background points, usually a uniform random sample from within the study’s spatial domain. In this paper we propose local background sampling as an alternative to uniform background sampling when using logistic regression to fit spatial point process models to data. Our method is similar to the local case–control sampling procedure of Fithian and Hastie (Ann Stat 42:1693–1724, 2014), but differs in that background points are sampled with probability proportional to an initial intensity estimate based on a pilot point process model. We compare local background sampling with uniform background sampling in a simulation study and in an example modelling the distributions of bumble bees (genus Bombus) in Ontario, Canada. Our results show local background sampling to be more efficient than uniform background sampling in all simulated settings and across all species analysed.
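The two background sampling schemes described above can be illustrated with a toy sketch. It is not the authors' implementation: the one-dimensional domain [0, 1], the single covariate x(s) = s, the constants ALPHA, BETA, PILOT_BETA and N_BG, and every function name are hypothetical. It also assumes, following the usual logistic-regression treatment of point processes, that a background pattern with intensity rho(s) is accounted for by the offset -log rho(s); the abstract does not state how the correction is made. The background sample has a fixed size, so the fit is only an approximation to the Poisson point process likelihood.

```python
import math
import random

ALPHA, BETA = 5.0, 2.0   # hypothetical "true" log-intensity coefficients
PILOT_BETA = 1.5         # hypothetical, deliberately imperfect pilot slope
N_BG = 1000              # number of background points (assumed)


def simulate_presences(rng):
    """Simulate a Poisson process on [0, 1] with intensity exp(ALPHA + BETA*s) by thinning."""
    lam_max = math.exp(ALPHA + max(BETA, 0.0))   # upper bound on the intensity
    points, s = [], rng.expovariate(lam_max)
    while s < 1.0:
        if rng.random() < math.exp(ALPHA + BETA * s) / lam_max:
            points.append(s)
        s += rng.expovariate(lam_max)
    return points


def sample_background(n, slope, rng):
    """Draw n points with density proportional to exp(slope*s) on [0, 1].

    slope = 0 gives uniform sampling. Returns the points and a function
    giving log rho(s), the log background intensity.
    """
    if slope == 0.0:
        pts = [rng.random() for _ in range(n)]
        log_norm = 0.0
    else:
        # Inverse-CDF sampling; expm1(slope)/slope is the integral of exp(slope*s).
        pts = [math.log1p(rng.random() * math.expm1(slope)) / slope for _ in range(n)]
        log_norm = math.log(math.expm1(slope) / slope)

    def log_rho(s):
        return math.log(n) + slope * s - log_norm

    return pts, log_rho


def fit_logistic(presences, background, log_rho, iters=25):
    """Newton-Raphson for logit P(presence | s) = a + b*s - log rho(s)."""
    data = [(s, 1.0) for s in presences] + [(s, 0.0) for s in background]
    a, b = math.log(len(presences)), 0.0   # start near the observed presence count
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for s, y in data:
            p = 1.0 / (1.0 + math.exp(-(a + b * s - log_rho(s))))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * s
            h00 += w
            h01 += w * s
            h11 += w * s * s
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det   # solve the 2x2 Newton system directly
        b += (h00 * g1 - h01 * g0) / det
    return a, b


rng = random.Random(1)
presences = simulate_presences(rng)
bg_uniform, log_rho_uniform = sample_background(N_BG, 0.0, rng)
bg_local, log_rho_local = sample_background(N_BG, PILOT_BETA, rng)
uniform_fit = fit_logistic(presences, bg_uniform, log_rho_uniform)
local_fit = fit_logistic(presences, bg_local, log_rho_local)
print("uniform background:", uniform_fit)
print("local background:  ", local_fit)
```

Both fits target the same coefficients (ALPHA, BETA); only the placement of the background points differs. Repeating the run over many seeds and comparing the spread of the two sets of estimates is one way to examine relative efficiency in this toy setting.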

Supplementary materials accompanying this paper appear online.


Case–control sampling; Logistic regression; Spatial point processes; Species distribution modelling



Funding was provided by the Natural Sciences and Engineering Research Council of Canada (Discovery Grant 261497-2011-RGPIN).



  1. Aarts, G., Fieberg, J., and Matthiopoulos, J. (2012). Comparative interpretation of count, presence–absence and point methods for species distribution models. Methods in Ecology and Evolution 3, 177–187.
  2. Baddeley, A. (2018). A statistical commentary on mineral prospectivity analysis. In Daya Sagar, B. S., Cheng Q. and Agterberg, F., editors, Handbook of Mathematical Geosciences, pp. 25–65. Springer, Cham.
  3. Baddeley, A., Berman, M., Fisher, N. I., Hardegen, A., Milne, R. K., Schuhmacher, D., Shah, R., and Turner, R. (2010). Spatial logistic regression and change-of-support in Poisson point processes. Electronic Journal of Statistics 4, 1151–1201.
  4. Baddeley, A., Rubak, E., and Turner, R. (2015). Spatial Point Patterns: Methodology and Applications with R. Chapman and Hall/CRC Press, London.
  5. Baddeley, A. and Turner, R. (2000). Practical maximum pseudolikelihood for spatial point patterns (with discussion). Australian & New Zealand Journal of Statistics 42, 283–322.
  6. Barbet-Massin, M., Jiguet, F., Albert, C. H., and Thuiller, W. (2012). Selecting pseudo-absences for species distribution models: how, where and how many? Methods in Ecology and Evolution 3, 327–338.
  7. Berman, M. and Turner, T. R. (1992). Approximating point process likelihoods with GLIM. Applied Statistics 41, 31–38.
  8. Cameron, S. A., Lozier, J. D., Strange, J. P., Koch, J. B., Cordes, N., Solter, L. F., and Griswold, T. L. (2011). Patterns of widespread decline in North American bumble bees. Proceedings of the National Academy of Sciences 108, 662–667.
  9. Colla, S. R. (2016). Status, threats and conservation recommendations for wild bumble bees (Bombus spp.) in Ontario, Canada: a review for policymakers and practitioners. Natural Areas Journal 36, 412–427.
  10. Diggle, P. (1985). A kernel method for smoothing point process data. Journal of the Royal Statistical Society: Series C (Applied Statistics) 34, 138–147.
  11. Elith, J. and Leathwick, J. R. (2009). Species distribution models: ecological explanation and prediction across space and time. Annual Review of Ecology, Evolution, and Systematics 40, 677–697.
  12. Elith, J., Phillips, S. J., Hastie, T., Dudík, M., Chee, Y. E., and Yates, C. J. (2011). A statistical explanation of MaxEnt for ecologists. Diversity and Distributions 17, 43–57.
  13. Feng, X., Castro, M. C., Linde, E., and Papeş, M. (2017). Armadillo Mapper: A case study of an online application to update estimates of species’ potential distributions. Tropical Conservation Science 10, 1–5.
  14. Fithian, W. and Hastie, T. (2013). Finite-sample equivalence in statistical models for presence-only data. The Annals of Applied Statistics 7, 1917–1939.
  15. Fithian, W. and Hastie, T. (2014). Local case-control sampling: efficient subsampling in imbalanced data sets. The Annals of Statistics 42, 1693–1724.
  16. Fois, M., Fenu, G., Lombrana, A. C., Cogoni, D., and Bacchetta, G. (2015). A practical method to speed up the discovery of unknown populations using species distribution models. Journal for Nature Conservation 24, 42–48.
  17. Franklin, J. (2010). Mapping Species Distributions: Spatial Inference and Prediction. Cambridge University Press, Cambridge, UK.
  18. GBIF (2019). GBIF occurrence download.
  19. Goulson, D., Lye, G. C., and Darvill, B. (2008). Decline and conservation of bumble bees. Annual Review of Entomology 53, 191–208.
  20. Guisan, A., Thuiller, W., and Zimmermann, N. E. (2017). Habitat Suitability and Distribution Models with Applications in R. Cambridge University Press, Cambridge, UK.
  21. Guisan, A., Tingley, R., Baumgartner, J. B., Naujokaitis-Lewis, I., Sutcliffe, P. R., Tulloch, A. I., Regan, T. J., Brotons, L., McDonald-Madden, E., and Mantyka-Pringle, C. (2013). Predicting species distributions for conservation decisions. Ecology Letters 16, 1424–1435.
  22. Hijmans, R. J., Cameron, S. E., Parra, J. L., Jones, P. G., and Jarvis, A. (2005). Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology 25, 1965–1978.
  23. Hijmans, R. J. and Graham, C. H. (2006). The ability of climate envelope models to predict the effect of climate change on species distributions. Global Change Biology 12, 2272–2281.
  24. Jiménez-Valverde, A., Peterson, A. T., Soberón, J., Overton, J., Aragón, P., and Lobo, J. M. (2011). Use of niche models in invasive species risk assessments. Biological Invasions 13, 2785–2797.
  25. Klein, A.-M., Vaissiere, B. E., Cane, J. H., Steffan-Dewenter, I., Cunningham, S. A., Kremen, C., and Tscharntke, T. (2006). Importance of pollinators in changing landscapes for world crops. Proceedings of the Royal Society B: Biological Sciences 274, 303–313.
  26. Lobo, J. M., Jiménez-Valverde, A., and Real, R. (2008). AUC: a misleading measure of the performance of predictive distribution models. Global Ecology and Biogeography 17, 145–151.
  27. Merow, C., Smith, M. J., and Silander Jr, J. A. (2013). A practical guide to MaxEnt for modeling species’ distributions: what it does, and why inputs and settings matter. Ecography 36, 1058–1069.
  28. Naimi, B., Hamm, N. A. S., Groen, T. A., Skidmore, A. K., and Toxopeus, A. G. (2014). Where is positional uncertainty a problem for species distribution modelling? Ecography 37, 191–203.
  29. Pearce, J. L. and Boyce, M. S. (2006). Modelling distribution and abundance with presence-only data. Journal of Applied Ecology 43, 405–412.
  30. Peterson, A. T., Soberón, J., Pearson, R. G., Anderson, R. P., Martínez-Meyer, E., Nakamura, M., and Araújo, M. B. (2011). Ecological Niches and Geographic Distributions. Princeton University Press, Princeton, NJ.
  31. Phillips, S. J., Anderson, R. P., and Schapire, R. E. (2006). Maximum entropy modeling of species geographic distributions. Ecological Modelling 190, 231–259.
  32. Phillips, S. J., Dudík, M., Elith, J., Graham, C. H., Lehmann, A., Leathwick, J., and Ferrier, S. (2009). Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. Ecological Applications 19, 181–197.
  33. Phillips, S. J., Dudík, M., and Schapire, R. E. (2017). Maxent software for modeling species niches and distributions (Version 3.4.1).
  34. Renner, I. W., Elith, J., Baddeley, A., Fithian, W., Hastie, T., Phillips, S. J., Popovic, G., and Warton, D. I. (2015). Point process models for presence-only analysis. Methods in Ecology and Evolution 6, 366–379.
  35. Renner, I. W. and Warton, D. I. (2013). Equivalence of MAXENT and Poisson point process models for species distribution modeling in ecology. Biometrics 69, 274–281.
  36. Rinnhofer, L. J., Roura-Pascual, N., Arthofer, W., Dejaco, T., Thaler-Knoflach, B., Wachter, G. A., Christian, E., Steiner, F. M., and Schlick-Steiner, B. C. (2012). Iterative species distribution modelling and ground validation in endemism research: an alpine jumping bristletail example. Biodiversity and Conservation 21, 2845–2863.
  37. Snäll, T., Kindvall, O., Nilsson, J., and Pärt, T. (2011). Evaluating citizen-based presence data for bird monitoring. Biological Conservation 144, 804–810.
  38. Thurman, A. L. and Zhu, J. (2014). Variable selection for spatial Poisson point processes via a regularization method. Statistical Methodology 17, 113–125.
  39. Valavi, R., Elith, J., Lahoz-Monfort, J. J., and Guillera-Arroita, G. (2019). blockcv: An R package for generating spatially or environmentally separated folds for \(k\)-fold cross-validation of species distribution models. Methods in Ecology and Evolution 10, 225–232.
  40. Warton, D. and Aarts, G. (2013). Advancing our thinking in presence-only and used-available analysis. Journal of Animal Ecology 82, 1125–1134.
  41. Warton, D. I. and Shepherd, L. C. (2010). Poisson point process models solve the “pseudo-absence problem” for presence-only data in ecology. The Annals of Applied Statistics 4, 1383–1402.

Copyright information

© International Biometric Society 2019

Authors and Affiliations

  1. Department of Mathematics and Statistics, University of Guelph, Guelph, Canada
