Model-R: A Framework for Scalable and Reproducible Ecological Niche Modeling

  • Andrea Sánchez-Tapia
  • Marinez Ferreira de Siqueira
  • Rafael Oliveira Lima
  • Felipe Sodré M. Barros
  • Guilherme M. Gall
  • Luiz M. R. Gadelha Jr.
  • Luís Alexandre E. da Silva
  • Carla Osthoff
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 796)


Spatial analysis tools and the synthesis of their results are key to identifying the best solutions in biodiversity conservation. Process automation matters because it increases efficiency and performance both in the data pre-processing phase and in the post-analysis of the results generated by modeling packages and programs. The Model-R framework was developed with the main objective of unifying pre-existing ecological niche modeling tools into a common framework and building a web interface that automates steps of the modeling process and the retrieval of occurrence data. The web interface includes RJabot, a functionality for searching and retrieving occurrence data from Jabot, the main reference system for botanical collections management in Brazil. RJabot returns data in a format suitable for consumption by the other components of the framework. The tools are currently multi-projection, so they can be applied to different sets of temporal and spatial data. Model-R is also multi-algorithm, with seven algorithms available for modeling: BIOCLIM, Mahalanobis distance, Maxent, GLM, RandomForest, SVM, and DOMAIN. The algorithms, as well as the entire modeling process, can be parametrized using command-line tools or through the web interface. We hope that the use of this application, not only by modeling specialists but also by policy makers, will contribute significantly to the continuous development of biodiversity conservation analysis. The Model-R web interface can be installed locally or on a server, and a software container is provided to automate the installation.
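
To make the multi-projection idea concrete, the following is a minimal sketch of a BIOCLIM-style envelope model that is fitted once and then projected onto two different sets of environmental data. It is an illustration only and not the Model-R implementation: the function names (`fit_envelope`, `envelope_score`, `project`), the two environmental variables, and all sample values are assumptions made for this example, and the scoring rule is a simplified percentile envelope of the kind commonly associated with BIOCLIM.

```python
from bisect import bisect_left, bisect_right


def fit_envelope(presence_rows):
    """Fit a simplified envelope: keep the sorted values of each
    environmental variable observed at the presence points."""
    n_vars = len(presence_rows[0])
    return [sorted(row[j] for row in presence_rows) for j in range(n_vars)]


def percentile_rank(sorted_vals, x):
    """Fraction of training values below x, counting ties as one half."""
    below = bisect_left(sorted_vals, x)
    below_or_equal = bisect_right(sorted_vals, x)
    return (below + below_or_equal) / (2 * len(sorted_vals))


def envelope_score(model, site):
    """Score a site in [0, 1]: 1 at the median of every variable,
    0 outside the observed range of any variable."""
    scores = []
    for sorted_vals, x in zip(model, site):
        if x < sorted_vals[0] or x > sorted_vals[-1]:
            return 0.0
        p = percentile_rank(sorted_vals, x)
        # Distance from the nearest tail, rescaled so the median scores 1.
        scores.append(2 * min(p, 1 - p))
    # The least suitable variable limits the overall suitability.
    return min(scores)


def project(model, sites):
    """Apply one fitted model to any set of sites (another period or region)."""
    return [envelope_score(model, site) for site in sites]


# Hypothetical presence records: (mean temperature in C, annual rainfall in mm).
presence = [(20.0, 1000.0), (22.0, 1200.0), (24.0, 1400.0),
            (26.0, 1600.0), (28.0, 1800.0)]
model = fit_envelope(presence)

# Two hypothetical projections that reuse the same fitted model.
present_sites = [(24.0, 1400.0), (20.0, 1400.0)]
future_sites = [(26.0, 1600.0), (35.0, 1400.0)]
print("present:", project(model, present_sites))
print("future:", project(model, future_sites))
```

The point of the sketch is the separation between fitting and projecting: the model is built once from occurrence data and can then be applied to any layer set that provides the same variables, which is the property the abstract refers to as multi-projection.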


Keywords: Species distribution modeling, Ecological niche modeling, Science gateways, Scalability, Provenance



This work has been supported by CNPq (Grants 461572/2014-1 SiBBr - SEPED/MCTIC and 441929/2016-8 Edital MCTI/CNPQ/Universal).



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Andrea Sánchez-Tapia (1)
  • Marinez Ferreira de Siqueira (1)
  • Rafael Oliveira Lima (1)
  • Felipe Sodré M. Barros (2)
  • Guilherme M. Gall (3)
  • Luiz M. R. Gadelha Jr. (3)
  • Luís Alexandre E. da Silva (1)
  • Carla Osthoff (3)

  1. Rio de Janeiro Botanic Garden, Rio de Janeiro, Brazil
  2. International Institute for Sustainability, Rio de Janeiro, Brazil
  3. National Laboratory for Scientific Computing, Petrópolis, Brazil
