Bias in estimates of the classic and incidence-based Jaccard similarity indices: insights from assemblage simulation


Similarity indices are often used for measuring β-diversity and as the starting point of multivariate analysis. In this study, I used simulation to examine the direction and amount of bias in estimates of two similarity indices, the classic Jaccard Coefficient (J) and the incidence-based J adjusted for unseen species. I designed a novel simulation to generate three sets of assemblages that vary in species richness, species-occurrence distributions, and β-diversity. I characterized assemblage differences with the ratio of [proportion of rare species in all shared species / proportion of rare species in all unshared species] (i.e., PRss/PRus) and the Pearson's correlation in the occurrence probabilities of shared species between two assemblages (i.e., share-species correlation). I found that the classic J was subject to strong positive or negative bias, depending on PRss/PRus. The incidence-based J was mainly subject to negative bias, which varied with share-species correlation. In both indices, bias varied substantially from one pair of assemblages to another and among datasets. This high variation in bias across different comparisons of assemblages may compromise β-diversity estimates obtained at low sampling effort with the two indices or their variants.
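As a minimal sketch of how under-sampling can bias the classic J, the Python below draws presence/absence samples from two toy assemblages and compares the mean estimated J with the true value. It is not the simulation design used in the study: the species names, occurrence probabilities, number of sample units, and the helper names `jaccard`, `sample_assemblage`, and `mean_estimated_jaccard` are all assumptions made for illustration.

```python
import random


def jaccard(a, b):
    """Classic Jaccard coefficient between two sets of species."""
    union = a | b
    if not union:
        return 0.0  # convention assumed here when neither list has species
    return len(a & b) / len(union)


def sample_assemblage(occ_prob, n_units, rng):
    """Return the set of species detected in n_units sample units.

    occ_prob maps species name -> occurrence probability per sample unit.
    """
    observed = set()
    for species, p in occ_prob.items():
        # A species is detected if it occurs in at least one sample unit.
        if any(rng.random() < p for _ in range(n_units)):
            observed.add(species)
    return observed


def mean_estimated_jaccard(occ1, occ2, n_units, n_reps=500, seed=1):
    """Average the sample-based J over repeated simulated surveys."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        s1 = sample_assemblage(occ1, n_units, rng)
        s2 = sample_assemblage(occ2, n_units, rng)
        total += jaccard(s1, s2)
    return total / n_reps


# Hypothetical assemblages: shared species are common, unshared ones are rare.
occ1 = {"sp1": 0.9, "sp2": 0.8, "sp3": 0.05, "sp4": 0.05}
occ2 = {"sp1": 0.9, "sp2": 0.8, "sp5": 0.05, "sp6": 0.05}

true_j = jaccard(set(occ1), set(occ2))          # 2 shared / 6 total species
est_j = mean_estimated_jaccard(occ1, occ2, n_units=3)
bias = est_j - true_j
print(f"true J = {true_j:.3f}, mean estimated J = {est_j:.3f}, bias = {bias:+.3f}")
```

In this toy setup every rare species is unshared, so small samples tend to miss the unshared species and the estimated J exceeds the true J. Making the shared species rare instead pushes the bias the other way, which illustrates why the direction of bias can depend on how rare species are split between shared and unshared species.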



J: the classic Jaccard Coefficient


the incidence-based Jaccard Coefficient adjusted for unseen species


the Number of Shared Species by two assemblages


occurrence probability of Species j at a random sample unit in Assemblage i


PRss: the Proportion of Rare species out of all Shared Species by two assemblages


PRus: the Proportion of Rare species out of all Unshared Species by two assemblages


Species-Occurrence Distribution – a plot of relative occurrence frequency of species against their ranks (from common to rare)


the Total number of species in a pair of assemblages
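The PRss/PRus ratio defined above can be sketched in a few lines of Python. This is an illustration only: the rarity rule (mean occurrence probability below `rare_threshold`), the threshold value, the function name `rare_species_ratio`, and the example probabilities are assumptions, since this page does not state how rare species were delimited.

```python
def rare_species_ratio(occ1, occ2, rare_threshold=0.1):
    """Return (PRss, PRus, PRss/PRus) for two assemblages.

    occ1 and occ2 map species name -> occurrence probability per sample unit.
    Assumption: a species counts as rare when its mean occurrence probability,
    over the assemblages in which it is present, is below rare_threshold.
    """
    def is_rare(species):
        probs = [occ[species] for occ in (occ1, occ2) if species in occ]
        return sum(probs) / len(probs) < rare_threshold

    shared = set(occ1) & set(occ2)
    unshared = set(occ1) ^ set(occ2)

    # Proportions are undefined (None) when the species group is empty.
    pr_ss = sum(is_rare(s) for s in shared) / len(shared) if shared else None
    pr_us = sum(is_rare(s) for s in unshared) / len(unshared) if unshared else None

    # The ratio is undefined when either proportion is missing or PRus is zero.
    ratio = pr_ss / pr_us if pr_ss is not None and pr_us else None
    return pr_ss, pr_us, ratio


# Hypothetical occurrence probabilities for two assemblages.
occ1 = {"a": 0.9, "b": 0.05, "c": 0.04, "d": 0.05}
occ2 = {"a": 0.8, "b": 0.03, "e": 0.02, "f": 0.6}

pr_ss, pr_us, ratio = rare_species_ratio(occ1, occ2)
print(pr_ss, pr_us, ratio)
```

Here one of the two shared species is rare (PRss = 0.5) and three of the four unshared species are rare (PRus = 0.75), so the ratio is below 1.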



Author information



Corresponding author

Correspondence to Y. Cao.

Rights and permissions

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Cao, Y. Bias in estimates of the classic and incidence-based Jaccard similarity indices: insights from assemblage simulation. Community Ecology 19, 311–318 (2018).


  • Assemblage simulation
  • Beta-diversity
  • Estimating assemblage similarity
  • Under-sampling