A new estimator for the number of species in a population

Sankhya A

Abstract

We consider the classic problem of estimating T, the total number of species in a population, from repeated counts in a simple random sample. We first show that the frequently used Chao-Lee estimator can be obtained by Bayesian methods with a Dirichlet prior, and then use this observation to develop a new estimator. Numerical tests and some real experiments show that the new estimator is more flexible than existing ones, in the sense that it adapts to changes in the normalized interspecies variance γ². Our method involves simultaneous estimation of T, γ², and the parameter λ of the Dirichlet prior; the only limitation seems to come from the required convergence of the prior, which imposes the restriction γ² ≤ 1. We also obtain confidence intervals for T and an estimate of the species distribution. Some numerical examples are given, together with applications to sampling from a Census database that closely follows Benford’s law; these show good performance of the new estimator, even beyond γ² = 1. Tests on confidence intervals show that the coverage frequency appears to be in good agreement with the desired confidence level.
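
For orientation, the coverage-based estimator usually attributed to Chao and Lee (1992) can be sketched as follows. This is the commonly quoted form of that existing estimator, not the new estimator proposed in the article. The function name `chao_lee_estimate`, its return tuple, and the exact formulas used for the sample coverage and for γ² are assumptions made for illustration; none of them is stated in the abstract.

```python
from collections import Counter


def chao_lee_estimate(sample):
    """Sketch of a coverage-based species-richness estimate.

    `sample` is an iterable of species labels, one per sampled individual.
    Returns (t_hat, coverage, gamma2). Hypothetical helper, for illustration.
    """
    counts = Counter(sample)                  # species label -> times observed
    n = sum(counts.values())                  # sample size
    d = len(counts)                           # number of distinct species seen
    freq_of_freq = Counter(counts.values())   # k -> f_k, species seen exactly k times
    f1 = freq_of_freq.get(1, 0)               # number of singletons

    if n < 2 or f1 == n:
        # With all singletons the estimated coverage is zero and the
        # estimate is undefined.
        raise ValueError("need at least one species observed more than once")

    coverage = 1.0 - f1 / n                   # Good-Turing style sample coverage
    t0 = d / coverage                         # estimate assuming equal abundances

    # Estimated normalized interspecies variance, truncated at zero.
    s = sum(k * (k - 1) * fk for k, fk in freq_of_freq.items())
    gamma2 = max(t0 * s / (n * (n - 1)) - 1.0, 0.0)

    # Correction term that grows with the estimated heterogeneity.
    t_hat = t0 + n * (1.0 - coverage) / coverage * gamma2
    return t_hat, coverage, gamma2
```

For a sample in which one species is seen six times and three species are seen once each, this sketch gives an estimated coverage of 2/3, γ² = 1.5, and an estimate of 12.75 species.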

References

  • Benford, F. (1938). The law of anomalous numbers. Proc. Am. Philos. Soc., 78, 551–572.

  • Bishop, Y.M.M., Fienberg, S.E. and Holland, P.W. (1975). Discrete multivariate analysis: theory and practice. MIT Press, Cambridge.

  • Boender, C.G.E. and Rinnooy Kan, A.H.G. (1987). A multinomial Bayesian approach to the estimation of population and vocabulary size. Biometrika, 74, 849–856.

  • Böhning, D. and Schön, D. (2005). Nonparametric maximum likelihood estimation of population size based on the counting distribution. J. R. Stat. Soc. Ser. C Appl. Stat., 54, Part 4, 721–737.

  • Böhning, D., Suppawattanabodee, B., Kusolvisitkul, W. and Viwatwongkasem, C. (2004). Estimating the number of drug users in Bangkok 2001: A capture-recapture approach using repeated entries in one list. Eur. J. Epidemiol., 19, 1075–1083.

  • Brose, U., Martinez, N.D. and Williams, R.J. (2003). Estimating species richness: sensitivity to sample coverage and insensitivity to spatial patterns. Ecology, 84, 2364–2377.

  • Bunge, J. and Fitzpatrick, M. (1993). Estimating the number of species: a review. J. Amer. Statist. Assoc., 88, 364–373.

  • Burnham, K.P. and Overton, W.S. (1979). Robust estimation of population size when capture probabilities vary among animals. Ecology, 60, 927–936.

  • Burton, D. (2005). The history of mathematics: an introduction. McGraw-Hill.

  • Chao, A. (1984). Non-parametric estimation of the number of classes in a population. Scand. J. Stat., 11, 265–270.

  • Chao, A. (2004). Species richness estimation. In Encyclopedia of Statistical Sciences (N. Balakrishnan, C. B. Read and B. Vidakovic, eds.). Wiley, New York.

  • Chao, A. and Lee, S.M. (1992). Estimating the number of classes via sample coverage. J. Amer. Statist. Assoc., 87, 210–217.

  • Chao, A., Ma, M.-C. and Yang, M.C.K. (1993). Stopping rules and estimation for recapture debugging with unequal failure rates. Biometrika, 80, 193–201.

  • Chao, A., Hwang, W.-H., Chen, Y.-C. and Kuo, C.-Y. (2000). Estimating the number of shared species in two communities. Statist. Sinica, 10, 227–246.

  • Church, K.W. and Gale, W.A. (1991). Enhanced Good-Turing and Cat-Cal: two new methods for estimating probabilities of English bigrams. Comput. Speech Lang., 5, 19–54.

  • Darroch, J.N. and Ratcliff, D. (1980). A note on capture-recapture estimation. Biometrics, 36, 149–153.

  • Efron, B. and Thisted, R. (1976). Estimating the number of unseen species: how many words did Shakespeare know? Biometrika, 63, 435–467.

  • Esty, W.W. (1985). Estimation of the number of classes in a population and the coverage of a sample. Mathematical Scientist, 10, 41–50.

  • Esty, W.W. (1986). The size of a coverage. Numismatic Chronicle, 146, 185–215.

  • Fewster, R.M. (2009). A simple explanation of Benford’s Law. Am. Stat., 63, 26–32.

  • Gandolfi, A. and Sastri, C.C.A. (2004). Nonparametric estimations about species not observed in a random sample. Milan J. Math., 72, 81–105.

  • Good, I.J. (1953). The population frequencies of species and the estimation of population parameters. Biometrika, 40, 237–266.

  • Good, I.J. (1965). The estimation of probabilities: an essay on modern Bayesian methods. Research Monograph No. 30, MIT Press, Cambridge, MA.

  • Good, I.J. (1967). A Bayesian significance test for multinomial distributions. J. Roy. Statist. Soc. Ser. B, 29, 399–431.

  • Good, I.J. and Toulmin, G. (1956). The number of new species and the increase in population coverage when a sample is increased. Biometrika, 43, 45–63.

  • Harris, B. (1968). Statistical inference in the classical occupancy problem: unbiased estimation of the number of classes. J. Amer. Statist. Assoc., 63, 837–847.

  • Hill, T.P. (1995). The significant-digit phenomenon. Am. Math. Month., 102, 322–327.

  • Jeffreys, H. (1961). Theory of probability. Clarendon Press, Oxford, Third Edition.

  • Johnson, W.E. (1932). Probability: the deductive and inductive problems. Mind, 41, 409–423.

  • Laplace (1995). Philosophical essays in probabilities. Springer Verlag, New York.

  • Lewand, R.E. (2008). Relative frequencies of letters in general English plain text. Cryptographical Mathematics.

  • Lijoi, A., Mena, H.R. and Prünster, I. (2007). Bayesian nonparametric estimation of the probability of discovering new species. Biometrika, 94, 769–786.

  • Lindsay, B.G. and Roeder, K. (1987). A unified treatment of integer parameter models. J. Amer. Statist. Assoc., 82, 758–764.

  • Mao, C.X. (2004). Predicting the conditional probability of discovering a new class. J. Amer. Statist. Assoc., 99, 1108–1118.

  • Marchand, J.P. and Schroeck, F.E. (1982). On the estimation of the number of equally likely classes in a population. Comm. Statist. Theory Methods, 11, 1139–1146.

  • Orlitsky, A., Santhanam, N.P. and Zhang, J. (2003). Always Good Turing: asymptotically optimal probability estimation. Science, 302, 427–431.

  • Pietronero, L., Tosatti, E., Tosatti, V. and Vespignani, A. (2001). Explaining the uneven distribution of numbers in nature: The laws of Benford and Zipf. Phys. A, 293, 297–304.

  • Shen, T-J., Chao, A. and Lin, C-F. (2003). Predicting the number of new species in further taxonomic sampling. Ecology, 84, 798–804.

  • Tao, T. (2009). Benford’s law, Zipf’s law, and the Pareto distribution, Terence Tao’s blog. http://terrytao.wordpress.com/2009/07/03/benfords-law-zipfs-law-and-the-pareto-distribution/

  • Zabell, S.L. (1982). W. E. Johnson’s “sufficientness” postulate. Ann. Statist., 10, 1090–1099.

  • Zipf, G.K. (1935). The psychobiology of language; an introduction to dynamic philology. Houghton Mifflin, Boston.

Acknowledgement

This work was done during visits by one of us (CCAS) to the Università di Roma, Tor Vergata; Università di Milano-Bicocca; and Università di Firenze. He takes pleasure in thanking those universities for their warm hospitality and GNAMPA for its support. We would also like to thank J.S. Rao, M. Scarsini, J. Sethuraman and S.R.S. Varadhan for helpful comments.

Author information

Correspondence to Chelluri C. A. Sastri.

About this article

Cite this article

Cecconi, L., Gandolfi, A. & Sastri, C.C.A. A new estimator for the number of species in a population. Sankhya A 74, 80–100 (2012). https://doi.org/10.1007/s13171-012-0012-x

  • DOI: https://doi.org/10.1007/s13171-012-0012-x
