A new estimator for the number of species in a population

Cecconi, Lorenzo; Gandolfi, Alberto; Sastri, Chelluri C. A.

doi:10.1007/s13171-012-0012-x

Published: 23 November 2012

Volume 74, pages 80–100 (2012)

Sankhya A

Lorenzo Cecconi¹,
Alberto Gandolfi¹ &
Chelluri C. A. Sastri²,³

Abstract

We consider the classic problem of estimating T, the total number of species in a population, from repeated counts in a simple random sample. We first show that the frequently used Chao-Lee estimator can in fact be obtained by Bayesian methods with a Dirichlet prior, and then use this result to develop a new estimator. Numerical tests and some real experiments show that the new estimator is more flexible than existing ones, in the sense that it adapts to changes in the normalized interspecies variance γ². Our method involves simultaneous estimation of T, of γ², and of the parameter λ in the Dirichlet prior; the only limitation seems to come from the required convergence of the prior, which imposes the restriction γ² ≤ 1. We also obtain confidence intervals for T and an estimate of the species distribution. Some numerical examples are given, together with applications to sampling from a Census database closely following Benford’s law, showing good performance of the new estimator even beyond γ² = 1. Tests on confidence intervals show that the coverage frequency appears to be in good agreement with the desired confidence level.
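
For orientation, the sketch below computes the classical coverage-based Chao-Lee estimate that the abstract takes as its starting point. It is not the new estimator of this article, whose details are not available in this preview. The formula follows the commonly stated form of Chao and Lee (1992); the function name `chao_lee_estimate` and its return layout are assumptions made for illustration.

```python
from collections import Counter


def chao_lee_estimate(counts):
    """Coverage-based estimate of the number of species T (Chao-Lee style).

    counts: one positive integer per observed species, giving how many
            times that species appeared in the sample.
    Returns (t_hat, gamma2_hat, coverage_hat).
    """
    counts = [c for c in counts if c > 0]
    n = sum(counts)            # sample size
    d = len(counts)            # number of distinct species observed
    if n < 2:
        raise ValueError("need at least two observations")

    freq = Counter(counts)     # freq[k] = number of species seen exactly k times
    f1 = freq.get(1, 0)        # number of singletons

    coverage = 1.0 - f1 / n    # Good-Turing estimate of the sample coverage
    if coverage == 0.0:
        raise ValueError("all species are singletons; coverage estimate is zero")

    t0 = d / coverage          # estimate of T under equal abundances (gamma^2 = 0)

    # Estimate of the normalized interspecies variance gamma^2, truncated at 0.
    pair_sum = sum(k * (k - 1) * fk for k, fk in freq.items())
    gamma2 = max(t0 * pair_sum / (n * (n - 1)) - 1.0, 0.0)

    # Correct the equal-abundance estimate for heterogeneity.
    t_hat = t0 + n * (1.0 - coverage) / coverage * gamma2
    return t_hat, gamma2, coverage
```

When every observed species appears the same number of times and there are no singletons, for example `chao_lee_estimate([2, 2, 2, 2])`, the estimated coverage is 1 and the estimate of T equals the number of observed species.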

References

Benford, F. (1938). The law of anomalous numbers. Proc. Am. Philos. Soc., 78, 551–572.
Bishop, Y.M.M., Fienberg, S.E. and Holland, P.W. (1975). Discrete multivariate analysis: theory and practice. MIT Press, Cambridge.
Boender, C.G.E. and Rinnooy Kan, A.H.G. (1987). A multinomial Bayesian approach to the estimation of population and vocabulary size. Biometrika, 74, 849–856.
Böhning, D. and Schön, D. (2005). Nonparametric maximum likelihood estimation of population size based on the counting distribution. J. R. Stat. Soc. Ser. C Appl. Stat., 54, Part 4, 721–737.
Böhning, D., Suppawattanabe, B., Kusolvisitkul, W. and Vivatwongkasem, C. (2004). Estimating the number of drug users in Bangkok 2001: A capture-recapture approach using repeated entries in the list. Eur. J. Epidemiol., 19, 1075–1083.
Brose, U., Martinez, M.D. and Williams, R.J. (2003). Estimating species richness: sensitivity to sample coverage and insensitivity to spatial patterns. Ecology, 84, 2364–2377.
Bunge, J. and Fitzpatrick, M. (1993). Estimating the number of species: a review. J. Amer. Statist. Assoc., 88, 364–373.
Burnham, K.P. and Overton, W.S. (1979). Robust estimation of population size when capture probabilities vary among animals. Ecology, 60, 927–936.
Burton, D. (2005). The history of mathematics: an introduction. McGraw-Hill.
Chao, A. (1984). Non-parametric estimation of the number of classes in a population. Scand. J. Stat., 11, 265–270.
Chao, A. (2004). Species richness estimation. In Encyclopedia of Statistical Sciences (N. Balakrishnan, C. B. Read and B. Vidakovic, eds.). Wiley, New York.
Chao, A. and Lee, S.M. (1992). Estimating the number of classes via sample coverage. J. Amer. Statist. Assoc., 87, 210–217.
Chao, A., Ma, M.-C. and Yang, M.C.K. (1993). Stopping rules and estimation for recapture debugging with unequal failure rates. Biometrika, 80, 193–201.
Chao, A., Hwang, W.-H., Chen, Y.-C. and Kuo, C.-Y. (2000). Estimating the number of shared species in two communities. Statist. Sinica, 10, 227–246.
Church, K.W. and Gale, W.A. (1991). Enhanced Good-Turing and Cat-Cal: two new methods for estimating probabilities of English bigrams. Comput. Speech Lang., 5, 19–54.
Darroch, J.N. and Ratcliff (1980). A Note on Capture-Recapture Estimation. Biometrics, 36, 149–153.
Efron, B. and Thisted, R. (1976). Estimating the number of unseen species: how many words did Shakespeare know? Biometrika, 63, 435–467.
Esty, W.W. (1985). Estimation of the number of classes in a population and the coverage of a sample. Mathematical Scientist, 10, 41–50.
Esty, W.W. (1986). The size of a coverage. Numismatic Chronicle, 146, 185–215.
Fewster, R.M. (2009). A simple explanation of Benford’s Law. Am. Stat., 63, 26–32.
Gandolfi, A. and Sastri, C.C.A. (2004). Nonparametric estimations about species not observed in a random sample. Milan J. Math., 72, 81–105.
Good, I.J. (1953). The population frequencies of species and the estimation of population parameters. Biometrika, 40, 237–266.
Good, I.J. (1965). The estimation of probabilities: an essay on modern Bayesian methods. Research Monograph No. 30, MIT Press, Cambridge, MA.
Good, I.J. (1967). A Bayesian significance test for multinomial distributions. J. Roy. Statist. Soc. Ser. B, 29, 399–431.
Good, I.J. and Toulmin, G. (1956). The number of new species and the increase in population coverage when a sample is increased. Biometrika, 43, 45–63.
Harris, B. (1968). Statistical inference in the classical occupancy problem: unbiased estimation of the number of classes. J. Amer. Statist. Assoc., 63, 837–847.
Hill, T.P. (1995). The significant-digit phenomenon. Am. Math. Month., 102, 322–327.
Jeffreys, H. (1961). Theory of probability. Clarendon Press, Oxford, Third Edition.
Johnson, W.E. (1932). Probability: the deductive and inductive problems. Mind, 49, 409–423.
Laplace (1995). Philosophical essays in probabilities. Springer Verlag, New York.
Lewand, R.E. (2008). Relative frequencies of letters in general English plain text. Cryptographical Mathematics.
Lijoi, A., Mena, H.R. and Prünster, I. (2007). Bayesian nonparametric estimation of the probability of discovering new species. Biometrika, 94, 769–786.
Lindsay, B.G. and Roeder, K. (1987). A unified treatment of integer parameter models. J. Amer. Statist. Assoc., 82, 758–764.
Mao, C.X. (2004). Predicting the conditional probability of discovering a new class. J. Amer. Statist. Assoc., 99, 1108–1118.
Marchand, J.P. and Schroeck, F.E. (1982). On the estimation of the number of equally likely classes in a population. Comm. Statist. Theory Methods, 11, 1139–1146.
Orlitsky, A., Santhanam, N.P. and Zhang, J. (2003). Always Good Turing: asymptotically optimal probability estimation. Science, 302, 427–431.
Pietronero, L., Tosatti, E., Tosatti, V. and Vespignani, A. (2001). Explaining the uneven distribution of numbers in nature: The laws of Benford and Zipf. Phys. A, 293, 297–304.
Shen, T-J., Chao, A. and Lin, C-F. (2003). Predicting the number of new species in further taxonomic sampling. Ecology, 84, 798–804.
Tao, T. (2009). Benford’s law, Zipf’s law, and the Pareto distribution, Terence Tao’s blog. http://terrytao.wordpress.com/2009/07/03/benfords-law-zipfs-law-and-the-pareto-distribution/
Zabell, S.L. (1982). W. E. Johnson’s “Sufficientness” postulate. Ann. Statist., 10, 1090–1099.
Zipf, G.K. (1935). The psychobiology of language; an introduction to dynamic philology. Houghton Mifflin, Boston.

Acknowledgement

This work was done during visits by one of us (CCAS) to the Università di Roma, Tor Vergata; Università di Milano-Bicocca; and Università di Firenze. He takes pleasure in thanking those universities for their warm hospitality and GNAMPA for its support. We would also like to thank J.S. Rao, M. Scarsini, J. Sethuraman and S.R.S. Varadhan for helpful comments.

Author information

Authors and Affiliations

Dipartimento di Matematica U. Dini, Università di Firenze, Viale Morgagni 67/A, 50134, Firenze, Italy
Lorenzo Cecconi & Alberto Gandolfi
Missouri University of Science and Technology, Rolla, Missouri, USA
Chelluri C. A. Sastri
Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, B3H 3J5, Canada
Chelluri C. A. Sastri

Corresponding author

Correspondence to Chelluri C. A. Sastri.

About this article

Cite this article

Cecconi, L., Gandolfi, A. & Sastri, C.C.A. A new estimator for the number of species in a population. Sankhya A 74, 80–100 (2012). https://doi.org/10.1007/s13171-012-0012-x

Received: 16 December 2009
Revised: 21 November 2010
Published: 23 November 2012
Issue Date: February 2012
DOI: https://doi.org/10.1007/s13171-012-0012-x

AMS (2000) subject classification

Keywords and phrases
