
Mineral Species Frequency Distribution Conforms to a Large Number of Rare Events Model: Prediction of Earth’s Missing Minerals

Mathematical Geosciences


A population model is introduced to describe the mineral species frequency distribution. Mineral species coupled with their localities conform to a large number of rare events (LNRE) distribution: 100 common mineral species occur at more than 1,000 localities, whereas 34% of the approved 4,831 mineral species are found at only one or two localities. LNRE models formulated in terms of a structural type distribution allow the estimation of Earth’s undiscovered mineralogical diversity and the prediction of the percentage of observed mineral species that would differ if Earth’s history were replayed.
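To make the idea of estimating unseen species from rare ones concrete, the sketch below builds a frequency spectrum (how many species occur at exactly m localities) and applies the classical Chao1 lower-bound estimator. This is not the LNRE model fitted in the article; it is a simpler nonparametric estimator swapped in for illustration. The function names and the toy species counts are hypothetical and do not come from the mineral data.

```python
from collections import Counter


def frequency_spectrum(locality_counts):
    """Return a mapping m -> number of species seen at exactly m localities."""
    return Counter(locality_counts.values())


def chao1_lower_bound(locality_counts):
    """Chao1 lower bound on total species richness (observed + unseen).

    Illustrative stand-in only: the article fits parametric LNRE models,
    which are not reproduced here.
    """
    spectrum = frequency_spectrum(locality_counts)
    observed = len(locality_counts)
    f1 = spectrum.get(1, 0)  # species found at exactly one locality
    f2 = spectrum.get(2, 0)  # species found at exactly two localities
    if f2 > 0:
        unseen = f1 * f1 / (2 * f2)
    else:
        # Bias-corrected form, used when there are no doubletons.
        unseen = f1 * (f1 - 1) / 2
    return observed + unseen


# Hypothetical toy data: species name -> number of localities (assumed values).
toy_counts = {
    "quartz": 1200,
    "calcite": 950,
    "species_a": 1,
    "species_b": 1,
    "species_c": 1,
    "species_d": 1,
    "species_e": 2,
    "species_f": 2,
    "species_g": 5,
}

estimate = chao1_lower_bound(toy_counts)
print(f"observed species: {len(toy_counts)}, estimated total: {estimate}")
```

With these toy counts, four singletons and two doubletons add four estimated unseen species to the nine observed. The estimate is driven entirely by the rarest species, which is why a distribution with many one- and two-locality species implies a substantial number still undiscovered.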





Joshua Golden, Edward Grew, and Dimitri Sverjensky provided valuable advice and discussions. We thank the Deep Carbon Observatory, the Keck Foundation, and a private foundation for support.


Corresponding author

Correspondence to Grethe Hystad.


Cite this article

Hystad, G., Downs, R.T. & Hazen, R.M. Mineral Species Frequency Distribution Conforms to a Large Number of Rare Events Model: Prediction of Earth’s Missing Minerals. Math Geosci 47, 647–661 (2015).
