Mineral Species Frequency Distribution Conforms to a Large Number of Rare Events Model: Prediction of Earth’s Missing Minerals

Abstract

A population model is introduced to describe the mineral species frequency distribution. Mineral species coupled with their localities conform to a large number of rare events (LNRE) distribution: 100 common mineral species occur at more than 1,000 localities, whereas 34 % of the approved 4,831 mineral species are found at only one or two localities. LNRE models formulated in terms of a structural type distribution allow the estimation of Earth’s undiscovered mineralogical diversity and the prediction of the percentage of observed mineral species that would differ if Earth’s history were replayed.
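
To make the abstract's species and locality counts concrete, the following is a minimal sketch of how a frequency spectrum (the number of species found at exactly m localities) could be tabulated from species-locality counts. It is an illustration only, not the authors' procedure: the function names, the species labels, and the toy counts are all hypothetical, and no LNRE model is fitted here.

```python
from collections import Counter


def frequency_spectrum(locality_counts):
    """Return {m: V_m}, where V_m is the number of species
    recorded at exactly m localities, sorted by m."""
    return dict(sorted(Counter(locality_counts.values()).items()))


def rare_fraction(locality_counts, max_localities=2):
    """Return the fraction of species recorded at no more than
    max_localities localities (one or two by default)."""
    spectrum = frequency_spectrum(locality_counts)
    rare = sum(v for m, v in spectrum.items() if m <= max_localities)
    return rare / len(locality_counts)


# Hypothetical toy data: species label -> number of localities.
# These values are invented for illustration and are not taken
# from any mineral database.
toy_counts = {
    "species_a": 1500,
    "species_b": 1,
    "species_c": 2,
    "species_d": 1,
    "species_e": 40,
    "species_f": 7,
}

spectrum = frequency_spectrum(toy_counts)
fraction = rare_fraction(toy_counts)
print(spectrum)
print(fraction)
```

In this toy set, half of the species sit at one or two localities while a single species dominates the locality count. That is the qualitative shape the abstract describes: a few very common species and a long tail of rare ones.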

Acknowledgments

Joshua Golden, Edward Grew, and Dimitri Sverjensky provided valuable advice and discussions. We thank the Deep Carbon Observatory, the Keck Foundation, and a private foundation for support.

Author information

Corresponding author

Correspondence to Grethe Hystad.

Cite this article

Hystad, G., Downs, R.T. & Hazen, R.M. Mineral Species Frequency Distribution Conforms to a Large Number of Rare Events Model: Prediction of Earth’s Missing Minerals. Math Geosci 47, 647–661 (2015). https://doi.org/10.1007/s11004-015-9600-3

Keywords

  • Statistical mineralogy
  • Mineral ecology
  • Mineral frequency distribution