# Measuring the representativeness of a germplasm collection

## Abstract

Many germplasm collections aim to preserve most of the genetic diversity present in a population so that the population could be regenerated, thereby providing genetic resources that help ensure food security. This paper proposes a way to measure how well a germplasm collection achieves this goal. In the most common scenario, little is known about the number and statistical distribution of alleles at each locus, which makes it very difficult to measure the representativeness of an accession. Here, we show how to use the allelic diversity observed at a sample of loci to estimate the representativeness of an accession through the coverage of the sample, with both point and interval estimates. Our approach avoids unrealistic assumptions about the number of loci, bounds on the number of alleles, or their frequency distributions. Depending on the sampling scheme of a collection, we distinguish between absolute and relative coverage. We demonstrate the methodology using data from the germplasm collection at the Leibniz Institute of Plant Genetics and Crop Plant Research.
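To make the notion of sample coverage concrete, the following is a minimal sketch of one standard nonparametric point estimator, Good's (1953) estimator, which estimates coverage as one minus the proportion of observations that are singletons. The abstract does not state which estimator the paper uses, so treating this as the paper's method is an assumption; the function name `good_coverage` and the allele labels are hypothetical and serve only as illustration.

```python
from collections import Counter


def good_coverage(alleles):
    """Good's (1953) point estimate of sample coverage.

    Coverage is the total population frequency of the allele types that
    appear in the sample. Good's estimator is 1 - f1 / n, where f1 is the
    number of allele types seen exactly once and n is the sample size.
    NOTE: illustrative sketch only; not necessarily the paper's estimator.
    """
    n = len(alleles)
    if n == 0:
        raise ValueError("sample must contain at least one allele observation")
    counts = Counter(alleles)
    # f1: number of allele types observed exactly once (singletons)
    singletons = sum(1 for c in counts.values() if c == 1)
    return 1.0 - singletons / n


# Hypothetical sample of allele calls at a single locus
sample = ["A1", "A1", "A2", "A2", "A2", "A3", "A4", "A4"]
print(good_coverage(sample))  # one singleton (A3) among 8 observations -> 0.875
```

An estimate near 1 suggests that the alleles not yet observed are collectively rare in the population, while a low value indicates that a substantial share of allelic diversity is still missing from the sample.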

## Keywords

Coverage, Allele conservation, Seed accession

## Notes

### Acknowledgements

The author would like to thank Dr. Marion Röder, who kindly shared the data set used in Huang et al. (2002).

### Author contributions

Carlos Hernandez-Suarez developed the methodology, performed the simulations, and wrote the manuscript.

### Compliance with ethical standards

### Conflict of interest

The author declares no conflict of interest.

## References

- Brown AHD (1995) The core collection at the crossroads. In: Hodgkin T, Brown AHD, van Hintum TJL, Morales EAV (eds) Core collections of plant genetic resources. Wiley, Chichester, pp 3–19
- Chao A (1981) On estimating the probability of discovering a new species. Ann Stat 9(6):1339–1342
- Chao A, Lee SM (1992) Estimating the number of classes via sample coverage. J Am Stat Assoc 87(417):210–217
- Chao A, Lee SM (1993) Estimating population size for continuous-time capture-recapture models via sample coverage. Biom J 35(1):29–45
- Darwin C (1866) On the origin of species by means of natural selection: or the preservation of favoured races in the struggle for life. John Murray, London
- Esty WW (1982) Confidence intervals for the coverage of low coverage samples. Ann Stat 10(1):190–196
- Esty WW (1983) A normal limit law for a nonparametric estimator of the coverage of a random sample. Ann Stat 11(3):905–912
- Esty W (1985) Estimation of the number of classes in a population and the coverage of a sample. Math Sci 10:41–50
- Esty WW (1986) The efficiency of Good’s nonparametric coverage estimator. Ann Stat 14(3):1257–1260
- Good IJ (1953) The population frequencies of species and the estimation of population parameters. Biometrika 40(3–4):237–264
- Good I, Toulmin G (1956) The number of new species, and the increase in population coverage, when a sample is increased. Biometrika 43(1–2):45–63
- Harris B (1959) Determining bounds on integrals with applications to cataloging problems. Ann Math Stat 30(2):521–548
- Huang SP, Weir B (2001) Estimating the total number of alleles using a sample coverage method. Genetics 159(3):1365–1373
- Huang X, Börner A, Röder M, Ganal M (2002) Assessing genetic diversity of wheat (*Triticum aestivum* L.) germplasm using microsatellite markers. Theor Appl Genet 105(5):699–707
- Knott M (1967) Models for cataloguing problems. Ann Math Stat 38(4):1255–1260
- Lee SM, Chao A (1994) Estimating population size via sample coverage for closed capture-recapture models. Biometrics 50(1):88–97
- Lo SH (1992) From the species problem to a general coverage problem via a new interpretation. Ann Stat 20(2):1094–1109
- Nei M (1973) Analysis of gene diversity in subdivided populations. Proc Natl Acad Sci 70(12):3321–3323
- Robbins HE (1968) Estimating the total probability of the unobserved outcomes of an experiment. Ann Math Stat 39(1):256–257
- Starr N (1979) Linear estimation of the probability of discovering a new species. Ann Stat 7(3):644–652
- van Hintum TJ, Brown AHD, Spillane C, Hodgkin T (2000) Core collections of plant genetic resources. IPGRI Technical Bulletin No. 3, Rome, Italy
- Zhang C-H, Zhang Z (2009) Asymptotic normality of a nonparametric estimator of sample coverage. Ann Stat 37:2582–2595