# Encyclopedia of Behavioral Medicine

Living Edition
Editors: Marc Gellman

# Weighted Sample

• Jane Monaco
Living reference work entry
DOI: https://doi.org/10.1007/978-1-4614-6439-6_1082-2

## Definition

In a weighted sample, not all sample observations contribute equally to the estimate of a population parameter.

Investigators are often interested in estimating quantities (such as means, counts, or proportions) in a population by using a representative sample selected from that population. Probability samples, defined as samples in which each sampling unit has a known, nonzero probability of selection based on the sampling design, allow investigators to compute estimates of population parameters. The most straightforward type of probability sampling design, a simple random sample (SRS), is a selection method in which every possible sample of a given size has the same probability of being selected. Consequently, in an SRS, the probability of selection is the same for each member of the population.

The estimation of the population mean is straightforward for the SRS design. Let n = sample size and N = population size. Also, let $$\{Y_1, \dots, Y_N\}$$ be the population values and $$\{y_1, \dots, y_n\}$$ be the sample values. We define the overall sampling fraction as $$f=\frac{n}{N}$$. Then the population mean, $$\overline{Y}=\frac{1}{N}\sum \limits_{i=1}^N{Y}_i$$, can be estimated by the statistic
$$\overline{y}=\frac{1}{n}\sum \limits_{i=1}^n{y}_i$$

Under this SRS design, each sample observation, $$y_i$$, contributes equally to the estimate, $$\overline{y}$$, of the population mean.
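As an illustration, the following Python sketch draws one simple random sample from a small population and computes the unweighted sample mean. The population values, the sample size, and the function name `srs_mean` are made up for this example and do not come from any study. The sketch also enumerates every possible sample of that size to show that the sample means average to the population mean.

```python
import random
from itertools import combinations


def srs_mean(population, n, seed=0):
    """Draw a simple random sample of size n (without replacement)
    and return the sample together with its unweighted mean."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    sample = rng.sample(population, n)
    y_bar = sum(sample) / n            # every observation gets weight 1/n
    return sample, y_bar


# Hypothetical population of N = 10 values (illustrative only).
population = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15]
N = len(population)
n = 4
f = n / N                              # overall sampling fraction

sample, y_bar = srs_mean(population, n)
population_mean = sum(population) / N

# Under SRS every possible sample of size n is equally likely, so the
# expected value of the sample mean is the plain average over all samples.
all_sample_means = [sum(s) / n for s in combinations(population, n)]
expected_sample_mean = sum(all_sample_means) / len(all_sample_means)

print("sampling fraction:", f)
print("one sample mean:", y_bar)
print("population mean:", population_mean)
print("average of all possible sample means:", expected_sample_mean)
```

The last two printed values agree up to floating-point rounding, which is what unbiasedness of the sample mean under an SRS means in this small example.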

More complicated sampling designs, such as stratified sampling, may be chosen by investigators for various reasons including potential efficiency and the ability to use different sampling methods for different strata. In stratified sampling, the population is grouped by some characteristic (such as gender, geographic location, or age category), and a sample is selected within each subgroup separately. In this stratified design, the probability of selecting an individual is likely not the same for all individuals, but rather depends on the individual’s subgroup (stratum). For example, in a study of illicit drug use among adolescents in a particular city, the population could be stratified into two age groups, middle school and high school. The probability of selection of a particular student will depend on the sample size and population size within that student’s age group. Therefore, the sample statistics using a stratified design must be weighted to account for the unequal selection probability of observations.
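To make the unequal selection probabilities concrete, the following Python sketch uses invented enrollment and sample counts for the two age groups. None of these figures come from an actual study, and the sketch assumes that a simple random sample is drawn within each group. It computes each student's probability of selection and the corresponding weight, taken here as the inverse of that probability.

```python
# Hypothetical counts for the two age groups (illustrative only).
strata = {
    "middle school": {"population_size": 6000, "sample_size": 300},
    "high school": {"population_size": 9000, "sample_size": 300},
}

selection_prob = {}
weight = {}
for name, s in strata.items():
    # Assuming an SRS within the group, each student's selection
    # probability is the group's sample size over its population size.
    selection_prob[name] = s["sample_size"] / s["population_size"]
    # Inverse of the selection probability: the number of students in
    # the population that each sampled student represents.
    weight[name] = s["population_size"] / s["sample_size"]

for name in strata:
    print(name, "selection probability:", selection_prob[name],
          "weight:", weight[name])
```

With these assumed counts, a middle school student is more likely to be selected than a high school student, so each sampled high school student stands in for more members of the population.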

To compute a weighted sample mean for a stratified sample, first consider the partition of the population into H mutually exclusive strata. Let $$N_h$$ = the population size in the hth stratum and $$n_h$$ = the sample size in the hth stratum, so that $$N=\sum \limits_{h=1}^H{N}_h$$ and $$n=\sum \limits_{h=1}^H{n}_h$$. The stratum-specific sampling fraction is $${f}_h=\frac{n_h}{N_h}$$. We can compute each stratum-specific mean, $${\overline{y}}_h$$, as the average of the $$n_h$$ sampled units in the hth stratum. The weighted sample mean is computed as a weighted sum of the stratum-specific means: $${\overline{y}}_w=\frac{1}{N}\sum \limits_{h=1}^H{N}_h{\overline{y}}_h$$. This weighted sample mean can be shown to provide an unbiased estimate of the population mean, $$\overline{Y}$$.
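The computation above can be sketched in Python as follows. The stratum sizes and the 0/1 sample values (1 indicating reported use) are invented for illustration, and the function names are hypothetical. The sketch computes the weighted mean from the stratum means and then again from individual observations, giving each observation in stratum h the weight $$N_h/n_h$$, to show that the two forms agree.

```python
def stratified_mean(strata):
    """Weighted mean (1/N) * sum over h of N_h * y_bar_h.
    `strata` is a list of (N_h, sample_values) pairs."""
    N = sum(N_h for N_h, _ in strata)
    weighted_sum = 0.0
    for N_h, values in strata:
        y_bar_h = sum(values) / len(values)   # stratum-specific mean
        weighted_sum += N_h * y_bar_h
    return weighted_sum / N


def observation_weighted_mean(strata):
    """Same estimate, written with one weight per observation:
    w_h = N_h / n_h, the inverse of the stratum sampling fraction."""
    numerator = 0.0
    total_weight = 0.0
    for N_h, values in strata:
        w_h = N_h / len(values)
        for y in values:
            numerator += w_h * y
            total_weight += w_h               # the weights sum to N
    return numerator / total_weight


# Hypothetical data: (population size N_h, sampled 0/1 responses).
strata = [
    (6000, [0, 0, 1, 0, 0]),   # stratum 1, n_1 = 5
    (9000, [1, 0, 1, 1, 0]),   # stratum 2, n_2 = 5
]

y_bar_w = stratified_mean(strata)
y_bar_obs = observation_weighted_mean(strata)

# Unweighted mean that ignores the design, for comparison.
all_values = [y for _, values in strata for y in values]
y_bar_unweighted = sum(all_values) / len(all_values)

print("weighted mean:", y_bar_w)
print("observation-weighted mean:", y_bar_obs)
print("unweighted mean:", y_bar_unweighted)
```

In this made-up example the weighted estimate is 0.44 and the unweighted mean is 0.40, up to floating-point rounding. The difference arises because the larger stratum is sampled at a lower rate, so its observations must count for more. Written per observation, the estimator is $$\overline{y}_w=\frac{1}{N}\sum \limits_{h=1}^H\sum \limits_{i=1}^{n_h}\frac{N_h}{n_h}{y}_{hi}$$, where $$y_{hi}$$ denotes the ith sampled value in the hth stratum.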
