Combining Evidence Across Independent Strata

Finkelstein, Michael O.; Levin, Bruce

doi:10.1007/978-1-4419-5985-0_8

Part of the book series: Statistics for Social and Behavioral Sciences (SSBS)

Abstract

Quite often, a party seeking to show statistical significance combines data from different sources to create larger numbers, and hence greater significance for a given disparity. Conversely, a party seeking to avoid finding significance disaggregates data insofar as possible. In a discrimination suit brought by female faculty members of a medical school, plaintiffs aggregated faculty data over several years, while the school based its statistics on separate departments and separate years (combined, however, as discussed below).

The original version of the book was revised. An erratum can be found at DOI 10.1007/978-1-4419-5985-0_15

Notes

1.
This is an instance of “Simpson’s Paradox.” In the Berkeley data, there was substantial variation in the disparity between acceptance rates from department to department, although, even if there were a fixed level of disparity, comparing aggregated rates could still be misleading. Suppose, for example, Department A had an overall acceptance rate of 50%, which was the same for men and women, while Department B had an overall acceptance rate of 10%, also the same for men and women. If 80 men and 20 women apply to Department A, while 20 men and 80 women apply to Department B, then of the 100 men, 42 will be accepted while of the 100 women, only 18 will be accepted. Thus, the odds ratios equal 1 for each department separately, but the odds ratio in the aggregated data is 3.3.
2.
If we assume there is a constant odds ratio on promotion comparing black and white employees, then it can be shown that the random variable b has a Bin(n, P) distribution with P given by \( P = \Omega /\left(\Omega +1\right) \). A maximum conditional likelihood estimate of P given n is P = b/n, and a maximum likelihood estimate of the odds ratio on promotion is \( \Omega =b/c \). Binomial confidence intervals for P can be transformed into corresponding intervals for Ω via \( \Omega =P/\left(1-P\right) \).
3.
Note that if both tests are conducted with the intention to quote the more (or less) significant result, then Bonferroni’s correction indicates that to limit this procedure’s Type I error, each component test should be conducted at the α/2 level.
4.
In general, if X is a random variable with cumulative distribution function F, then the random variable Y = F(X) is called the probability transform of X. If F is continuous, Y has a uniform distribution on the unit interval from 0 to 1, because the event [Y < p] occurs if and only if X is below its p-th quantile, which occurs with probability p. In the present application, if X is a random outcome of the test statistic T with null distribution F, then the attained level of significance is Y = P[T ≤ X] = F(X).
5.
See Lancaster, The combination of probabilities arising from data in discrete distributions, 36 Biometrika 370 (1949). We are indebted to Joseph L. Gastwirth for calling our attention to this adjustment. The Lancaster adjustment actually overcorrects. The exact P-value is 0.024, which is above the P-value given by Fisher’s test with Lancaster’s correction. In this case, the binomial P-value referred to in the text comes considerably closer to the correct figure.
6.
Robins, Greenland, & Breslow, A general estimator for the variance of the Mantel-Haenszel odds ratio, 124 Am. J. Epidemiology 719 (1986).
7.
Conveniently, the variance-covariance matrix has a known form that again depends only on the table margins. The multivariate generalization takes the form \( X^2 = \{S - E_0(S)\}\,\{\operatorname{Cov}_0(S)\}^{-1}\,\{S - E_0(S)\}' \) in standard matrix notation, where \( \{\operatorname{Cov}_0(S)\}^{-1} \) is the inverse of the (k–1)×(k–1) covariance matrix for S under the null hypothesis.
8.
This point is quite apart from sampling variability. Even in large samples where sampling variability may be ignored, systematic differences between studies remain.
9.
For values of ψ not close to SRR, V may need to be recomputed when it depends on ψ.
10.
For combining the evidence about odds ratios from several independent fourfold tables, the Mantel-Haenszel chi-squared procedure and its associated estimate of the assumed common odds ratio are available. See Section 8.1 at p. 250. The method given here will be equivalent to the Mantel-Haenszel procedure when each of the fourfold tables has large margins.
11.
Technically, we assume only that \( E[b_i \mid \beta_i, \sigma_i^2] = \beta_i \), \( \operatorname{Var}[b_i \mid \beta_i, \sigma_i^2] = \sigma_i^2 \), \( E[\beta_i \mid \sigma_i^2] = \beta \), and \( \operatorname{Var}[\beta_i \mid \sigma_i^2] = \tau^2 \). The normal assumption for \( b_i \) given \( \beta_i \) and \( \sigma_i^2 \) is used when making large sample inferences about the summary estimate of β. For this purpose it is often assumed that the distribution F is also approximately normal, in which case the summary estimate of β is too. However, the assumption of normality for F is often questionable and a source of vulnerability for the analyst.
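
The arithmetic in Notes 1 and 2 can be checked with a short sketch. The Python below is only an illustration under stated assumptions, not the authors' own computation: the department figures are those given in Note 1, while the counts `b` and `c` used for Note 2 are hypothetical values chosen for the example. The relation n = b + c is inferred from the note's formulas (P = b/n and Ω = b/c together with P = Ω/(Ω + 1)), and all function names are invented here.

```python
from fractions import Fraction


def odds_ratio(acc_1, rej_1, acc_2, rej_2):
    """Odds ratio of group 1 versus group 2 from accepted/rejected counts."""
    return Fraction(acc_1 * rej_2, rej_1 * acc_2)


# Note 1: department -> (men applying, women applying, acceptance rate in percent)
departments = {"A": (80, 20, 50), "B": (20, 80, 10)}

per_department = {}
men_acc = men_rej = women_acc = women_rej = 0
for name, (men, women, pct) in departments.items():
    m_acc, w_acc = men * pct // 100, women * pct // 100
    m_rej, w_rej = men - m_acc, women - w_acc
    per_department[name] = odds_ratio(m_acc, m_rej, w_acc, w_rej)
    men_acc, men_rej = men_acc + m_acc, men_rej + m_rej
    women_acc, women_rej = women_acc + w_acc, women_rej + w_rej

# Odds ratio after pooling the two departments
aggregated = odds_ratio(men_acc, men_rej, women_acc, women_rej)


# Note 2: the maps between the binomial parameter P and the odds ratio Omega
def p_from_omega(omega):
    return omega / (omega + 1)


def omega_from_p(p):
    return p / (1 - p)


b, c = 12, 4            # hypothetical counts, for illustration only
n = b + c               # inferred from P = b/n and Omega = b/c
p_hat = Fraction(b, n)
omega_hat = Fraction(b, c)

print(per_department, men_acc, women_acc, round(float(aggregated), 1))
print(p_hat, omega_hat, omega_from_p(p_hat))
```

Because Ω = P/(1 − P) is increasing in P, applying `omega_from_p` to the two endpoints of a binomial confidence interval for P yields the corresponding interval for Ω, as Note 2 describes.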

Author information

Authors and Affiliations

New York, NY, USA
Michael O. Finkelstein
Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
Bruce Levin

About this chapter

Cite this chapter

Finkelstein, M.O., Levin, B. (2015). Combining Evidence Across Independent Strata. In: Statistics for Lawyers. Statistics for Social and Behavioral Sciences. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-5985-0_8

DOI: https://doi.org/10.1007/978-1-4419-5985-0_8
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-5984-3
Online ISBN: 978-1-4419-5985-0
eBook Packages: Mathematics and Statistics (R0)