Multiple hypothesis testing in experimental economics

List, John A.; Shaikh, Azeem M.; Xu, Yang

doi:10.1007/s10683-018-09597-5

Original Paper
Published: 29 January 2019

Experimental Economics, Volume 22, pages 773–793 (2019)

Abstract

The analysis of data from experiments in economics routinely involves testing multiple null hypotheses simultaneously. These different null hypotheses arise naturally in this setting for at least three different reasons: when there are multiple outcomes of interest and it is desired to determine on which of these outcomes a treatment has an effect; when the effect of a treatment may be heterogeneous in that it varies across subgroups defined by observed characteristics and it is desired to determine for which of these subgroups a treatment has an effect; and finally when there are multiple treatments of interest and it is desired to determine which treatments have an effect relative to either the control or relative to each of the other treatments. In this paper, we provide a bootstrap-based procedure for testing these null hypotheses simultaneously using experimental data in which simple random sampling is used to assign treatment status to units. Using the general results in Romano and Wolf (Ann Stat 38:598–633, 2010), we show under weak assumptions that our procedure (1) asymptotically controls the familywise error rate—the probability of one or more false rejections—and (2) is asymptotically balanced in that the marginal probability of rejecting any true null hypothesis is approximately equal in large samples. Importantly, by incorporating information about dependence ignored in classical multiple testing procedures, such as the Bonferroni and Holm corrections, our procedure has much greater ability to detect truly false null hypotheses. In the presence of multiple treatments, we additionally show how to exploit logical restrictions across null hypotheses to further improve power. We illustrate our methodology by revisiting the study by Karlan and List (Am Econ Rev 97(5):1774–1793, 2007) of why people give to charitable causes.
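For readers less familiar with the classical corrections named above, the following minimal Python sketch computes Bonferroni- and Holm-adjusted p-values. It is not the bootstrap procedure proposed in the paper; the function names and the p-values are hypothetical and serve only to illustrate the baseline that the paper improves upon.

```python
import numpy as np

def bonferroni_adjust(pvals):
    """Bonferroni: multiply each p-value by the number of hypotheses, cap at 1."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)

def holm_adjust(pvals):
    """Holm (1979) step-down adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The j-th smallest p-value (j = 0, ..., m-1) is multiplied by (m - j).
    scaled = (m - np.arange(m)) * p[order]
    # A running maximum keeps the adjusted values monotone in the sorted order.
    adj_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adj = np.empty(m)
    adj[order] = adj_sorted
    return adj

raw = [0.01, 0.04, 0.03, 0.20]   # hypothetical unadjusted p-values
bonf = bonferroni_adjust(raw)    # [0.04, 0.16, 0.12, 0.80]
holm = holm_adjust(raw)          # [0.04, 0.09, 0.09, 0.20]
```

Neither correction uses the joint dependence of the test statistics, which is the information the paper's bootstrap-based procedure exploits.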

References

Anderson, M. (2008). Multiple inference and gender differences in the effects of early intervention: A re-evaluation of the Abecedarian, Perry Preschool, and Early Training projects. Journal of the American Statistical Association, 103(484), 1481–1495.
Bettis, R. A. (2012). The search for asterisks: Compromised statistical tests and flawed theories. Strategic Management Journal, 33(1), 108–113.
Bhattacharya, J., Shaikh, A. M., & Vytlacil, E. (2012). Treatment effect bounds: An application to Swan-Ganz catheterization. Journal of Econometrics, 168(2), 223–243.
Bonferroni, C. E. (1935). Il calcolo delle assicurazioni su gruppi di teste. Rome: Tipografia del Senato.
Bugni, F., Canay, I., & Shaikh, A. (2015). Inference under covariate-adaptive randomization. Technical report, cemmap working paper, Centre for Microdata Methods and Practice.
Camerer, C. F., Dreber, A., Forsell, E., Ho, T.-H., Huber, J., Johannesson, M., et al. (2016). Evaluating replicability of laboratory experiments in economics. Science, 351(6280), 1433–1436.
Fink, G., McConnell, M., & Vollmer, S. (2014). Testing for heterogeneous treatment effects in experimental data: False discovery risks and correction procedures. Journal of Development Effectiveness, 6(1), 44–57.
Flory, J. A., Gneezy, U., Leonard, K. L., & List, J. A. (2015a). Gender, age, and competition: The disappearing gap. Unpublished Manuscript.
Flory, J. A., Leibbrandt, A., & List, J. A. (2015b). Do competitive workplaces deter female workers? A large-scale natural field experiment on job-entry decisions. The Review of Economic Studies, 82(1), 122–155.
Gneezy, U., Niederle, M., & Rustichini, A. (2003). Performance in competitive environments: Gender differences. The Quarterly Journal of Economics, 118(3), 1049–1074.
Heckman, J., Moon, S. H., Pinto, R., Savelyev, P., & Yavitz, A. (2010). Analyzing social experiments as implemented: A reexamination of the evidence from the HighScope Perry Preschool Program. Quantitative Economics, 1(1), 1–46.
Heckman, J. J., Pinto, R., Shaikh, A. M., & Yavitz, A. (2011). Inference with imperfect randomization: The case of the Perry Preschool Program. National Bureau of Economic Research Working Paper w16935.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2), 65–70.
Hossain, T., & List, J. A. (2012). The behavioralist visits the factory: Increasing productivity using simple framing manipulations. Management Science, 58(12), 2151–2167.
Ioannidis, J. (2005). Why most published research findings are false. PLoS Med, 2(8), e124.
Jennions, M. D., & Moller, A. P. (2002). Publication bias in ecology and evolution: An empirical assessment using the ‘trim and fill’ method. Biological Reviews of the Cambridge Philosophical Society, 77(02), 211–222.
Karlan, D., & List, J. A. (2007). Does price matter in charitable giving? Evidence from a large-scale natural field experiment. The American Economic Review, 97(5), 1774–1793.
Kling, J., Liebman, J., & Katz, L. (2007). Experimental analysis of neighborhood effects. Econometrica, 75(1), 83–119.
Lee, S., & Shaikh, A. M. (2014). Multiple testing and heterogeneous treatment effects: Re-evaluating the effect of Progresa on school enrollment. Journal of Applied Econometrics, 29(4), 612–626.
Lehmann, E., & Romano, J. (2005). Generalizations of the familywise error rate. The Annals of Statistics, 33(3), 1138–1154.
Lehmann, E. L., & Romano, J. P. (2006). Testing statistical hypotheses. Berlin: Springer.
Levitt, S. D., List, J. A., Neckermann, S., & Sadoff, S. (2012). The behavioralist goes to school: Leveraging behavioral economics to improve educational performance. National Bureau of Economic Research w18165.
List, J. A., & Samek, A. S. (2015). The behavioralist as nutritionist: Leveraging behavioral economics to improve child food choice and consumption. Journal of Health Economics, 39, 135–146.
Machado, C., Shaikh, A., Vytlacil, E., & Lunch, C. (2013). Instrumental variables, and the sign of the average treatment effect. Unpublished Manuscript, Getulio Vargas Foundation, University of Chicago, and New York University. [2049].
Maniadis, Z., Tufano, F., & List, J. A. (2014). One swallow doesn’t make a summer: New evidence on anchoring effects. The American Economic Review, 104(1), 277–290.
Niederle, M., & Vesterlund, L. (2007). Do women shy away from competition? Do men compete too much? The Quarterly Journal of Economics, 122(3), 1067–1101.
Nosek, B. A., Spies, J. R., & Motyl, M. (2012). Scientific utopia II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science, 7(6), 615–631.
Romano, J. P., & Shaikh, A. M. (2006a). On stepdown control of the false discovery proportion. In Lecture Notes-Monograph Series (pp. 33–50).
Romano, J. P., & Shaikh, A. M. (2006b). Stepup procedures for control of generalizations of the familywise error rate. The Annals of Statistics, 34, 1850–1873.
Romano, J. P., & Shaikh, A. M. (2012). On the uniform asymptotic validity of subsampling and the bootstrap. The Annals of Statistics, 40(6), 2798–2822.
Romano, J. P., Shaikh, A. M., & Wolf, M. (2008a). Control of the false discovery rate under dependence using the bootstrap and subsampling. Test, 17(3), 417–442.
Romano, J. P., Shaikh, A. M., & Wolf, M. (2008b). Formalized data snooping based on generalized error rates. Econometric Theory, 24(02), 404–447.
Romano, J. P., & Wolf, M. (2005). Stepwise multiple testing as formalized data snooping. Econometrica, 73(4), 1237–1282.
Romano, J. P., & Wolf, M. (2010). Balanced control of generalized error rates. The Annals of Statistics, 38, 598–633.
Sutter, M., & Glätzle-Rützler, D. (2014). Gender differences in the willingness to compete emerge early in life and persist. Management Science, 61(10), 2339–2354.
Westfall, P. H., & Young, S. S. (1993). Resampling-based multiple testing: Examples and methods for p value adjustment (Vol. 279). New York: Wiley.

Acknowledgements

We would like to thank Joseph P. Romano for helpful comments on this paper. We also thank Joseph Seidel for his excellent research assistance. The research of the second author was supported by National Science Foundation Grants DMS-1308260, SES-1227091, and SES-1530661.

Author information

Authors and Affiliations

Department of Economics, University of Chicago, 5757 S University Ave, Chicago, IL, 60637, USA
John A. List, Azeem M. Shaikh & Yang Xu

Corresponding author

Correspondence to Yang Xu.

Additional information

Documentation of our procedures and our Stata and Matlab code can be found at https://github.com/seidelj/mht.

Appendix

1.1 Proof of Theorem 3.1

First note that under Assumption 2.1, $Q\in \omega _{s}$ if and only if $P\in {\tilde{\omega }}_{s}$, where

$${\tilde{\omega }}_{s}=\{P(Q):Q\in \varOmega ,E_{P}[Y_{i,k}|D_{i}=d,Z_{i}=z]=E_{P}[Y_{i,k}|D_{i}=d',Z_{i}=z]\}.$$

The proof of this result now follows by verifying the conditions of Corollary 5.1 in Romano and Wolf (2010). In particular, we verify Assumptions B.1–B.4 in Romano and Wolf (2010).

In order to verify Assumption B.1 in Romano and Wolf (2010), let

$$T_{s,n}^{*}(P)=\sqrt{n}\left( \frac{1}{n_{d,z}}\sum _{1\le i\le n:D_{i}=d,Z_{i}=z}(Y_{i,k}-{\tilde{\mu }}_{k|d,z}(P))-\frac{1}{n_{d',z}}\sum _{1\le i\le n:D_{i}=d',Z_{i}=z}(Y_{i,k}-{\tilde{\mu }}_{k|d',z}(P))\right),$$

and note that

$$T_{n}^{*}(P)=(T_{s,n}^{*}(P):s\in {\mathcal {S}})=f(A_{n}(P),B_{n}),$$

where

$$A_{n}(P)=\frac{1}{\sqrt{n}}\sum _{1\le i\le n}A_{n,i}(P),$$

with $A_{n,i}(P)$ equal to the $2|{\mathcal {S}}|$-dimensional vector formed by stacking vertically for $s\in {\mathcal {S}}$ the terms

$$\left( \begin{array}{c} (Y_{i,k}-{\tilde{\mu }}_{k|d,z}(P))I\{D_{i}=d,Z_{i}=z\}\\ (Y_{i,k}-{\tilde{\mu }}_{k|d',z}(P))I\{D_{i}=d',Z_{i}=z\} \end{array}\right), \tag{10}$$

and $B_{n}$ is the $2|{\mathcal {S}}|$-dimensional vector formed by stacking vertically for $s\in {\mathcal {S}}$ the terms

$$\left( \begin{array}{c} \frac{1}{\frac{1}{n}\sum _{1\le i\le n}I\{D_{i}=d,Z_{i}=z\}}\\ -\frac{1}{\frac{1}{n}\sum _{1\le i\le n}I\{D_{i}=d',Z_{i}=z\}} \end{array}\right), \tag{11}$$

and $f:{\mathbf {R}}^{2|{\mathcal {S}}|}\times {\mathbf {R}}^{2|{\mathcal {S}}|}\rightarrow {\mathbf {R}}^{|{\mathcal {S}}|}$ is the function of $A_{n}(P)$ and $B_{n}$ whose sth component for $s\in {\mathcal {S}}$ is given by the inner product of the sth pair of terms in $A_{n}(P)$ and the sth pair of terms in $B_{n}$, i.e., the inner product of (10) and (11). The weak law of large numbers and the continuous mapping theorem imply that

$$B_{n}{\mathop {\rightarrow }\limits ^{P}}B(P),$$

where B(P) is the $2|{\mathcal {S}}|$-dimensional vector formed by stacking vertically for $s\in {\mathcal {S}}$ the terms

$$\left( \begin{array}{c} \frac{1}{P\{D_{i}=d,Z_{i}=z\}}\\ -\frac{1}{P\{D_{i}=d',Z_{i}=z\}} \end{array}\right).$$
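The decomposition $T_{s,n}^{*}(P)=f(A_{n}(P),B_{n})$ can be checked numerically. The sketch below does so for a single hypothesis $s$ on simulated data; the data-generating process, the values of the conditional means, and all variable names are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
D = rng.integers(0, 2, size=n)            # treatment status, with d = 1 and d' = 0
Z = rng.integers(0, 2, size=n)            # observed subgroup label
Y = 1.0 + 0.5 * D + rng.normal(size=n)    # one outcome k; simulated, not real data

d, d_prime, z = 1, 0, 1
mu_d, mu_dp = 1.5, 1.0                    # conditional means under the simulated P

in_d = (D == d) & (Z == z)
in_dp = (D == d_prime) & (Z == z)

# Direct definition: sqrt(n) times the difference of the centered cell means.
t_direct = np.sqrt(n) * ((Y[in_d] - mu_d).mean() - (Y[in_dp] - mu_dp).mean())

# Same quantity written as the inner product of the pairs in (10) and (11).
A_pair = np.array([((Y - mu_d) * in_d).sum(),
                   ((Y - mu_dp) * in_dp).sum()]) / np.sqrt(n)
B_pair = np.array([1.0 / in_d.mean(), -1.0 / in_dp.mean()])
t_inner = A_pair @ B_pair
```

The two values agree up to floating-point error, since $\frac{1}{\sqrt{n}}\cdot \frac{n}{n_{d,z}}=\frac{\sqrt{n}}{n_{d,z}}$.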

Next, note that $E_{P}[A_{n,i}(P)]=0$. Assumption 2.3 and the central limit theorem therefore imply that

$$A_{n}(P){\mathop {\rightarrow }\limits ^{d}}N(0,V_{A}(P))$$

for an appropriate choice of $V_{A}(P)$. In particular, the diagonal elements of $V_{A}(P)$ are of the form

$${\tilde{\sigma }}_{k|d,z}^{2}(P)P\{D_{i}=d,Z_{i}=z\}.$$

The continuous mapping theorem thus implies that

$$T_{n}^{*}(P){\mathop {\rightarrow }\limits ^{d}}N(0,V(P))$$

for an appropriate variance matrix V(P). In particular, the sth diagonal element of V(P) is given by

$$\frac{{\tilde{\sigma }}_{k|d,z}^{2}(P)}{P\{D_{i}=d,Z_{i}=z\}}+\frac{{\tilde{\sigma }}_{k|d',z}^{2}(P)}{P\{D_{i}=d',Z_{i}=z\}}. \tag{12}$$
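To see where (12) comes from, the following short calculation may help. It assumes, as the notation suggests, that ${\tilde{\sigma }}_{k|d,z}^{2}(P)$ is the variance of $Y_{i,k}$ conditional on $D_{i}=d,Z_{i}=z$, and it writes $p_{d,z}=P\{D_{i}=d,Z_{i}=z\}$, a shorthand introduced here only for brevity. Since $d\ne d'$, the two indicators in (10) are never both equal to one, so the product of the two entries of (10) is identically zero; as each entry has mean zero, the entries are uncorrelated and the corresponding $2\times 2$ block of $V_{A}(P)$ is diagonal. Weighting by the limit of (11) then gives

$$\left( \frac{1}{p_{d,z}}\right) ^{2}{\tilde{\sigma }}_{k|d,z}^{2}(P)\,p_{d,z}+\left( -\frac{1}{p_{d',z}}\right) ^{2}{\tilde{\sigma }}_{k|d',z}^{2}(P)\,p_{d',z}=\frac{{\tilde{\sigma }}_{k|d,z}^{2}(P)}{p_{d,z}}+\frac{{\tilde{\sigma }}_{k|d',z}^{2}(P)}{p_{d',z}}.$$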

In order to verify Assumptions B.2–B.3 in Romano and Wolf (2010), it suffices to note that (12) is strictly greater than zero under our assumptions. Note that it is not required that V(P) be non-singular for these assumptions to be satisfied.

In order to verify Assumption B.4 in Romano and Wolf (2010), we first argue that

$$T_{n}^{*}(P_{n}){\mathop {\rightarrow }\limits ^{d}}N(0,V(P)) \tag{13}$$

under $P_{n}$ for an appropriate sequence of distributions $P_{n}$ for $(Y_{i},D_{i},Z_{i})$. To this end, assume that

(a) $P_{n}{\mathop {\rightarrow }\limits ^{d}}P$.
(b) ${\tilde{\mu }}_{k|d,z}(P_{n})\rightarrow {\tilde{\mu }}_{k|d,z}(P)$.
(c) $B_{n}{\mathop {\rightarrow }\limits ^{P_{n}}}B(P)$.
(d) $\text {Var}_{P_{n}}[A_{n,i}(P_{n})]\rightarrow \text {Var}_{P}[A_{n,i}(P)]$.

Under (a) and (b), it follows that $A_{n,i}(P_{n}){\mathop {\rightarrow }\limits ^{d}}A_{n,i}(P)$ under $P_{n}$. By arguing as in Theorem 15.4.3 in Lehmann and Romano (2006) and using (d), it follows from the Lindeberg–Feller central limit theorem that

$$A_{n}(P_{n}){\mathop {\rightarrow }\limits ^{d}}N(0,V_{A}(P))$$

under $P_{n}$. It thus follows from (c) and the continuous mapping theorem that (13) holds under $P_{n}$. Assumption B.4 in Romano and Wolf (2010) now follows simply by noting that the Glivenko–Cantelli theorem, the strong law of large numbers, and the continuous mapping theorem ensure that ${\hat{P}}_{n}$ satisfies (a)–(d) with probability one under P.
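The role of the recentered statistic $T_{n}^{*}({\hat{P}}_{n})$ can be illustrated with a simplified bootstrap. The Python sketch below is a single-step, unstudentized variant for multiple outcomes with one treatment and one control. It is an assumption-laden illustration and not the authors' stepwise procedure, whose documented Stata and Matlab code is referenced under Additional information. The function name, the simulated data, and the tuning values are all hypothetical.

```python
import numpy as np

def bootstrap_max_critical_value(Y, D, alpha=0.05, n_boot=500, seed=0):
    """1 - alpha quantile of max_s |T*_{s,n}(P_hat_n)| over bootstrap draws.

    Y is an (n, S) array of outcomes and D an (n,) array of 0/1 treatment status.
    Each draw resamples rows (Y_i, D_i) with replacement and recenters the
    resampled cell means at the original sample's cell means.
    Assumes both arms appear in every resample (true here for moderate n).
    """
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    mu1 = Y[D == 1].mean(axis=0)
    mu0 = Y[D == 0].mean(axis=0)
    max_stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        Yb, Db = Y[idx], D[idx]
        diff = (Yb[Db == 1].mean(axis=0) - mu1) - (Yb[Db == 0].mean(axis=0) - mu0)
        max_stats[b] = np.sqrt(n) * np.abs(diff).max()
    return np.quantile(max_stats, 1 - alpha)

# Simulated example: three correlated outcomes, effect on the first only.
rng = np.random.default_rng(1)
n = 400
D = rng.integers(0, 2, size=n)
common = rng.normal(size=(n, 1))
Y = np.sqrt(0.5) * common + np.sqrt(0.5) * rng.normal(size=(n, 3))
Y[:, 0] += 1.0 * D

T_obs = np.sqrt(n) * (Y[D == 1].mean(axis=0) - Y[D == 0].mean(axis=0))
crit = bootstrap_max_critical_value(Y, D)
reject = np.abs(T_obs) > crit
```

Because the critical value is taken from the bootstrap distribution of the maximum, it reflects the correlation across outcomes rather than assuming the worst case, which is the source of the power gain over Bonferroni-type bounds described in the abstract.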

Table 1 Multiple outcomes

Table 2 Multiple subgroups

Table 3 Multiple treatments (Comparing multiple treatments with a control)

Table 4 Multiple treatments (All pairwise comparisons across multiple treatments and a control)

Table 5 Multiple outcomes, subgroups, and treatments

About this article

Cite this article

List, J.A., Shaikh, A.M. & Xu, Y. Multiple hypothesis testing in experimental economics. Exp Econ 22, 773–793 (2019). https://doi.org/10.1007/s10683-018-09597-5

Received: 09 October 2017
Revised: 10 November 2018
Accepted: 19 November 2018
Published: 29 January 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s10683-018-09597-5
