Abstract
This chapter provides an overview of missing data issues that can occur in a meta-analysis. Common approaches to missing data in meta-analysis are discussed. The chapter focuses on the problem of missing data in moderators of effect size. The examples demonstrate the use of maximum likelihood methods and multiple imputation, the only two methods that produce unbiased estimates under the assumption that data are missing at random. The methods discussed in this chapter are most useful in testing the sensitivity of results to missing data.
References
Allison, P.D. 2002. Missing data. Thousand Oaks: Sage.
Begg, C.B., and J.A. Berlin. 1988. Publication bias: A problem in interpreting medical data (with discussion). Journal of the Royal Statistical Society Series A 151(2): 419–463.
Buck, S.F. 1960. A method of estimation of missing values in multivariate data suitable for use with an electronic computer. Journal of the Royal Statistical Society Series B 22(2): 302–303.
Chan, A.-W., A. Hrobjartsson, M.T. Haahr, P.C. Gotzsche, and D.G. Altman. 2004. Empirical evidence for selective reporting of outcomes in randomized trials. Journal of the American Medical Association 291(20): 2457–2465.
Dempster, A.P., N.M. Laird, and D.B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B 39(1): 1–38.
Duval, S. 2005. The Trim and Fill method. In Publication bias in meta-analysis: Prevention, assessment and adjustments, ed. H.R. Rothstein, A.J. Sutton, and M. Borenstein. West Sussex: Wiley.
Duval, S., and R. Tweedie. 2000. Trim and fill: A simple funnel plot based method of testing and adjusting for publication bias in meta-analysis. Biometrics 56(2): 455–463.
Eagly, A.H., M.C. Johannesen-Schmidt, and M.L. van Engen. 2003. Transformational, transactional, and laissez-faire leadership styles: A meta-analysis comparing women and men. Psychological Bulletin 129(4): 569–592.
Egger, M., G.D. Smith, M. Schneider, and C. Minder. 1997. Bias in meta-analysis detected by a simple, graphical test. British Medical Journal 315(7109): 629–634.
Enders, C.K. 2010. Applied missing data analysis. Methodology in the Social Sciences. New York: Guilford.
Fahrbach, K.R. 2001. An investigation of methods for mixed-model meta-analysis in the presence of missing data. Lansing: Michigan State University.
Glasser, M. 1964. Linear regression analysis with missing observations among the independent variables. Journal of the American Statistical Association 59(307): 834–844.
Hackshaw, A.K., M.R. Law, and N.J. Wald. 1997. The accumulated evidence on lung cancer and environmental tobacco smoke. British Medical Journal 315(7114): 980–988.
Haitovsky, Y. 1968. Missing data in regression analysis. Journal of the Royal Statistical Society Series B 30(1): 67–82.
Hemminki, E. 1980. Study of information submitted by drug companies to licensing authorities. British Medical Journal 280(6217): 833–836.
Honaker, J., G. King, and M. Blackwell. 2011. Amelia II: A program for missing data. http://r.iq.harvard.edu/src/contrib/
Kim, J.-O., and J. Curry. 1977. The treatment of missing data in multivariate analysis. Sociological Methods and Research 6(2): 215–240.
Lipsey, M.W., and D.B. Wilson. 2001. Practical meta-analysis. Thousand Oaks: Sage Publications.
Little, R.J.A., and D.B. Rubin. 1987. Statistical analysis with missing data. New York: Wiley.
Orwin, R.G., and D.S. Cordray. 1985. Effects of deficient reporting on meta-analysis: A conceptual framework and reanalysis. Psychological Bulletin 97(1): 134–147.
Rosenthal, R. 1979. The file drawer problem and tolerance for null results. Psychological Bulletin 86(3): 638–641.
Rothstein, H.R., A.J. Sutton, and M. Borenstein. 2005. Publication bias in meta-analysis: Prevention, assessment and adjustments. West Sussex: Wiley.
Rubin, D.B. 1976. Inference and missing data. Biometrika 63(3): 581–592.
Rubin, D.B. 1987. Multiple imputation for nonresponse in surveys. New York: Wiley.
Schafer, J.L. 1997. Analysis of incomplete multivariate data. London: Chapman Hall.
Schafer, J.L. 1999. NORM: Multiple imputation of incomplete multivariate data under a normal model. Software for Windows. University Park: Department of Statistics, Penn State University.
Schafer, J.L., and J.W. Graham. 2002. Missing data: Our view of the state of the art. Psychological Methods 7(2): 147–177.
Shadish, W.R., L. Robinson, and C. Lu. 1999. ES: A computer program and manual for effect size calculation. St. Paul: Assessment Systems Corporation.
Sirin, S.R. 2005. Socioeconomic status and academic achievement: A meta-analytic review of research. Review of Educational Research 75(3): 417–453. doi:10.3102/00346543075003417.
Smith, M.L. 1980. Publication bias and meta-analysis. Evaluation in Education 4: 22–24.
Sterne, J.A.C., B.J. Becker, and M. Egger. 2005. The funnel plot. In Publication bias in meta-analysis: Prevention, assessment and adjustments, ed. H.R. Rothstein, A.J. Sutton, and M. Borenstein. West Sussex: Wiley.
Vevea, J.L., and C.M. Woods. 2005. Publication bias in research synthesis: Sensitivity analysis using a priori weight functions. Psychological Methods 10(4): 428–443.
Williamson, P.R., C. Gamble, D.G. Altman, and J.L. Hutton. 2005. Outcome selection bias in meta-analysis. Statistical Methods in Medical Research 14(5): 515–524.
Wilson, D.B. 2010. Practical meta-analysis effect size calculator. Campbell Collaboration. http://www.campbellcollaboration.org/resources/effect_size_input.php. Accessed 16 July 2011.
Yuan, Y.C. 2000. Multiple imputation for missing data: Concepts and new developments. http://support.sas.com/rnd/app/papers/multipleimputation.pdf. Accessed 2 April 2011.
Appendix
7.1.1 Computing Packages for Computation of the Multiple Imputation Results
There are a number of options for obtaining multiple imputation results for a meta-analysis model. Two free programs are available. The first is Schafer's Norm (Schafer 1999), available at http://www.stat.psu.edu/~jls/misoftwa.html, which runs as a stand-alone program on Windows 95/98/NT. The second is Amelia II (Honaker et al. 2011), a program for R available at http://gking.harvard.edu/amelia/. Schafer's Norm program was used for the example given earlier.
SAS includes two procedures for multiple imputation: PROC MI, which generates the imputations, and PROC MIANALYZE, which analyzes the completed data sets. For weighted meta-analytic regression, PROC MIANALYZE has limited utility, because the standard errors of the weighted regression coefficients must be adjusted as detailed by Lipsey and Wilson (2001). Below is an illustration of the use of PROC MI for the leadership data.
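The standard-error adjustment mentioned above can be sketched as follows. In a fixed-effect meta-regression with inverse-variance weights w_i = 1/v_i, the sampling covariance of the coefficients is (X'WX)^-1. General-purpose weighted-regression software instead reports MSE × (X'WX)^-1, where MSE is the weighted residual mean square. Dividing each reported standard error by sqrt(MSE) therefore recovers the meta-analytic standard error. The sketch below assumes a fixed-effect model and uses NumPy. The function name `wls_meta` and the data are hypothetical, and the code is not the one used for the chapter's analyses.

```python
import numpy as np

def wls_meta(X, y, v):
    """Fixed-effect meta-regression by weighted least squares (illustrative sketch).

    X : (k, p) design matrix including an intercept column
    y : (k,) effect sizes
    v : (k,) sampling variances; weights are w = 1 / v
    Returns the coefficients, the SEs a generic weighted-regression routine
    would report, and the SEs after dividing by sqrt(MSE).
    """
    w = 1.0 / v
    XtWX = X.T @ (w[:, None] * X)
    beta = np.linalg.solve(XtWX, X.T @ (w * y))
    resid = y - X @ beta
    k, p = X.shape
    mse = np.sum(w * resid**2) / (k - p)           # weighted residual mean square
    cov_unscaled = np.linalg.inv(XtWX)
    se_reported = np.sqrt(mse * np.diag(cov_unscaled))  # what generic WLS output shows
    se_corrected = se_reported / np.sqrt(mse)           # Lipsey and Wilson adjustment
    return beta, se_reported, se_corrected

# Hypothetical data: six studies, one moderator
X = np.column_stack([np.ones(6), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]])
y = np.array([0.10, 0.25, 0.20, 0.40, 0.35, 0.55])
v = np.array([0.02, 0.03, 0.02, 0.04, 0.03, 0.05])
beta, se_rep, se_cor = wls_meta(X, y, v)
print(beta, se_rep, se_cor)
```

The corrected standard errors equal sqrt(diag((X'WX)^-1)) directly. The two-step form is shown because it is the adjustment you would apply to output from a program such as PROC REG.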
7.1.1.1 R Programs
One program available in R for generating multiple imputations is Amelia II (Honaker et al. 2011). Directions for using the program are available at http://gking.harvard.edu/amelia/. Once the program is loaded into R, the following command was used to generate m = 5 imputed data sets.
> library(Amelia)
> a.out <- amelia(leadimp, m = 5, idvars = "ID")
Here idvars = "ID" identifies the study identifier, which is copied into the imputed data sets but not used in the imputation model. The imputed data sets can then be saved for export to another program, where the analyses are completed, using the command
> write.amelia(obj = a.out, file.stem = "outdata")
where obj is the object holding the imputed data sets (the value returned by amelia), and file.stem is the leading part of the file names for the data sets written out by the program; the imputation number is appended to the stem for each file.
Table 7.12 gives the weighted regression estimates of the effect size model from each imputation obtained in Amelia. The two variables with missing observations are the average age of subjects and the percentage of male leaders. The estimates of the regression coefficients vary across the five data sets. This variation reflects the uncertainty introduced by the missing observations.
Table 7.13 provides the multiply-imputed estimates for the linear model of effect size. These estimates were combined in Excel, and are fairly consistent with the earlier multiple imputation analysis using Schafer’s Norm program. None of the coefficients are significantly different from zero.
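The standard way to combine per-imputation estimates such as those in Table 7.12 is Rubin's (1987) rules. The pooled estimate is the mean of the m estimates. Its total variance is the mean within-imputation variance W plus (1 + 1/m) times the between-imputation variance B. The reference t distribution has ν = (m − 1)(1 + W/((1 + 1/m)B))² degrees of freedom. The pure-Python sketch below pools one coefficient. The function name `pool_rubin` is hypothetical, and the numbers are made up rather than taken from Table 7.12.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputations with Rubin's rules (sketch)."""
    m = len(estimates)
    q_bar = mean(estimates)                       # pooled point estimate
    w = mean(se**2 for se in std_errors)          # within-imputation variance
    b = variance(estimates)                       # between-imputation variance (m - 1 divisor)
    t = w + (1 + 1 / m) * b                       # total variance
    df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2 if b > 0 else math.inf
    return q_bar, math.sqrt(t), df

# Hypothetical coefficient and (already adjusted) SEs from five imputations
est = [0.012, 0.015, 0.010, 0.014, 0.011]
ses = [0.006, 0.006, 0.007, 0.006, 0.006]
q, se, df = pool_rubin(est, ses)
print(f"pooled = {q:.4f}, SE = {se:.4f}, t = {q / se:.2f}, df = {df:.1f}")
```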
7.1.1.2 SAS Proc MI
The SAS procedure PROC MI provides a number of options for imputing missing values. For the example illustrated in this chapter, we generate the multiple imputations with the Markov chain Monte Carlo (MCMC) method using a single chain. We use the EM estimates as the starting values for the MCMC analysis. The commands below were used with the leadership data to produce the five imputed data sets:
proc mi data = work.leader out = work.leaderimp seed = 101897;
var year ageave perlead gen2 sizeorg2 rndm2 effsize;
mcmc chain = single initial = em;
run;
The first line gives the name of the input data set, the name of the SAS data set to be created with the imputations, and the seed for the pseudo-random number generator. The second line lists the variables used in the imputations. Note that the effect size is included in the imputation model. The third line specifies Markov chain Monte Carlo to simulate draws from the joint posterior distribution, as described by Rubin (1987). Note that the number of imputations is not specified. The default number of imputed data sets is five, the number recommended by Schafer (1997).
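The usual rationale for five imputations (Rubin 1987; Schafer 1997) is efficiency. An estimate based on m imputations has a relative efficiency, compared with infinitely many imputations, of approximately (1 + γ/m)^-1. Here γ is the fraction of missing information, which is not the same as the fraction of missing values. The short sketch below tabulates this approximation. The function name and the γ values are illustrative only.

```python
def relative_efficiency(gamma, m):
    """Approximate efficiency of m imputations relative to infinitely many.

    gamma: fraction of missing information (0 <= gamma < 1).
    """
    return 1.0 / (1.0 + gamma / m)

for gamma in (0.1, 0.3, 0.5):
    effs = [round(relative_efficiency(gamma, m), 3) for m in (3, 5, 10, 20)]
    print(f"gamma = {gamma}: m = 3, 5, 10, 20 -> {effs}")
```

Even with half of the information missing (γ = 0.5), five imputations give an efficiency of 1/1.1, or about 91%, which is why a small m is often considered adequate.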
SAS PROC MI provides a number of useful tables. These include one showing the missing data patterns and the group means of each variable within each pattern. Once the imputations are generated, the procedure reports the combined estimates of the mean and standard error of each variable with missing data, as illustrated below.
Multiple Imputation Parameter Estimates

Variable | Mean | SE | Lower 95% CL | Upper 95% CL | DF
---|---|---|---|---|---
Average age of sample | 44.109 | 1.619 | 40.341 | 47.877 | 7.596
Percent of male leaders | 65.691 | 2.898 | 59.743 | 71.640 | 26.869

Variable | Minimum | Maximum | Mu0 | t for Mean = Mu0 | Pr > \|t\|
---|---|---|---|---|---
Average age of sample | 42.659 | 45.481 | 0 | 27.25 | <.0001
Percent of male leaders | 64.586 | 67.390 | 0 | 22.67 | <.0001
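The columns of these tables are linked. The t statistic is Mean/SE, and the confidence limits are Mean ± t(0.975, DF) × SE, where DF is the multiple-imputation degrees of freedom. The stdlib-only sketch below assumes that form and reproduces the reported t values and limits to within rounding. The helper names are hypothetical. Instead of calling a statistics library, it finds the t quantile by Simpson's-rule integration of the density and bisection.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, n=2000):
    """P(T <= x) for x >= 0, by Simpson's rule on [0, x] (n must be even)."""
    h = x / n
    s = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + s * h / 3

def t_quantile(p, df):
    """Quantile for 0.5 < p < 1, found by bisection on t_cdf."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# (mean, SE, DF) as reported in the first table above
table = {
    "Average age of sample": (44.109, 1.619, 7.596),
    "Percent of male leaders": (65.691, 2.898, 26.869),
}
results = {}
for name, (est, se, df) in table.items():
    q = t_quantile(0.975, df)
    results[name] = (est / se, est - q * se, est + q * se)
    print(f"{name}: t = {est / se:.2f}, 95% CL = ({est - q * se:.3f}, {est + q * se:.3f})")
```

The small DF for average age (7.596) widens its interval noticeably relative to a normal-theory interval. This is a sign that the imputations disagree more for that variable.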
To obtain the weighted regression results for each imputation, we use Proc Reg with weights. The command lines are shown below.
proc reg data = work.leaderimp outest = work.regout covout;
model effsize = year ageave perlead gen2 sizeorg2 rndm2;
weight wt;
by _Imputation_;
run;
The statements above use the data set generated by PROC MI and estimate the coefficients of the effect size model by weighted regression. The BY statement makes PROC REG fit the model separately for each imputation. Table 7.14 provides the weighted regression results for each imputation.
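The BY-group logic can be mirrored outside SAS. The sketch below makes several assumptions. The stacked imputed data are held as NumPy arrays, and an imputation index plays the role of SAS's _Imputation_ variable. The weights are inverse sampling variances, so (X'WX)^-1 is the fixed-effect covariance of the coefficients. This is the adjusted quantity, not the MSE-scaled covariance that PROC REG writes with COVOUT. The code fits the weighted regression separately within each imputation and returns the coefficients and covariance for each one, ready for pooling. All names and data are hypothetical.

```python
import numpy as np

def wls_by_imputation(imp, X, y, w):
    """Weighted regression within each imputation (like BY _Imputation_)."""
    out = {}
    for j in np.unique(imp):
        rows = imp == j
        Xj, yj, wj = X[rows], y[rows], w[rows]
        XtWX = Xj.T @ (wj[:, None] * Xj)
        beta = np.linalg.solve(XtWX, Xj.T @ (wj * yj))
        out[int(j)] = (beta, np.linalg.inv(XtWX))  # coefficients, fixed-effect covariance
    return out

# Hypothetical stacked data: two imputations of the same four studies,
# differing only in the imputed covariate value for study 4.
imp = np.array([1, 1, 1, 1, 2, 2, 2, 2])
x = np.array([40.0, 45.0, 50.0, 43.0, 40.0, 45.0, 50.0, 47.0])
X = np.column_stack([np.ones(8), x])
y = np.array([0.10, 0.20, 0.30, 0.15, 0.10, 0.20, 0.30, 0.15])
w = np.array([20.0, 30.0, 25.0, 40.0, 20.0, 30.0, 25.0, 40.0])
fits = wls_by_imputation(imp, X, y, w)
for j, (beta, cov) in fits.items():
    print(j, beta, np.sqrt(np.diag(cov)))
```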
Table 7.15 gives the multiply-imputed estimates for the weighted regression results. As in the prior analyses, none of the regression coefficients were significantly different from zero.
Copyright information
© 2012 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Pigott, T.D. (2012). Missing Data in Meta-analysis: Strategies and Approaches. In: Advances in Meta-Analysis. Statistics for Social and Behavioral Sciences. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-2278-5_7
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4614-2277-8
Online ISBN: 978-1-4614-2278-5
eBook Packages: Mathematics and Statistics (R0)