Missing Data in Meta-analysis: Strategies and Approaches

Advances in Meta-Analysis

Part of the book series: Statistics for Social and Behavioral Sciences (SSBS)

Abstract

This chapter provides an overview of missing data issues that can occur in a meta-analysis. Common approaches to missing data in meta-analysis are discussed. The chapter focuses on the problem of missing data in moderators of effect size. The examples demonstrate the use of maximum likelihood methods and multiple imputation, the only two methods that produce unbiased estimates under the assumption that data are missing at random. The methods discussed in this chapter are most useful in testing the sensitivity of results to missing data.

References

  • Allison, P.D. 2002. Missing data. Thousand Oaks: Sage.

  • Begg, C.B., and J.A. Berlin. 1988. Publication bias: A problem in interpreting medical data (with discussion). Journal of the Royal Statistical Society Series A 151(2): 419–463.

  • Buck, S.F. 1960. A method of estimation of missing values in multivariate data suitable for use with an electronic computer. Journal of the Royal Statistical Society Series B 22(2): 302–303.

  • Chan, A.-W., A. Hrobjartsson, M.T. Haahr, P.C. Gotzsche, and D.G. Altman. 2004. Empirical evidence for selective reporting of outcomes in randomized trials. Journal of the American Medical Association 291(20): 2457–2465.

  • Dempster, A.P., N.M. Laird, and D.B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B 39(1): 1–38.

  • Duval, S. 2005. The Trim and Fill method. In Publication bias in meta-analysis: Prevention, assessment and adjustments, ed. H.R. Rothstein, A.J. Sutton, and M. Borenstein. West Sussex: Wiley.

  • Duval, S., and R. Tweedie. 2000. Trim and fill: A simple funnel plot based method of testing and adjusting for publication bias in meta-analysis. Biometrics 56(2): 455–463.

  • Eagly, A.H., M.C. Johannesen-Schmidt, and M.L. van Engen. 2003. Transformational, transactional, and laissez-faire leadership styles: A meta-analysis comparing women and men. Psychological Bulletin 129(4): 569–592.

  • Egger, M., G.D. Smith, M. Schneider, and C. Minder. 1997. Bias in meta-analysis detected by a simple, graphical test. British Medical Journal 315(7109): 629–634.

  • Enders, C.K. 2010. Applied missing data analysis. Methodology in the Social Sciences. New York: Guilford.

  • Fahrbach, K.R. 2001. An investigation of methods for mixed-model meta-analysis in the presence of missing data. Lansing: Michigan State University.

  • Glasser, M. 1964. Linear regression analysis with missing observations among the independent variables. Journal of the American Statistical Association 59(307): 834–844.

  • Hackshaw, A.K., M.R. Law, and N.J. Wald. 1997. The accumulated evidence on lung cancer and environmental tobacco smoke. British Medical Journal 315(7114): 980–988.

  • Haitovsky, Y. 1968. Missing data in regression analysis. Journal of the Royal Statistical Society Series B 30(1): 67–82.

  • Hemminki, E. 1980. Study of information submitted by drug companies to licensing authorities. British Medical Journal 280(6217): 833–836.

  • Honaker, J., G. King, and M. Blackwell. 2011. Amelia II: A program for missing data. http://r.iq.harvard.edu/src/contrib/

  • Kim, J.-O., and J. Curry. 1977. The treatment of missing data in multivariate analysis. Sociological Methods and Research 6(2): 215–240.

  • Lipsey, M.W., and D.B. Wilson. 2001. Practical meta-analysis. Thousand Oaks: Sage Publications.

  • Little, R.J.A., and D.B. Rubin. 1987. Statistical analysis with missing data. New York: Wiley.

  • Orwin, R.G., and D.S. Cordray. 1985. Effects of deficient reporting on meta-analysis: A conceptual framework and reanalysis. Psychological Bulletin 97(1): 134–147.

  • Rosenthal, R. 1979. The file drawer problem and tolerance for null results. Psychological Bulletin 86(3): 638–641.

  • Rothstein, H.R., A.J. Sutton, and M. Borenstein. 2005. Publication bias in meta-analysis: Prevention, assessment and adjustments. West Sussex: Wiley.

  • Rubin, D.B. 1976. Inference and missing data. Biometrika 63(3): 581–592.

  • Rubin, D.B. 1987. Multiple imputation for nonresponse in surveys. New York: Wiley.

  • Schafer, J.L. 1997. Analysis of incomplete multivariate data. London: Chapman & Hall.

  • Schafer, J.L. 1999. NORM: Multiple imputation of incomplete multivariate data under a normal model. Software for Windows. University Park: Department of Statistics, Penn State University.

  • Schafer, J.L., and J.W. Graham. 2002. Missing data: Our view of the state of the art. Psychological Methods 7(2): 147–177.

  • Shadish, W.R., L. Robinson, and C. Lu. 1999. ES: A computer program and manual for effect size calculation. St. Paul: Assessment Systems Corporation.

  • Sirin, S.R. 2005. Socioeconomic status and academic achievement: A meta-analytic review of research. Review of Educational Research 75(3): 417–453. doi:10.3102/00346543075003417.

  • Smith, M.L. 1980. Publication bias and meta-analysis. Evaluation in Education 4: 22–24.

  • Sterne, J.A.C., B.J. Becker, and M. Egger. 2005. The funnel plot. In Publication bias in meta-analysis: Prevention, assessment and adjustments, ed. H.R. Rothstein, A.J. Sutton, and M. Borenstein. West Sussex: Wiley.

  • Vevea, J.L., and C.M. Woods. 2005. Publication bias in research synthesis: Sensitivity analysis using a priori weight functions. Psychological Methods 10(4): 428–443.

  • Williamson, P.R., C. Gamble, D.G. Altman, and J.L. Hutton. 2005. Outcome selection bias in meta-analysis. Statistical Methods in Medical Research 14(5): 515–524.

  • Wilson, D.B. 2010. Practical meta-analysis effect size calculator. Campbell Collaboration. http://www.campbellcollaboration.org/resources/effect_size_input.php. Accessed 16 July 2011.

  • Yuan, Y.C. 2000. Multiple imputation for missing data: Concepts and new developments. http://support.sas.com/rnd/app/papers/multipleimputation.pdf. Accessed 2 April 2011.

Appendix

7.1.1 Computing Packages for Computation of the Multiple Imputation Results

There are a number of options for obtaining multiple imputation results in a meta-analysis model. Two freeware programs are available. The first is Norm, written by Schafer and available at http://www.stat.psu.edu/~jls/misoftwa.html. Norm runs as a stand-alone program on Windows 95/98/NT. The second is Amelia II, an R package by Honaker et al. available at http://gking.harvard.edu/amelia/. Schafer’s Norm program was used for the example given earlier.

SAS includes two procedures: PROC MI, which generates the multiple imputations, and PROC MIANALYZE, which analyzes the completed data sets. For the weighted regression results used in meta-analysis, PROC MIANALYZE has limited utility because the standard errors of the weighted regression coefficients need to be adjusted as detailed by Lipsey and Wilson (2001). Section 7.1.1.2 illustrates the use of PROC MI for the leadership data.

7.1.1.1 R Programs

One program available in R for generating multiple imputations is Amelia II (Honaker et al. 2011). Directions for using the program are available at http://gking.harvard.edu/amelia/. Once the program is loaded into R, the following command generates m = 5 imputed data sets.

> a.out <- amelia(leadimp, m = 5, idvars = "ID")

The imputed data sets can be saved for export into another program to complete the analyses using the command,

> write.amelia(obj = a.out, file.stem = "outdata")

where “obj” refers to the name given to the object with the imputed data sets (the result of the amelia command), and “file.stem” provides the stem of the file names for the data sets written by the program.

Table 7.12 gives the weighted regression estimates for the effect size model from each imputation obtained in Amelia. The two variables with missing observations are average age of subjects and percent of male leaders. The estimates of the regression coefficients vary across the five data sets. This variation reflects the uncertainty introduced by the missing observations.

Table 7.12 Regression estimates from each imputation generated using Amelia

Table 7.13 provides the multiply-imputed estimates for the linear model of effect size. These estimates were combined in Excel, and are fairly consistent with the earlier multiple imputation analysis using Schafer’s Norm program. None of the coefficients are significantly different from zero.

Table 7.13 Multiply-imputed estimates from Amelia
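
The standard way to combine per-imputation estimates such as those in Table 7.12 is Rubin's (1987) rules: average the point estimates, then add the average sampling variance to an inflated between-imputation variance. The Python sketch below is only an illustration of that calculation, not the spreadsheet used for Table 7.13. The function name pool_rubin and the numbers passed to it are hypothetical and are not taken from the chapter's tables.

```python
import math


def pool_rubin(estimates, variances):
    """Combine one coefficient across m imputed data sets with Rubin's rules.

    estimates: the m point estimates of the coefficient
    variances: the m squared standard errors of that coefficient
    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    q_bar = sum(estimates) / m                               # pooled estimate
    within = sum(variances) / m                              # within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between                   # total variance
    if between == 0:
        df = math.inf                                        # no imputation uncertainty
    else:
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
    return q_bar, math.sqrt(total), df


# Hypothetical coefficient and standard error from each of m = 5 imputations.
coefs = [0.010, 0.014, 0.008, 0.012, 0.011]
ses = [0.006, 0.007, 0.006, 0.006, 0.007]
est, se, df = pool_rubin(coefs, [s ** 2 for s in ses])
print(f"pooled estimate = {est:.4f}, SE = {se:.4f}, df = {df:.1f}")
```

The pooled standard error is never smaller than the square root of the average within-imputation variance, which is how the spread among the five data sets enters the final inference.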

7.1.1.2 SAS Proc MI

The SAS procedure PROC MI provides a number of options for analyzing data with missing values. For the example illustrated in this chapter, we use Markov chain Monte Carlo (MCMC) with a single chain for the multiple imputations. We also use the EM estimates as the starting values for the MCMC analysis. The commands below were used with the leadership data to produce the five imputed data sets:

proc mi data = work.leader out = work.leaderimp seed = 101897;

var year ageave perlead gen2 sizeorg2 rndm2 effsize;

mcmc;

run;

The first line of the command gives the name of the data set to use, the name of the SAS data set created to hold the imputations, and the seed for the pseudo-random number generator. The second line lists the variables to use in the imputations. Note that the effect size is included in this analysis. The third line specifies the use of Markov chain Monte Carlo to obtain the estimates of the joint posterior distribution as described by Rubin (1987). Note that the number of imputations is not specified; in the SAS release used here, the default number of imputed data sets generated is five, the number recommended by Schafer (1997).

SAS Proc MI provides a number of useful tables, including one outlining the missing data patterns and the group means for each variable within each missing data pattern. Once the imputations are generated, the procedure gives the estimates for the mean and standard error of the variables with missing data as illustrated below.

Multiple Imputation Parameter Estimates

Variable                  Mean     SE      95% confidence limits   DF
Average age of sample     44.109   1.619   40.341    47.877         7.596
Percent of male leaders   65.691   2.898   59.743    71.640        26.869

Variable                  Minimum  Maximum  Mu0  t for Mean = Mu0  Pr > |t|
Average age of sample     42.659   45.481   0    27.25             <.0001
Percent of male leaders   64.586   67.390   0    22.67             <.0001
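
As a quick reading aid for the output above, the t statistic in each row is the pooled mean minus Mu0, divided by the pooled standard error. The short Python check below recomputes it from the rounded values printed in the table, so small differences in the last digit are expected. The dictionary layout is only a convenience for this sketch.

```python
# Values copied from the PROC MI parameter estimates table above.
rows = {
    "Average age of sample": {"mean": 44.109, "se": 1.619, "t_reported": 27.25},
    "Percent of male leaders": {"mean": 65.691, "se": 2.898, "t_reported": 22.67},
}
mu0 = 0.0  # null value tested by PROC MI in this output

# t = (pooled mean - Mu0) / pooled standard error
t_values = {name: (r["mean"] - mu0) / r["se"] for name, r in rows.items()}

for name, t in t_values.items():
    reported = rows[name]["t_reported"]
    print(f"{name}: recomputed t = {t:.3f}, reported t = {reported}")
```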

To obtain the weighted regression results for each imputation, we use Proc Reg with weights. The command lines are shown below.

proc reg data = work.leaderimp outest = work.regout covout;

model effsize = year ageave perlead gen2 sizeorg2 rndm2;

weight wt;

by _Imputation_;

run;

The lines given above use the SAS data set generated by Proc MI, and estimate the coefficients for the effect size model using weighted regression. The results are computed for each imputation as indicated in the by statement. Table 7.14 provides the weighted regression results for each imputation.

Table 7.14 Multiple imputations generated using SAS Proc MI
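
A general-purpose weighted regression routine treats the weights as known only up to a constant, so the standard errors it reports are scaled by the residual mean square. One commonly described form of the Lipsey and Wilson (2001) adjustment, assumed here, divides each reported standard error by the square root of that mean square. The Python sketch below shows the calculation for a fixed-effect model with a single moderator. The function name and the data are hypothetical and are not the leadership data.

```python
import math


def weighted_meta_regression(effects, variances, moderator):
    """Fixed-effect weighted regression of effect size on one moderator.

    Weights are inverse sampling variances. Returns the coefficients
    (intercept, slope), the standard errors a generic weighted regression
    routine would report, and the standard errors after the correction.
    """
    w = [1.0 / v for v in variances]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, moderator))
    swxx = sum(wi * x * x for wi, x in zip(w, moderator))
    swy = sum(wi * y for wi, y in zip(w, effects))
    swxy = sum(wi * x * y for wi, x, y in zip(w, moderator, effects))
    det = sw * swxx - swx ** 2
    b0 = (swxx * swy - swx * swxy) / det
    b1 = (sw * swxy - swx * swy) / det
    # Weighted residual mean square with k - 2 degrees of freedom.
    ss_resid = sum(wi * (y - b0 - b1 * x) ** 2
                   for wi, x, y in zip(w, moderator, effects))
    mse = ss_resid / (len(effects) - 2)
    se_software = (math.sqrt(mse * swxx / det), math.sqrt(mse * sw / det))
    # Correction: divide each reported SE by the square root of the mean square.
    se_corrected = tuple(se / math.sqrt(mse) for se in se_software)
    return (b0, b1), se_software, se_corrected


# Hypothetical effect sizes, sampling variances, and a centered year moderator.
effects = [0.10, 0.25, 0.05, 0.30, 0.18]
variances = [0.02, 0.03, 0.01, 0.04, 0.02]
year_centered = [0, 5, -2, 9, 3]
coefs, se_raw, se_adj = weighted_meta_regression(effects, variances, year_centered)
print("coefficients:", coefs)
print("software SEs:", se_raw)
print("corrected SEs:", se_adj)
```

In a multiple imputation analysis the corrected standard errors, not the software ones, would be the inputs to the combining step for each imputed data set.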

Table 7.15 gives the multiply-imputed estimates for the weighted regression results. As in the prior analyses, none of the regression coefficients were significantly different from zero.

Table 7.15 Multiply-imputed estimates generated by SAS

Copyright information

© 2012 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Pigott, T.D. (2012). Missing Data in Meta-analysis: Strategies and Approaches. In: Advances in Meta-Analysis. Statistics for Social and Behavioral Sciences. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-2278-5_7
