Abstract

Studies in the social and behavioral sciences frequently suffer from missing data. For instance, sample surveys often have some individuals who either refuse to participate or do not supply answers to certain questions, and panel studies often have incomplete data due to attrition. Recent comprehensive treatments of the subject of missing data include three volumes produced by the Panel on Incomplete Data of the Committee on National Statistics (Madow, Nisselson, and Olkin 1983; Madow and Olkin 1983; Madow, Olkin, and Rubin 1983) and Little and Rubin (1987).

References

  • Afifi, A. A., and Elashoff, R. M. (1967), “Missing observations in multivariate statistics II: point estimation in simple linear regression,” Journal of the American Statistical Association, 62, 595–604.

  • Anderson, A. B., Basilevsky, A., and Hum, D. P. J. (1983), “Missing data: a review of the literature,” in P. H. Rossi, J. D. Wright, and A. B. Anderson (eds.), Handbook of Survey Research, pp. 415–494, New York: Academic Press.

  • Anderson, T. W. (1957), “Maximum likelihood estimation for the multivariate normal distribution when some observations are missing,” Journal of the American Statistical Association, 52, 200–203.

  • Azen, S. P., Van Guilder, M., and Hill, M. A. (1989), “Estimation of parameters and missing values under a regression model with non-normally distributed and non-randomly incomplete data,” Statistics in Medicine, 8, 217–228.

  • Baker, S. G., and Laird, N. M. (1988), “Regression analysis for categorical variables with outcome subject to nonignorable nonresponse,” Journal of the American Statistical Association, 83, 62–69.

  • Basu, D. (1971), “An essay on the logical foundations of survey sampling, Part 1,” in V. P. Godambe and D. A. Sprott (eds.), Foundations of Statistical Inference, pp. 203–242, Toronto: Holt, Rinehart, and Winston.

  • Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. (1975), Discrete Multivariate Analysis: Theory and Practice, Cambridge, MA: MIT Press.

  • Casella, G., and George, E. I. (1992), “Explaining the Gibbs sampler,” The American Statistician, 46, 167–174.

  • Chen, T., and Fienberg, S. E. (1974), “Two-dimensional contingency tables with both completely and partially classified data,” Biometrics, 30, 629–642.

  • Clogg, C. C., Rubin, D. B., Schenker, N., Schultz, B., and Weidman, L. (1991), “Multiple imputation of industry and occupation codes in census public-use samples using Bayesian logistic regression,” Journal of the American Statistical Association, 86, 68–78.

  • Czajka, J. L., Hirabayashi, S. M., Little, R. J. A., and Rubin, D. B. (1992), “Projecting from advance data using propensity modeling: an application to income and tax statistics,” Journal of Business and Economic Statistics, 10, 117–132.

  • David, M. H., Little, R. J. A., Samuhel, M. E., and Triest, R. K. (1986), “Alternative methods for CPS income imputation,” Journal of the American Statistical Association, 81, 29–41.

  • Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society, Ser. B, 39, 1–38.

  • Dixon, W. J., ed. (1988), BMDP Statistical Software, Los Angeles: University of California Press.

  • Dorey, F. J., Little, R. J. A., and Schenker, N. (1993), “Multiple imputation for threshold-crossing data with interval censoring,” UCLA Statistics Series, No. 81. To appear in Statistics in Medicine.

  • Efron, B. (1979), “Bootstrap methods: Another look at the jackknife,” The Annals of Statistics, 7, 1–26.

  • Fay, R. E. (1986), “Causal models for patterns of nonresponse,” Journal of the American Statistical Association, 81, 354–365.

  • Fuchs, C. (1982), “Maximum likelihood estimation and model selection in contingency tables with missing data,” Journal of the American Statistical Association, 77, 270–278.

  • Gelfand, A. E., Hills, S. E., Racine-Poon, A., and Smith, A. F. M. (1990), “Illustration of Bayesian inference in normal data models using Gibbs sampling,” Journal of the American Statistical Association, 85, 972–985.

  • Gelfand, A. E., and Smith, A. F. M. (1990), “Sampling-based approaches to calculating marginal densities,” Journal of the American Statistical Association, 85, 398–409.

  • Gelman, A., and Rubin, D. B. (1992), “Inference from iterative simulation using multiple sequences (with discussion),” Statistical Science, 7, 457–511.

  • Geman, S., and Geman, D. (1984), “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.

  • Geyer, C. J. (1992), “Practical Markov chain Monte Carlo (with discussion),” Statistical Science, 7, 473–511.

  • Glasser, M. (1964), “Linear regression analysis with missing observations among the independent variables,” Journal of the American Statistical Association, 59, 834–844.

  • Glynn, R., Laird, N. M., and Rubin, D. B. (1986), “Selection modeling versus mixture modeling with nonignorable nonresponse,” in H. Wainer (ed.), Drawing Inferences from Self-Selected Samples, pp. 119–146, New York: Springer-Verlag.

  • Göksel, H., Judkins, D. R., and Mosher, W. D. (1991), “Nonresponse adjustments for a telephone follow-up to a national in-person survey,” in American Statistical Association, 1991, Proceedings of the section on Survey Research Methods, pp. 581–586.

  • Greenlees, J. S., Reece, W. S., and Zieschang, K. O. (1982), “Imputation of missing values when the probability of response depends on the variable being imputed,” Journal of the American Statistical Association, 77, 251–261.

  • Haitovsky, Y. (1968), “Missing data in regression analysis,” Journal of the Royal Statistical Society, Ser. B, 30, 67–81.

  • Hanson, R. H. (1978), “The current population survey: design and methodology,” Technical Paper No. 40, Washington, DC: U.S. Bureau of the Census.

  • Harte, J. M. (1982), “Post-stratification approaches in the Corporation Statistics of Income Program,” in American Statistical Association, 1982, Proceedings of the section on Survey Research Methods, 250–253.

  • Heckman, J. (1976), “The common structure of statistical models of truncation, sample selection and limited dependent variables, and a simple estimator for such models,” Annals of Economic and Social Measurement, 5, 475–492.

  • Heitjan, D. F., and Little, R. J. A. (1991), “Multiple imputation for the fatal accident reporting system,” Applied Statistics, 40, 13–29.

  • Herzog, T. N., and Rubin, D. B. (1983), “Using multiple imputations to handle nonresponse in sample surveys,” in W. G. Madow, I. Olkin, and D. B. Rubin (eds.), Incomplete Data in Sample Surveys, Volume 2: Theory and Bibliographies, pp. 209–245, New York: Academic Press.

  • Jennrich, R. I., and Schluchter, M. D. (1986), “Unbalanced repeated-measures models with structured covariance matrices,” Biometrics, 42, 805–820.

  • Kass, R. E., and Steffey, D. (1989), “Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models),” Journal of the American Statistical Association, 84, 717–726.

  • Kennickell, A. B. (1991), “Imputation of the 1989 Survey of Consumer Finances: stochastic relaxation and multiple imputation,” in American Statistical Association, 1991, Proceedings of the section on Survey Research Methods, pp. 1–10.

  • Kim, J. O., and Curry, J. (1977), “The treatment of missing data in multivariate analysis,” Sociological Methods and Research, 6, 215–240.

  • Laird, N. M. (1988), “Missing data in longitudinal studies,” Statistics in Medicine, 7, 305–315.

  • Laird, N. M., and Ware, J. H. (1982), “Random-effects models for longitudinal data,” Biometrics, 38, 963–974.

  • Lange, K. L., Little, R. J. A., and Taylor, J. M. G. (1989), “Robust statistical modeling using the t distribution,” Journal of the American Statistical Association, 84, 881–896.

  • Lazzeroni, L. C., Schenker, N., and Taylor, J. M. G. (1990), “Robustness of multiple imputation techniques to model specification,” in American Statistical Association, 1990, Proceedings of the section on Survey Research Methods, pp. 260–265.

  • Lee, L. F. (1982), “Some approaches to the correction of selectivity bias,” Review of Economic Studies, 49, 355–372.

  • Li, K. H. (1988), “Imputation using Markov chains,” Journal of Statistical Computation and Simulation, 30, 57–79.

  • Li, K. H., Meng, X. L., Raghunathan, T. E., and Rubin, D. B. (1991), “Significance levels from repeated p-values with multiply imputed data,” Statistica Sinica, 1, 65–92.

  • Li, K. H., Raghunathan, T. E., and Rubin, D. B. (1991), “Large-sample significance levels from multiply imputed data using moment-based statistics and an F reference distribution,” Journal of the American Statistical Association, 86, 1065–1073.

  • Liang, K. Y., Zeger, S. L., and Qaqish, B. (1992), “Multivariate regression analysis for categorical data,” Journal of the Royal Statistical Society, Ser. B, 54, 3–40.

  • Lillard, L., Smith, J. P., and Welch, F. (1986), “What do we really know about wages: the importance of nonreporting and census imputation,” Journal of Political Economy, 94, 489–506.

  • Lindstrom, M. J., and Bates, D. M. (1988), “Newton-Raphson and EM algorithms for linear mixed-effects models for repeated-measures data,” Journal of the American Statistical Association, 83, 1014–1022.

  • Lipsitz, S. R., Laird, N. M., and Harrington, D. P. (1990), “Using the jackknife to estimate the variance of regression estimators from repeated measures studies,” Communications in Statistics, Ser. A, 19, 821–845.

  • Little, R. J. A. (1985a), “Nonresponse adjustments in longitudinal surveys: models for categorical data,” Bulletin of the International Statistical Institute, 15.1, 1–15.

  • Little, R. J. A. (1985b), “A note about models for selectivity bias,” Econometrica, 53, 1469–1474.

  • Little, R. J. A. (1986), “Survey nonresponse adjustments,” International Statistical Review, 54, 139–157.

  • Little, R. J. A. (1988a), “A test of missing completely at random for multivariate data with missing values,” Journal of the American Statistical Association, 83, 1198–1202.

  • Little, R. J. A. (1988b), “Missing data adjustments in large surveys,” Journal of Business and Economic Statistics, 6, 287–301.

  • Little, R. J. A. (1988c), “Robust estimation of the mean and covariance matrix from data with missing values,” Applied Statistics, 37, 23–38.

  • Little, R. J. A. (1992), “Regression with incomplete X’s; a review,” Journal of the American Statistical Association, 87, 1227–1237.

  • Little, R. J. A. (1993a), “Post-stratification: a modeler’s perspective,” Journal of the American Statistical Association, 88, 1001–1012.

  • Little, R. J. A. (1993b), “Pattern-mixture models for multivariate incomplete data,” Journal of the American Statistical Association, 88, 125–134.

  • Little, R. J. A. (1993c), “A class of pattern-mixture models for normal incomplete data,” to appear in Biometrika.

  • Little, R. J. A., and Rubin, D. B. (1987), Statistical Analysis with Missing Data, New York: Wiley.

  • Little, R. J. A., and Schluchter, M. D. (1985), “Maximum likelihood estimation for mixed continuous and categorical data with missing values,” Biometrika, 72, 497–512.

  • Little, R. J. A., and Su, H. L. (1989), “Item nonresponse in panel surveys,” in D. Kasprzyk, G. Duncan, G. Kalton, and M. P. Singh (eds.), Panel Surveys, pp. 400–425, New York: Wiley.

  • Madow, W. G., Nisselson, H., and Olkin, I. (eds.) (1983), Incomplete Data in Sample Surveys, Volume 1: Report and Case Studies. Academic Press, New York.

  • Madow, W. G., and Olkin, I. (eds.) (1983), Incomplete Data in Sample Surveys, Volume 3: Proceedings of the Symposium. Academic Press, New York.

  • Madow, W. G., Olkin, I., and Rubin, D. B. (eds.) (1983), Incomplete Data in Sample Surveys, Volume 2: Theory and Bibliographies. Academic Press, New York.

  • Marini, M. M., Olsen, A. R., and Rubin, D. B. (1980), “Maximum-likelihood estimation in panel studies with missing data,” Sociological Methodology, 11, 314–357.

  • McCullagh, P., and Nelder, J. A. (1989), Generalized Linear Models, second edition, London: Chapman and Hall.

  • McKendrick, A. G. (1926), “Applications of mathematics to medical problems,” Proceedings of the Edinburgh Mathematical Society, 44, 98–130.

  • Meng, X. L., and Rubin, D. B. (1991), “Using EM to obtain asymptotic variance-covariance matrices: the SEM algorithm,” Journal of the American Statistical Association, 86, 899–909.

  • Meng, X. L., and Rubin, D. B. (1992), “Performing likelihood ratio tests with multiply-imputed data sets,” Biometrika, 79, 103–111.

  • Meng, X. L., and Rubin, D. B. (1993), “Maximum likelihood estimation via the ECM algorithm: a general framework,” Biometrika, 80, 267–278.

  • Moulton, L. H., and Zeger, S. L. (1989), “Analyzing repeated measures on generalized linear models via the bootstrap,” Biometrics, 45, 381–394.

  • Muthén, B., Kaplan, D., and Hollis, M. (1987), “On structural equation modeling with data that are not missing completely at random,” Psychometrika, 52, 431–462.

  • Nelson, F. D. (1984), “Efficiency of the two-step estimator for models with endogenous sample selection,” Journal of Econometrics, 24, 181–196.

  • Oh, H. L., and Scheuren, F. S. (1983), “Weighting adjustments for unit nonresponse,” in W. G. Madow, I. Olkin, and D. B. Rubin (eds.), Incomplete Data in Sample Surveys, Volume 2: Theory and Bibliographies, pp. 143–184, New York: Academic Press.

  • Olkin, I., and Tate, R. F. (1961), “Multivariate correlation models with mixed discrete and continuous variables,” The Annals of Mathematical Statistics, 32, 448–465.

  • Olsen, R. J. (1982), “Distributional tests for selectivity bias and a more robust likelihood estimator,” International Economic Review, 23, 223–240.

  • Orchard, T., and Woodbury, M. A. (1972), “A missing information principle: theory and applications,” Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, 1, 697–715.

  • Prentice, R. L., and Zhao, L. P. (1991), “Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses,” Biometrics, 47, 825–839.

  • Rosenbaum, P. R., and Rubin, D. B. (1983), “The central role of the propensity score in observational studies for causal effects,” Biometrika, 70, 41–55.

  • Rubin, D. B. (1976a), “Inference and missing data,” Biometrika, 63, 581–592.

  • Rubin, D. B. (1976b), “Comparing regressions when some predictor values are missing,” Technometrics, 18, 201–205.

  • Rubin, D. B. (1977), “Formalizing subjective notions about the effect of nonrespondents in sample surveys,” Journal of the American Statistical Association, 72, 538–543.

  • Rubin, D. B. (1978), “Multiple imputations in sample surveys - A phenomenological Bayesian approach to nonresponse,” in American Statistical Association, 1978, Proceedings of the section on Survey Research Methods, pp. 20–34.

  • Rubin, D. B. (1986), “Statistical matching and file concatenation with adjusted weights and multiple imputations,” Journal of Business and Economic Statistics, 4, 87–94.

  • Rubin, D. B. (1987), Multiple Imputation for Nonresponse in Surveys, New York: Wiley.

  • Rubin, D. B. (1988), “An overview of multiple imputation,” in American Statistical Association, 1988, Proceedings of the section on Survey Research Methods, pp. 79–84.

  • Rubin, D. B., Schafer, J. L. and Schenker, N. (1988), “Imputation strategies for missing values in post-enumeration surveys,” Survey Methodology, 14, 209–221.

  • Rubin, D. B. and Schenker, N. (1986), “Multiple imputation for interval estimation from simple random samples with ignorable nonresponse,” Journal of the American Statistical Association, 81, 366–374.

  • Rubin, D. B. and Schenker, N. (1987), “Interval estimation from multiply-imputed data: A case study using census agriculture industry codes,” Journal of Official Statistics, 3, 375–387.

  • Rubin, D. B. and Schenker, N. (1991), “Multiple imputation in health-care databases: An overview and some applications,” Statistics in Medicine, 10, 585–598.

  • SAS (1992), “The MIXED Procedure,” chapter 16 in: SAS/STAT Software: Changes and Enhancements, Release 6.07. Technical Report P-229, SAS Institute, Inc., Cary, NC.

  • Schafer, J. L. (1991), Algorithms for Multiple Imputation and Posterior Simulation from Incomplete Multivariate Data with Ignorable Nonresponse. Ph.D. Thesis, Department of Statistics, Harvard University.

  • Schenker, N., Treiman, D. J., and Weidman, L. (1993), “Analyses of public-use data with multiply-imputed industry and occupation codes,” Applied Statistics, 42, 545–556.

  • Schenker, N., and Welsh, A. H. (1988), “Asymptotic results for multiple imputation,” The Annals of Statistics, 16, 1550–1566.

  • Schluchter, M. D. (1988), “Analysis of incomplete multivariate data using linear models with structured covariance matrices,” Statistics in Medicine, 7, 317–324.

  • Schoenberg, R. S. (1988), “MISS: a program for missing data,” in GAUSS Programming Language, Aptech Systems Inc., P.O. Box 6487, Kent, WA 98064.

  • Stolzenberg, R. M., and Relles, D. A. (1990), “Theory testing in a world of constrained research design: the significance of Heckman’s censored sampling bias correction for nonexperimental research,” Sociological Methods and Research, 18, 395–415.

  • Tanner, M. A. (1991), Tools for Statistical Inference: Observed Data and Data Augmentation Methods, New York: Springer-Verlag.

  • Tanner, M. A., and Wong, W. H. (1987), “The calculation of posterior distributions by data augmentation,” Journal of the American Statistical Association, 82, 528–550.

  • Treiman, D. J., Bielby, W. T., and Cheng, M. T. (1988), “Evaluating a multiple-imputation method for recalibrating 1970 U.S. census detailed industry codes to the 1980 standard,” Sociological Methodology, 18, 309–345.

  • Van Praag, B. M. S., Dijkstra, T. K., and Van Velzen, J. (1985), “Least-squares theory based on general distributional assumptions with an application to the incomplete observations problem,” Psychometrika, 50, 25–36.

  • Waterton, J., and Lievesley, D. (1987), “Attrition in a panel study of attitudes,” Journal of Official Statistics, 3, 267–282.

  • Weidman, L. (1989), “Final report: industry and occupation imputation,” Statistical Research Division Report Number Census/SRD/89/03, Washington, DC: U.S. Bureau of the Census.

  • Wilks, S. S. (1932), “Moments and distribution of estimates of population parameters from fragmentary samples,” The Annals of Mathematical Statistics, 3, 163–195.

  • Woodburn, L. (1991), “Using auxiliary information to investigate nonresponse bias,” in American Statistical Association, 1991, Proceedings of the section on Survey Research Methods, pp. 278–283.

  • Zeger, S. L., and Liang, K. Y. (1986), “Longitudinal data analysis for discrete and continuous outcomes,” Biometrics, 42, 121–130.

Copyright information

© 1995 Springer Science+Business Media New York

About this chapter

Cite this chapter

Little, R.J.A., Schenker, N. (1995). Missing Data. In: Arminger, G., Clogg, C.C., Sobel, M.E. (eds) Handbook of Statistical Modeling for the Social and Behavioral Sciences. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-1292-3_2

  • DOI: https://doi.org/10.1007/978-1-4899-1292-3_2

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4899-1294-7

  • Online ISBN: 978-1-4899-1292-3

  • eBook Packages: Springer Book Archive
