Basic Steps in Weighting

  • Richard Valliant
  • Jill A. Dever
  • Frauke Kreuter
Part of the Statistics for Social and Behavioral Sciences book series (SSBS)


Survey weights are a key component of producing population estimates. A series of weighting steps is carried out in most, if not all, surveys. In addition to an overview of weighting and the general theoretical approaches used to justify the use of weights in estimation, this chapter covers the first three weighting steps: base weights (the inverse of the selection probability), adjustments for unknown eligibility, and nonresponse adjustments. Examples of base weight calculation are presented for various designs. Methods of adjusting for nonresponse using propensity models and machine learning methods are also covered.
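The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the chapter: the sample records, the status codes, the adjustment classes, and the function names are all hypothetical. It assumes a simple weighting-class approach in which, within each class, the weight of cases with unknown eligibility is spread over cases with known status, and the weight of eligible nonrespondents is then spread over respondents. It also assumes every class contains at least one case with known status and at least one respondent.

```python
from collections import defaultdict

# Hypothetical sample. "pi" is the selection probability, "cls" is an
# adjustment class, and "status" is one of: "R" (respondent), "NR" (eligible
# nonrespondent), "IN" (known ineligible), "UNK" (eligibility unknown).
SAMPLE = [
    {"pi": 0.10, "cls": "A", "status": "R"},
    {"pi": 0.10, "cls": "A", "status": "NR"},
    {"pi": 0.10, "cls": "A", "status": "UNK"},
    {"pi": 0.10, "cls": "A", "status": "IN"},
    {"pi": 0.05, "cls": "B", "status": "R"},
    {"pi": 0.05, "cls": "B", "status": "R"},
    {"pi": 0.05, "cls": "B", "status": "NR"},
    {"pi": 0.05, "cls": "B", "status": "UNK"},
]


def class_totals(units, weights, keep):
    """Sum weights by class over units whose status is in `keep`."""
    totals = defaultdict(float)
    for unit, w in zip(units, weights):
        if unit["status"] in keep:
            totals[unit["cls"]] += w
    return totals


def respondent_weights(units):
    """Return final weights for respondents only (illustrative sketch)."""
    # Step 1: base weight = inverse of the selection probability.
    w = [1.0 / u["pi"] for u in units]

    # Step 2: unknown-eligibility adjustment. Within each class, inflate
    # known-status cases so they also carry the weight of unknown cases.
    all_w = class_totals(units, w, {"R", "NR", "IN", "UNK"})
    known_w = class_totals(units, w, {"R", "NR", "IN"})
    w = [
        0.0 if u["status"] == "UNK"
        else wi * all_w[u["cls"]] / known_w[u["cls"]]
        for u, wi in zip(units, w)
    ]

    # Step 3: nonresponse adjustment. Within each class, inflate respondents
    # so they also carry the weight of eligible nonrespondents.
    elig_w = class_totals(units, w, {"R", "NR"})
    resp_w = class_totals(units, w, {"R"})
    return [
        wi * elig_w[u["cls"]] / resp_w[u["cls"]]
        for u, wi in zip(units, w)
        if u["status"] == "R"
    ]


final = respondent_weights(SAMPLE)
print([round(x, 2) for x in final])
```

In this toy example the final respondent weights sum to the total base weight minus the weight assigned to known ineligibles, which is the estimated size of the eligible population. A propensity-based adjustment would replace the class-level factor in step 3 with the inverse of each respondent's estimated response probability.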



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Richard Valliant (1, 2)
  • Jill A. Dever (3)
  • Frauke Kreuter (2, 4)

  1. University of Michigan, Ann Arbor, USA
  2. University of Maryland, College Park, USA
  3. RTI International, Washington, DC, USA
  4. University of Mannheim, Mannheim, Germany
