Nonprobability Sampling

  • Richard Valliant
  • Jill A. Dever
  • Frauke Kreuter
Part of the Statistics for Social and Behavioral Sciences book series (SSBS)


In the last decade, many sources of data other than probability samples have become available as a consequence of the ubiquity of electronic data collection. For example, some vendors and survey organizations have formed large panels of persons who are willing to participate in surveys via the Internet. Many of these sources, despite being large, are not probability samples, yet analysts still want to project them to full finite populations. This chapter reviews the types of nonprobability data sources that are available and the criteria that can be used to judge their quality. We also cover the methods used to make inferences to finite populations from these datasets: quasirandomization, model-based estimation, and doubly robust estimation, which combines the quasirandomization and model-based techniques.
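To make the three approaches named above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the chapter's own implementation. The tiny sample, the group labels, the population control totals, and all function names are hypothetical. Pseudo-inclusion probabilities are approximated by simple group-level rates n_g / N_g, the outcome model is an ordinary least squares line in one covariate x, and the doubly robust total adds weighted residuals to the model prediction.

```python
from collections import Counter

# Hypothetical volunteer (nonprobability) sample: (group, x, y).
sample = [
    ("young", 1.0, 3.0),
    ("young", 2.0, 5.0),
    ("young", 3.0, 7.5),
    ("old", 4.0, 9.0),
]

# Assumed known population controls (for example from a census).
pop_counts = {"young": 50, "old": 50}  # N_g per group
pop_size = 100                         # N
pop_x_total = 320.0                    # sum of x over the population


def pseudo_weights(sample, pop_counts):
    """Inverse of estimated pseudo-inclusion probabilities n_g / N_g."""
    n_g = Counter(g for g, _, _ in sample)
    return [pop_counts[g] / n_g[g] for g, _, _ in sample]


def fit_line(sample):
    """Ordinary least squares fit of y = a + b * x on the sample."""
    n = len(sample)
    xbar = sum(x for _, x, _ in sample) / n
    ybar = sum(y for _, _, y in sample) / n
    sxx = sum((x - xbar) ** 2 for _, x, _ in sample)
    sxy = sum((x - xbar) * (y - ybar) for _, x, y in sample)
    b = sxy / sxx
    return ybar - b * xbar, b


def quasirandomization_total(sample, pop_counts):
    """Weight each sampled y by its pseudo-weight and sum."""
    w = pseudo_weights(sample, pop_counts)
    return sum(wi * y for wi, (_, _, y) in zip(w, sample))


def model_based_total(sample, pop_size, pop_x_total):
    """Sum the fitted line over the whole population using the x control total."""
    a, b = fit_line(sample)
    return a * pop_size + b * pop_x_total


def doubly_robust_total(sample, pop_counts, pop_size, pop_x_total):
    """Model prediction plus pseudo-weighted sample residuals."""
    a, b = fit_line(sample)
    w = pseudo_weights(sample, pop_counts)
    resid = sum(wi * (y - (a + b * x)) for wi, (_, x, y) in zip(w, sample))
    return a * pop_size + b * pop_x_total + resid


if __name__ == "__main__":
    print("quasirandomization:", quasirandomization_total(sample, pop_counts))
    print("model-based:       ", model_based_total(sample, pop_size, pop_x_total))
    print("doubly robust:     ",
          doubly_robust_total(sample, pop_counts, pop_size, pop_x_total))
```

In this sketch the doubly robust total differs from the model-based total only by the weighted residual term, so it falls back on the pseudo-weights where the outcome model fits poorly. In practice, the propensity and outcome models would usually be richer, for example a logistic regression fitted to the volunteer sample combined with a reference probability sample.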



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Richard Valliant (1, 2)
  • Jill A. Dever (3)
  • Frauke Kreuter (2, 4)

  1. University of Michigan, Ann Arbor, USA
  2. University of Maryland, College Park, USA
  3. RTI International, Washington, DC, USA
  4. University of Mannheim, Mannheim, Germany
