Abstract

In the last decade, many sources of data other than probability samples have become available as a consequence of the ubiquity of electronic data collection. For example, some vendors and survey organizations have formed large panels of persons who are willing to participate in surveys via the Internet. Many of these sources, despite being large, are not probability samples, but analysts want to project them to full finite populations. This chapter reviews the types of nonprobability data sources that are available and the criteria that can be used to judge their quality. We also cover the methods used to make inferences to finite populations from these datasets: quasirandomization, model-based estimation, and doubly robust estimation, which combines quasirandomization and model-based techniques.

Notes

  1. Neyman (1934) also presented the randomization theory for stratified and cluster sampling that we have relied on in earlier chapters.

  2. https://www.google.com/analytics/surveys/

  3. Note that matching requires that the population for inference be defined. In some cases, a survey may not have a clearly defined target population, e.g., a mall intercept survey.

  4. When \(\mathrm{var}_M(y) = V\), a more complicated estimator of the total turns out to be the best linear unbiased predictor, but we will not cover that here. See Theorem 2.2.1 in Valliant et al. (2000) for details.

  5. The steps described here may change in the future. Consequently, you may need to search the Internet for updated installation instructions.

  6. Although there is a version of rstanarm on CRAN, it does not include the option to use a structured prior.

References

  • Alvarez R., Sherman R., Van Beselaere C. (2003). Subject acquisition for web-based surveys. Political Analysis 11:23–43.

  • Austin P. (2014). A comparison of 12 algorithms for matching on the propensity score. Statistics in Medicine 33(6):1057–1069, URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4285163/

  • Baker R., Brick J. M., Bates N. A., Couper M. P., Courtright M., Dennis J. M., Dillman D. A., Frankel M. R., Garland P., Groves R. M., Kennedy C., Krosnick J., Lavrakas P. J., Lee S., Link M. W., Piekarski L., Rao K., Thomas R. K., Zahs D. (2010). AAPOR report on online panels. Public Opinion Quarterly 74:711–781.

  • Baker R., Brick J. M., Bates N. A., Battaglia M. P., Couper M. P., Dever J. A., Gile K., Tourangeau R. (2013). Report of the AAPOR task force on non-probability sampling. Tech. rep., The American Association for Public Opinion Research, Deerfield, IL.

  • Biemer P. P. (2010). Total survey error design, implementation, and evaluation. Public Opinion Quarterly 74(5):827–848.

  • Callegaro M., Baker R., Bethlehem J., Göritz A., Krosnick J., Lavrakas P. (eds) (2014). Online Panel Research: A Data Quality Perspective. John Wiley & Sons, Ltd., United Kingdom.

  • Cowling D. (2015). Election 2015: How the opinion polls got it wrong. http://www.bbc.com/news/uk-politics-32751993, [BBC News online; accessed 06-November-2016].

  • Dever J. A. (2008). Sampling weight calibration with estimated control totals. PhD thesis, University of Maryland.

  • Dever J. A., Valliant R. (2010). A comparison of variance estimators for poststratification to estimated control totals. Survey Methodology 36:45–56.

  • Dever J. A., Valliant R. (2016). General regression estimation adjusted for undercoverage and estimated control totals. Journal of Survey Statistics and Methodology 4:289–318.

  • Elliott M. R., Valliant R. (2017). Inference for nonprobability samples. Statistical Science 32:249–264.

  • Enten H. (2014). Flying Blind Toward Hogan’s Upset Win In Maryland. http://fivethirtyeight.com/datalab/governor-maryland-surprise-brown-hogan/, [FiveThirtyEight online; accessed 06-November-2016].

  • Folsom R. E., Singh A. C. (2000). The generalized exponential model for sampling weight calibration for extreme values, nonresponse, and poststratification. In: Proceedings of the Survey Research Methods Section, American Statistical Association, pp 598–603.

  • Frost S., Brouwer K., Firestone-Cruz M., Ramos R., Ramos M., Lozada R., Magis-Rodriguez C., Strathdee S. (2006). Respondent-driven sampling of injection drug users in two U.S.-Mexico border cities: Recruitment dynamics and impact on estimates of HIV and syphilis prevalence. Journal of Urban Health 83(6):83–97.

  • Gelman A. (2007). Struggles with survey weighting and regression modeling. Statistical Science 22(2):153–164.

  • Gelman A., Carlin J., Stern H., Rubin D. B. (1995). Bayesian Data Analysis. Chapman & Hall/CRC, Boca Raton, FL.

  • Ghosh M. (2009). Bayesian developments in survey sampling. In: Pfeffermann D., Rao C. (eds) Handbook of Statistics, Volume 29B Sample Surveys: Inference and Analysis. Elsevier, Amsterdam, chap 29, pp 153–188.

  • Ghosh M., Meeden G. (1997). Bayesian Methods for Finite Population Sampling. Chapman & Hall, London.

  • Gile K., Handcock M. (2010). Respondent-driven sampling: An assessment of current methodology. Sociological Methodology 40:285–327.

  • Gilks W., Richardson S., Spiegelhalter D. (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC Press, Boca Raton, FL.

  • Gosnell H. F. (1937). How accurate were the polls? Public Opinion Quarterly 1:97–105.

  • Heckathorn D. D. (1997). Respondent-driven sampling: A new approach to the study of hidden populations. Social Problems 44:174–199.

  • Ho D., Imai K., King G., Stuart E. (2007). Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis 15:199–236.

  • Ho D., Imai K., King G., Stuart E. (2011). MatchIt: Nonparametric preprocessing for parametric causal inference. Journal of Statistical Software 42(8), URL https://www.jstatsoft.org/article/view/v042i08

  • Kang J. D. Y., Schafer J. L. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science 22(4):523–539.

  • Keiding N., Louis T. (2016). Perils and potentials of self-selected entry to epidemiological studies and surveys. Journal of the Royal Statistical Society: Series A 179:319–376.

  • Kennedy C., Blumenthal M., Clement S., Clinton J., Durand C., Franklin C., McGeeney K., Miringoff L., Olson K., Rivers D., Saad L., Witt E., Wlezien C. (2017). An evaluation of 2016 election polls in the U.S. Ad hoc committee on 2016 election polling. Tech. rep., The American Association for Public Opinion Research, Deerfield, IL, URL http://www.aapor.org/Education-Resources/Reports/An-Evaluation-of-2016-Election-Polls-in-the-U-S.aspx

  • Kott P. S. (2006). Using calibration weighting to adjust for nonresponse and coverage errors. Survey Methodology 32(2):133–142.

  • Kreuter F., Presser S., Tourangeau R. (2008). Social desirability bias in CATI, IVR, and web surveys: The effects of mode and question sensitivity. Public Opinion Quarterly 72(5):847–865. DOI 10.1093/poq/nfn063, URL http://dx.doi.org/10.1093/poq/nfn063

  • Lee S. (2006). Propensity score adjustment as a weighting scheme for volunteer panel web surveys. Journal of Official Statistics 22:329–349.

  • Lee S., Valliant R. (2009). Estimation for volunteer panel web surveys using propensity score adjustment and calibration adjustment. Sociological Methods & Research 37(3):319–343.

  • Liebermann O. (2015). Why were the Israeli election polls so wrong? http://www.cnn.com/2015/03/18/middleeast/israel-election-polls/, [CNN online; accessed 06-November-2016].

  • Little R. J. A. (2003). Bayesian methods for unit and item nonresponse. In: Chambers R., Skinner C. (eds) Analysis of Survey Data. John Wiley, Chichester, chap 18.

  • Long J. S., Ervin L. H. (2000). Using heteroscedasticity consistent standard errors in the linear regression model. The American Statistician 54:217–224.

  • Mercer A., Kreuter F., Keeter S., Stuart E. (2017). Theory and practice in nonprobability surveys: Parallels between causal inference and survey inference. Public Opinion Quarterly 81:250–279.

  • National Academies of Sciences, Engineering, and Medicine. (2017). Federal Statistics, Multiple Data Sources, and Privacy Protection: Next Steps. The National Academies Press, Washington, DC, DOI 10.17226/24893, URL https://www.nap.edu/catalog/24893/federal-statistics-multiple-data-sources-and-privacy-protection-next-steps

  • Neyman J. (1934). On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society 97(Part 4):558–625.

  • Rivers D. (2007). Sampling for web surveys. Amazon Web Services, https://s3.amazonaws.com/yg-public/Scientific/Sample+Matching_JSM.pdf

  • Robins J. M., Hernan M. A., Brumback B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology 11(5):550–560.

  • Rosenbaum P., Rubin D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70:41–55.

  • Rubin D. B. (1979). Using multivariate matched sampling and regression adjustment to control bias in observational studies. Journal of the American Statistical Association 74:318–328.

  • Särndal C., Lundström S. (2005). Estimation in Surveys with Nonresponse. John Wiley & Sons, Inc., Chichester.

  • Schonlau M., Couper M. P. (2017). Options for conducting web surveys. Statistical Science 32:279–292.

  • Schonlau M., van Soest A., Kapteyn A. (2007). Are “Webographic” or attitudinal questions useful for adjusting estimates from web surveys using propensity scoring? Survey Research Methods 1(3):155–163.

  • Schonlau M., Weidmer B., Kapteyn A. (2014). Recruiting an internet panel using respondent-driven sampling. Journal of Official Statistics 30(2):291–310.

  • Si Y., Trangucci R., Gabry J., Gelman A. (2017). Bayesian hierarchical weighting adjustment and survey inference. URL https://arxiv.org/abs/1707.08220

  • Silver N. (2016). Pollsters Probably Didn’t Talk to Enough White Voters Without College Degrees. https://fivethirtyeight.com/features/pollsters-probably-didnt-talk-to-enough-white-voters-without-college-degrees/, [FiveThirtyEight online; accessed 21-August-2017]

  • Simon H. A. (1956). Rational choice and the structure of the environment. Psychological Review 63:129–138.

  • Sirken M. (1970). Household surveys with multiplicity. Journal of the American Statistical Association 65:257–266.

  • Smith T. M. F. (1976). The foundations of survey sampling: A review. Journal of the Royal Statistical Society A 139:183–204.

  • Squire P. (1988). Why the 1936 Literary Digest poll failed. Public Opinion Quarterly 52:125–133.

  • Statistics Canada. (2017). Statistics Canada Quality Framework, 3rd edn. Ottawa, CA, URL http://www.statcan.gc.ca/pub/12-586-x/12-586-x2017001-eng.pdf

  • Stuart E. (2010). Matching methods for causal inference: A review and a look forward. Statistical Science 25(1):1–21, URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2943670/

  • Sturgis P., Baker N., Callegaro M., Fisher S., Green J., Jennings W., Kuha J., Lauderdale B., Smith P. (2016). Report of the Inquiry into the 2015 British general election opinion polls. http://eprints.ncrm.ac.uk/3789/1/Report_final_revised.pdf, [accessed 06-November-2016].

  • Tourangeau R., Conrad F. G., Couper M. P. (2013). The Science of Web Surveys. Oxford University Press, New York.

  • Valliant R., Dever J. A. (2011). Estimating propensity adjustments for volunteer web surveys. Sociological Methods and Research 40:105–137.

  • Valliant R., Dever J. A. (2018). Survey Weights: A Step-by-Step Guide to Calculation. Stata Press, College Station, TX.

  • Valliant R., Dorfman A. H., Royall R. M. (2000). Finite Population Sampling and Inference: A Prediction Approach. John Wiley & Sons, Inc., New York.

  • Wickham H., Francois R., Henry L., Müller K. (2017). dplyr: A Grammar of Data Manipulation. URL https://CRAN.R-project.org/package=dplyr, R package version 0.7.4.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Valliant, R., Dever, J.A., Kreuter, F. (2018). Nonprobability Sampling. In: Practical Tools for Designing and Weighting Survey Samples. Statistics for Social and Behavioral Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-93632-1_18
