Detecting computer-generated random responding in questionnaire-based data: A comparison of seven indices

  • Marc Dupuis
  • Emanuele Meier
  • Félix Cuneo


With the development of online data collection and instruments such as Amazon's Mechanical Turk (MTurk), malicious software that generates survey responses in order to earn money has become a major issue, for both economic and scientific reasons. Even if paying one respondent to complete one questionnaire represents a very small cost, the multiplication of botnets providing invalid response sets may ultimately reduce study validity while increasing research costs. Several techniques have been proposed thus far to detect problematic human response sets, but little research has tested the extent to which they actually detect nonhuman response sets. We therefore conducted an empirical comparison of these indices. Assuming that most botnet programs draw responses from a uniform random distribution, we present and compare seven indices for detecting nonhuman response sets. A sample of 1,967 human respondents was mixed with different percentages (i.e., from 5% to 50%) of simulated random response sets. Three of the seven indices (i.e., response coherence, Mahalanobis distance, and person–total correlation) appear to be the best estimators for detecting nonhuman response sets. Given that two of those indices—Mahalanobis distance and person–total correlation—are easily calculated, every researcher working with online questionnaires could use them to screen for the presence of such invalid data.
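The two easily computed indices recommended above can be sketched on simulated data. The snippet below is a minimal illustration, not the study's actual procedure: the sample sizes, item model, and flagging thresholds are all assumptions chosen for the example, and the "bots" follow the uniform-random assumption stated in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_human, n_bot = 20, 200, 50

# Hypothetical data: humans share a common item-mean profile plus
# individual noise on a 1-5 Likert scale; bots answer each item
# uniformly at random (the botnet assumption made in the abstract).
item_means = rng.uniform(2.0, 4.0, n_items)
human = np.clip(np.round(item_means + rng.normal(0, 0.8, (n_human, n_items))), 1, 5)
bots = rng.integers(1, 6, size=(n_bot, n_items)).astype(float)
data = np.vstack([human, bots])

# Index 1: squared Mahalanobis distance of each response vector
# from the sample centroid, using the sample covariance matrix.
centered = data - data.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
md2 = np.einsum('ij,jk,ik->i', centered, cov_inv, centered)

# Index 2: person-total correlation, i.e. the Pearson correlation
# between one respondent's item profile and the mean item profile.
item_profile = data.mean(axis=0)
ptc = np.array([np.corrcoef(row, item_profile)[0, 1] for row in data])

# Screening rule (thresholds are illustrative): flag respondents with
# an extreme distance or a near-zero person-total correlation.
suspect = (md2 > np.quantile(md2, 0.8)) | (ptc < 0.2)
```

Random responders stand out on both indices because they ignore the between-item structure that human respondents share: their profiles neither follow the mean item profile (low person–total correlation) nor respect the covariance structure (large Mahalanobis distance).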


Keywords: Botnet · Functional method · Mahalanobis distance · Mechanical Turk · Person–total correlation · Random responding · Response coherence


Copyright information

© Psychonomic Society, Inc. 2018

Authors and Affiliations

  1. Institute of Psychology, University of Lausanne, Lausanne, Switzerland