Detecting computer-generated random responding in questionnaire-based data: A comparison of seven indices
Abstract
With the growth of online data collection through platforms such as Amazon's Mechanical Turk (MTurk), malicious software that automatically completes surveys in order to earn money has become a major issue, for both economic and scientific reasons. Although paying a single respondent to complete one questionnaire costs very little, the proliferation of botnets providing invalid response sets may ultimately reduce study validity while increasing research costs. Several techniques have been proposed to detect problematic human response sets, but little research has tested the extent to which they also detect nonhuman response sets. We therefore conducted an empirical comparison of such indices. Assuming that most botnet programs draw responses from a uniform random distribution, we present and compare seven indices for detecting nonhuman response sets. A sample of 1,967 human respondents was mixed with varying proportions (from 5% to 50%) of simulated random response sets. Three of the seven indices (response coherence, Mahalanobis distance, and person–total correlation) appear to be the best estimators for detecting nonhuman response sets. Given that two of these indices, Mahalanobis distance and person–total correlation, are easy to calculate, any researcher working with online questionnaires could use them to screen for such invalid data.
Keywords
Botnet · Functional method · Mahalanobis distance · Mechanical Turk · Person–total correlation · Random responding · Response coherence
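As a rough illustration of the abstract's closing point, the sketch below computes the two easily calculated indices, Mahalanobis distance and a leave-one-out person–total correlation, on simulated data that mixes "human-like" responders with uniform-random response sets. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, simulation parameters (item count, sample sizes, noise level), flagging cutoffs, and the NumPy-based approach are all introduced here for illustration only.

```python
import numpy as np

def mahalanobis_distances(X):
    """Squared Mahalanobis distance of each response vector from the sample centroid."""
    mu = X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))  # pseudo-inverse guards against singularity
    diff = X - mu
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

def person_total_correlations(X):
    """Correlation of each respondent's item vector with the item means of all
    other respondents (a leave-one-out person-total correlation)."""
    n = X.shape[0]
    totals = X.sum(axis=0)
    r = np.empty(n)
    for i in range(n):
        others_mean = (totals - X[i]) / (n - 1)  # item means excluding respondent i
        r[i] = np.corrcoef(X[i], others_mean)[0, 1]
    return r

# Hypothetical illustration: 1,000 "human-like" respondents plus 10% uniform-random
# response sets on a 30-item, 5-point scale (all parameters are made up).
rng = np.random.default_rng(42)
item_means = rng.uniform(2.0, 4.5, size=30)          # items differ in typical endorsement
humans = np.clip(np.round(item_means + rng.normal(0, 0.8, size=(1000, 30))), 1, 5)
bots = rng.integers(1, 6, size=(100, 30))            # uniform random responses on 1-5
X = np.vstack([humans, bots]).astype(float)

md = mahalanobis_distances(X)
ptc = person_total_correlations(X)

# Flag suspicious response sets: extreme multivariate distance or a near-zero/negative
# person-total correlation (these cutoffs are arbitrary examples, not recommendations).
flagged = (md > np.quantile(md, 0.95)) | (ptc < 0.10)
print(f"{flagged.sum()} of {X.shape[0]} response sets flagged for manual review")
```

In practice, any cutoff would need to be calibrated to the specific questionnaire and sample rather than fixed at the illustrative values used above.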