Sloppiness mitigation in crowdsourcing: detecting and correcting bias for crowd scoring tasks

  • Lingyu Lyu
  • Mehmed Kantardzic
  • Tegjyot Singh Sethi
Regular Paper


Data obtained through crowdsourcing are often unreliable because crowd workers differ in expertise and personal preference and may tire during long working sessions. A major challenge is to recover true information from such noisy data. Sloppiness, the phenomenon in which observed labels fluctuate around the true labels, is a type of error that has rarely been discussed in research. Moreover, most existing approaches derive truths only for binary labeling tasks. In this paper, we address sloppiness in crowd scoring tasks in order to obtain high-quality estimated labels. A crowd scoring task uses multiple ordinal labels rather than just two. Workers in such tasks can exhibit sloppiness, which leads to unreliable scoring. We show that labels from sloppy workers with biases, who consistently give higher (or lower) answers than the true labels, can be used effectively to improve the quality of the estimated labels. To make use of the labels from such workers, we propose an iterative two-step model to infer the true labels. The first step identifies the biased workers and corrects their biases. The second step uses an optimization-based truth discovery framework to derive true labels from the high-quality observed labels and the corrected labels from the first step. We also present a hierarchical categorization of the different types of crowd workers. Experiments are conducted on synthetic data as well as real-world datasets. The effectiveness of the proposed framework is demonstrated by comparison with baseline models such as majority voting and an expectation-maximization-based aggregation algorithm, with accuracy improvements of up to 16%.
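
The iterative two-step idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model: the function name `infer_scores`, the thresholds `bias_threshold` and `spread_threshold`, the median initialization, and the log-ratio worker weights are all assumptions chosen for illustration. The sketch also assumes every worker scores at least one item, every item has at least one score, and there are at least two workers.

```python
import numpy as np

def infer_scores(scores, bias_threshold=1.0, spread_threshold=1.0, n_iter=10):
    """Hypothetical two-step aggregation of ordinal crowd scores.

    scores: (workers x items) array-like, np.nan where a worker skipped an item.
    Returns (estimated item scores, estimated per-worker biases).
    """
    scores = np.asarray(scores, dtype=float)
    observed = ~np.isnan(scores)
    # Assumption: start from the per-item median, which is robust to a
    # minority of shifted workers.
    truths = np.nanmedian(scores, axis=0)
    biases = np.zeros(scores.shape[0])

    for _ in range(n_iter):
        # Step 1: flag workers whose residuals show a large, consistent
        # offset from the current estimates, and remove that offset.
        residuals = scores - truths
        offsets = np.nanmean(residuals, axis=1)
        spread = np.nanstd(residuals, axis=1)
        biased = (np.abs(offsets) > bias_threshold) & (spread < spread_threshold)
        biases = np.where(biased, offsets, 0.0)
        corrected = scores - biases[:, None]

        # Step 2: re-estimate item scores as a weighted mean, giving
        # larger weights to workers with smaller squared error.
        errors = np.nansum((corrected - truths) ** 2, axis=1) + 1e-9
        weights = -np.log(errors / errors.sum())
        w = weights[:, None] * observed
        filled = np.where(observed, corrected, 0.0)
        truths = (w * filled).sum(axis=0) / w.sum(axis=0)

    return truths, biases

# Two accurate workers and one worker who always scores 2 points too high.
example = [[1, 2, 3, 4, 5],
           [1, 2, 3, 4, 5],
           [3, 4, 5, 6, 7]]
estimated, worker_bias = infer_scores(example)
```

In this toy input the third worker is flagged with an offset of 2, and the corrected scores agree with the other two workers. Because the task is ordinal, the continuous estimates would typically be rounded to the nearest allowed score afterwards.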


Crowdsourcing, Sloppiness, Reliability, Bias, Scoring, Truth discovery



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Computer Engineering and Computer Science, University of Louisville, Louisville, USA
