Journal of Computer Science and Technology, Volume 33, Issue 2, pp 351–365

Collusion-Proof Result Inference in Crowdsourcing

  • Peng-Peng Chen
  • Hai-Long Sun
  • Yi-Li Fang
  • Jin-Peng Huai
Regular Paper


In traditional crowdsourcing, workers are expected to answer tasks independently so that the collected answers remain diverse. However, recent studies show that the crowd is not a collection of independent workers; instead, workers communicate and collaborate with each other. To earn more rewards with little effort, some workers may collude and submit repeated answers, which damages the quality of the aggregated results. Nonetheless, few efforts have considered the negative impact of collusion on result inference in crowdsourcing. In this paper, we focus on the collusion-proof result inference problem for general crowdsourcing tasks on public platforms. To that end, we design a metric, the worker performance change rate, which identifies colluded answers by computing the difference between the mean worker performance before and after the repeated answers are removed. We then incorporate the collusion detection result into existing result inference methods to guarantee the quality of the aggregated results even when collusion occurs. We evaluated our approach extensively on real-world and synthetic datasets. The experimental results demonstrate the superiority of our approach over state-of-the-art methods.
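The idea behind the worker performance change rate can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: answers are categorical, a worker's performance is measured as agreement with the majority vote, "repeated answers" are the identical answers submitted by a hypothetical suspected group (one copy is kept), and the rate is the relative change of the mean performance. The function names are likewise hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(answers):
    """Aggregate (worker, task, label) triples into {task: most common label}."""
    votes = defaultdict(Counter)
    for _worker, task, label in answers:
        votes[task][label] += 1
    return {task: counts.most_common(1)[0][0] for task, counts in votes.items()}


def mean_worker_performance(answers):
    """Mean over workers of their agreement rate with the majority vote.

    Assumption: agreement with the majority stands in for 'performance'.
    """
    consensus = majority_vote(answers)
    hits, totals = defaultdict(int), defaultdict(int)
    for worker, task, label in answers:
        totals[worker] += 1
        hits[worker] += int(label == consensus[task])
    return sum(hits[w] / totals[w] for w in totals) / len(totals)


def remove_repeated(answers, suspected_group):
    """Keep one copy of each (task, label) submitted by the suspected group."""
    seen, kept = set(), []
    for worker, task, label in answers:
        if worker in suspected_group:
            if (task, label) in seen:
                continue  # a repeated answer from within the group
            seen.add((task, label))
        kept.append((worker, task, label))
    return kept


def performance_change_rate(answers, suspected_group):
    """Relative change of mean worker performance after removing repeats."""
    before = mean_worker_performance(answers)
    after = mean_worker_performance(remove_repeated(answers, suspected_group))
    if before == 0:
        raise ValueError("mean performance before removal is zero")
    return (after - before) / before


# Toy data: three workers submit the same label "B"; two answer "A" independently.
answers = [
    (worker, task, label)
    for task in ("t1", "t2")
    for worker, label in (("c1", "B"), ("c2", "B"), ("c3", "B"),
                          ("h1", "A"), ("h2", "A"))
]
rate = performance_change_rate(answers, {"c1", "c2", "c3"})
print(f"change rate: {rate:.4f}")
```

In this toy data the repeated answers flip the majority vote, so removing them changes the mean performance noticeably. A group whose repeated answers do not sway the consensus would produce a change rate close to zero. How the rate is thresholded and fed into an inference method is left to the paper itself.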


crowdsourcing, quality control, collusion, collaborative crowdsourcing, result inference






Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. State Key Laboratory of Software Development Environment, School of Computer Science and Engineering, Beihang University, Beijing, China
  2. Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing, China
