Overview of the crowdsourcing process

  • Lobna Nassar
  • Fakhri Karray
Survey Paper


The term crowdsourcing was coined a decade ago to describe a method for harnessing the wisdom of the crowd to accomplish two types of tasks. The first type comprises tasks that require human intelligence rather than machines; the second comprises tasks that the crowd can complete with greater time and cost efficiency than hired experts. The crowdsourcing process contains five modules. The first is designing incentives to mobilize the crowd to do the required task. It is followed by four modules for collecting the information, assuring its quality, and then verifying and aggregating it. Verification and quality control can be applied to the tasks, the collected data, and the participants, either by having more participants answer the same question or by accepting answers only from experts, so that errors from unreliable participants are avoided. Expert discovery methods identify reliable candidates in the crowd who have relevant experience in the topic at hand. Expert discovery reduces the number of participants needed per question, which in turn reduces the overall cost. This work summarizes and reviews the methods used to accomplish each processing step. Choosing a specific method, however, remains application dependent.
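
As a rough illustration of the two verification options mentioned above (redundant answers from several participants versus answers from experts only), the following minimal Python sketch aggregates answers by simple majority vote. It is not a method taken from the survey: the function names `majority_vote` and `aggregate`, the participant names, and the sample answers are all hypothetical, and majority voting is only one of many possible aggregation rules.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common answer and the share of votes it received."""
    counts = Counter(answers)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(answers)


def aggregate(responses, experts=None):
    """Aggregate answers per question by majority vote.

    responses: dict mapping a question id to a list of (participant, answer).
    experts:   optional set of trusted participants; when given, only their
               answers are counted (hypothetical expert filter).
    """
    results = {}
    for question, pairs in responses.items():
        answers = [a for p, a in pairs if experts is None or p in experts]
        if answers:  # skip questions that no accepted participant answered
            results[question] = majority_vote(answers)
    return results


# Hypothetical sample data: three participants answer two questions.
responses = {
    "q1": [("ann", "cat"), ("bob", "cat"), ("cy", "dog")],
    "q2": [("ann", "yes"), ("bob", "no"), ("cy", "no")],
}

by_redundancy = aggregate(responses)               # all three answers per question
by_expert = aggregate(responses, experts={"ann"})  # one trusted answer per question
```

In this toy data the expert-only variant needs one answer per question instead of three, which is the cost reduction the abstract attributes to expert discovery; the two variants can also disagree (here on `q2`), which is why the choice between them depends on the application.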


Crowdsourcing, Incentives, Verification, Aggregation, Quality assurance, Quality control (QC), Expert discovery



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. Electrical and Computer Engineering, University of Waterloo, Waterloo, Canada
