Realizing Impact Sourcing by Adaptive Gold Questions: A Socially Responsible Measure for Workers’ Trustworthiness

  • Kinda El Maarry
  • Ulrich Güntzer
  • Wolf-Tilo Balke
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9098)


In recent years, crowd sourcing has emerged as an effective solution for large-scale digitization tasks. It also offers a social solution that promises to extend economic opportunities to low-income countries, improving the welfare of poor, honest, yet poorly educated workers. On the other hand, crowd sourcing's virtual nature and anonymity encourage fraudulent workers to misuse the service for quick and easy monetary gain. This in turn compromises the quality of results and forces task providers to employ strict control measures such as gold questions or majority voting, which may gravely misjudge honest workers with lower skills and ultimately remove them from the labor pool. Fairly distinguishing fraudulent workers from honest workers who lack educational skills therefore becomes vital for supporting the vision of Impact Sourcing and its social responsibility. We develop a technique that uses socially responsible gold questions as an objective measure of workers' trustworthiness, rather than as a mere discarding mechanism. Our statistical model aligns workers' skill levels with questions' difficulty levels, which then allows the difficulty of the gold questions to be adapted for a fair judgment. Moreover, we illustrate how low-skilled workers' initial payloads, which are usually discarded along with the worker, can be partially recovered for increased economic gain, and we show how low-skilled workers can be seamlessly integrated into high-performing teams. Our experiments show that about 75% of misjudged workers can be correctly identified and effectively integrated into teams with a high overall result correctness of between 70% and 95%.
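The idea of aligning worker skill with question difficulty can be illustrated with a minimal sketch. It assumes a one-parameter logistic (Rasch-style) item response model, which the abstract itself does not name, and both function names (`p_correct`, `pick_gold_question`) are hypothetical. It is not the authors' implementation; it only shows how placing skill and difficulty on one scale lets a gold question be matched to a worker.

```python
import math
from typing import Sequence


def p_correct(skill: float, difficulty: float) -> float:
    """Assumed Rasch-style model: probability that a worker with the given
    skill answers a question of the given difficulty correctly."""
    return 1.0 / (1.0 + math.exp(-(skill - difficulty)))


def pick_gold_question(skill: float, difficulties: Sequence[float]) -> float:
    """Return the difficulty (from the available gold questions) that lies
    closest to the worker's estimated skill on the shared scale."""
    return min(difficulties, key=lambda d: abs(d - skill))


# Hypothetical values: a low-skilled worker and four gold questions.
worker_skill = -1.0
gold_difficulties = [-2.0, -0.8, 0.5, 1.5]
chosen = pick_gold_question(worker_skill, gold_difficulties)
```

Under this assumed model, an honest low-skilled worker should succeed on the matched question noticeably more often than on the hardest one, so a failure on the matched question carries more information about trustworthiness than a failure on a question far above the worker's level.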


Crowd sourcing · Impact sourcing · Fraud detection · Quality control





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Kinda El Maarry (1)
  • Ulrich Güntzer (2)
  • Wolf-Tilo Balke (1)
  1. IFIS, TU Braunschweig, Braunschweig, Germany
  2. Inst. f. Informatik, Universität Tübingen, Tübingen, Germany
