Realizing Impact Sourcing by Adaptive Gold Questions: A Socially Responsible Measure for Workers’ Trustworthiness
In recent years, crowd sourcing has emerged as a good solution for digitizing voluminous tasks. Moreover, it offers a social solution that promises to extend economic opportunities to low-income countries, improving the welfare of poor, honest, yet uneducated labor. On the other hand, crowd sourcing’s virtual nature and anonymity encourage fraudulent workers to misuse the service for quick and easy monetary gain. This in turn compromises the quality of results and forces task providers to employ strict control measures such as gold questions or majority voting, which may gravely misjudge honest workers with lower skills and ultimately discard them from the labor pool. Thus, fairly distinguishing between fraudulent workers and honest workers who lack educational skills becomes vital for supporting the vision of Impact Sourcing and its social responsibility. We develop a technique that uses socially responsible gold questions as an objective measure of workers’ trustworthiness rather than as a mere discarding mechanism. Our statistical model aligns workers’ skill levels with questions’ difficulty levels, which allows the difficulty of gold questions to be adapted for a fair judgment. Moreover, we illustrate how low-skilled workers’ initial payloads, which are usually discarded along with the worker, can be partially recovered for increased economic gain, and show how low-skilled workers can be seamlessly integrated into high-performing teams. Our experiments show that about 75% of misjudged workers can be correctly identified and effectively integrated into teams with high overall result correctness of between 70% and 95%.
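The abstract does not name the statistical model it uses. Purely as an illustration of how worker skill and question difficulty can be placed on one scale, the sketch below assumes a Rasch-style (one-parameter logistic) item response model. The function names, the grid-search estimator, and the "closest difficulty" selection rule are all hypothetical choices made for this example, not the authors' method.

```python
import math


def p_correct(ability: float, difficulty: float) -> float:
    """Rasch-style probability that a worker answers a question correctly.

    Assumption: ability and difficulty share one logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def estimate_ability(responses, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum-likelihood estimate of a worker's ability.

    responses: list of (difficulty, answered_correctly) pairs collected
    from gold questions whose difficulty is already known.
    """
    best_theta, best_ll = lo, -math.inf
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        ll = 0.0
        for difficulty, correct in responses:
            p = p_correct(theta, difficulty)
            ll += math.log(p if correct else 1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta


def next_gold_difficulty(ability: float, available):
    """Hypothetical adaptation rule: pick the gold-question difficulty
    closest to the worker's current ability estimate."""
    return min(available, key=lambda d: abs(d - ability))


# Example: a worker who solves an easy gold question but misses a hard one.
history = [(-1.0, True), (1.0, False)]
theta = estimate_ability(history)
chosen = next_gold_difficulty(theta, [-2.0, -1.0, 0.0, 1.0, 2.0])
```

Under these assumptions, a worker who fails hard gold questions but passes easy ones receives a low ability estimate and is then tested at a matching difficulty, whereas a worker who answers at random tends to fail questions of every difficulty.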
Keywords: Crowd sourcing, Impact sourcing, Fraud detection, Quality control