Abstract
Crowdsourcing is popular for large-scale data processing endeavors that require human input. However, working with a large community of users raises new challenges. In particular, both possible misjudgment and dishonesty threaten the quality of the results. Common countermeasures are based on redundancy, giving rise to a tradeoff between result quality and throughput. Ideally, measures should (1) maintain high throughput and (2) ensure high result quality at the same time. Existing work on crowdsourcing mostly focuses on result quality, paying little attention to throughput or even to that tradeoff. One reason is that the number of tasks (individual atomic units of work) is usually small. A further problem is that the tasks users work on are small as well. Consequently, existing result-improvement mechanisms do not scale to the number or complexity of tasks that arise, for instance, in proofreading and processing of digitized legacy literature. This paper proposes novel result-improvement mechanisms that (1) are independent of the size and complexity of tasks and (2) allow result quality to be traded for throughput to a significant extent. Both mathematical analyses and extensive simulations show the effectiveness of the proposed mechanisms.
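The redundancy tradeoff mentioned in the abstract can be made concrete with a generic textbook model. The following Python sketch is not one of the paper's proposed mechanisms, which the abstract does not describe. It assumes binary tasks, independent workers who are each correct with the same hypothetical probability `p`, and plain majority voting over `r` assignments per task (`r` odd, so ties cannot occur). Under these assumptions, result accuracy rises with `r` while throughput per unit of worker effort falls as `1/r`.

```python
from math import comb


def majority_accuracy(p: float, r: int) -> float:
    """Probability that a strict majority of r independent workers,
    each correct with probability p, returns the correct answer.

    Assumption (illustrative only): binary tasks, identical and
    independent workers, odd r so that ties cannot occur.
    """
    if r < 1 or r % 2 == 0:
        raise ValueError("use a positive, odd redundancy r to avoid ties")
    needed = r // 2 + 1  # smallest strict majority
    # Sum the binomial probabilities of k = needed..r correct answers.
    return sum(comb(r, k) * p**k * (1 - p) ** (r - k) for k in range(needed, r + 1))


def throughput(r: int, worker_capacity: float = 1.0) -> float:
    """Completed tasks per unit of worker effort when every task is
    assigned r times (hypothetical capacity of 1 task per unit)."""
    return worker_capacity / r


if __name__ == "__main__":
    p = 0.8  # hypothetical per-worker accuracy
    for r in (1, 3, 5, 7):
        print(f"r={r}: accuracy={majority_accuracy(p, r):.3f}, "
              f"throughput={throughput(r):.3f}")
```

With `p = 0.8`, majority accuracy is 0.8 for `r = 1`, 0.896 for `r = 3` and about 0.942 for `r = 5`, while throughput drops to one third and one fifth, respectively. This is the tradeoff that redundancy-based countermeasures face.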
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sautter, G., Böhm, K. (2011). High-Throughput Crowdsourcing Mechanisms for Complex Tasks. In: Datta, A., Shulman, S., Zheng, B., Lin, SD., Sun, A., Lim, EP. (eds) Social Informatics. SocInfo 2011. Lecture Notes in Computer Science, vol 6984. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24704-0_27
DOI: https://doi.org/10.1007/978-3-642-24704-0_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24703-3
Online ISBN: 978-3-642-24704-0
eBook Packages: Computer Science, Computer Science (R0)