
Robust Crowd Labeling Using Little Expertise

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8140)

Abstract

Crowd-labeling emerged from the need to label large-scale and complex data, a task that is tedious, expensive, and time-consuming. Yet the problem of obtaining good-quality labels from a crowd and integrating them remains unresolved. To address this challenge, we propose a new framework that automatically combines and boosts bulk crowd labels, supported by a limited number of “ground truth” labels from experts. The ground-truth labels help to estimate the individual expertise of crowd labelers and the difficulty of each instance, both of which are used to aggregate the labels. We show through extensive experiments that, unlike other state-of-the-art approaches, our method is robust even when a large proportion of the crowd consists of bad labelers. We also derive a lower bound on the number of expert labels needed to judge the crowd and the dataset, as well as to obtain better-quality labels.
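
To make the aggregation idea concrete, the following is a minimal Python sketch under simplifying assumptions (binary labels only): each labeler's accuracy is estimated from agreement with the small expert-labeled seed set, and those accuracies are then used as log-odds weights when voting on the remaining items. The function names and the Laplace smoothing are illustrative choices, not the authors' exact estimator, which additionally models per-instance difficulty.

```python
import math

def estimate_labeler_accuracy(crowd_labels, expert_labels):
    """Estimate each labeler's accuracy from agreement with the expert
    'ground truth' labels on the small seed set.

    crowd_labels : dict mapping worker -> {item: label in {0, 1}}
    expert_labels: dict mapping item -> true label in {0, 1}
    """
    accuracies = {}
    for worker, labels in crowd_labels.items():
        overlap = [i for i in expert_labels if i in labels]
        if not overlap:
            accuracies[worker] = 0.5          # no overlap with seed: uninformative
            continue
        agree = sum(labels[i] == expert_labels[i] for i in overlap)
        # Laplace smoothing keeps the log-odds weight finite for labelers
        # who are perfect (or always wrong) on the seed set.
        accuracies[worker] = (agree + 1) / (len(overlap) + 2)
    return accuracies

def aggregate_label(crowd_labels, accuracies, item):
    """Accuracy-weighted log-odds vote for a single item."""
    score = 0.0
    for worker, labels in crowd_labels.items():
        if item not in labels:
            continue
        w = math.log(accuracies[worker] / (1.0 - accuracies[worker]))
        score += w if labels[item] == 1 else -w
    return 1 if score > 0 else 0

# Toy usage: two workers, two expert-labeled items, one unlabeled item.
crowd = {"w1": {0: 1, 1: 0, 2: 1},
         "w2": {0: 0, 1: 1, 2: 0}}
expert = {0: 1, 1: 0}                         # limited ground-truth seed
acc = estimate_labeler_accuracy(crowd, expert)
print(aggregate_label(crowd, acc, 2))         # -> 1
```

Note that a labeler whose estimated accuracy falls below 0.5 receives a negative weight, so a consistently wrong (adversarial) labeler's votes are effectively flipped rather than merely down-weighted; this is one simple way such a scheme stays usable even when many crowd labelers are bad.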





Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Khattak, F.K., Salleb-Aouissi, A. (2013). Robust Crowd Labeling Using Little Expertise. In: Fürnkranz, J., Hüllermeier, E., Higuchi, T. (eds) Discovery Science. DS 2013. Lecture Notes in Computer Science, vol 8140. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40897-7_7


  • DOI: https://doi.org/10.1007/978-3-642-40897-7_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40896-0

  • Online ISBN: 978-3-642-40897-7

  • eBook Packages: Computer Science; Computer Science (R0)
