Skip to main content

Quality Control

  • Chapter
  • First Online:
  • 424 Accesses

Abstract

The results collected from crowd workers may not be reliable because (1) there are some malicious workers that randomly return the answers and (2) some tasks are hard and workers may not be good at these tasks. Thus it is important to exploit the different characteristics of workers and tasks and control the quality in crowdsourcing. Existing studies propose various quality-control techniques to address these issues.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD   109.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Amazon mechanical turk. https://www.mturk.com/

  2. Chi-squared distribution. https://en.wikipedia.org/wiki/Chi-squared_distribution

  3. Crowdflower. http://www.crowdflower.com

  4. External hit. http://docs.aws.amazon.com/AWSMechTurk/latest/AWSMturkAPI/Welcome.html

  5. Aydin, B.I., Yilmaz, Y.S., Li, Y., Li, Q., Gao, J., Demirbas, M.: Crowdsourcing for multiple-choice question answering. In: AAAI, pp. 2946–2953 (2014)

    Google Scholar 

  6. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. JMLR 3(Jan), 993–1022 (2003)

    MATH  Google Scholar 

  7. Boim, R., Greenshpan, O., Milo, T., Novgorodov, S., Polyzotis, N., Tan, W.C.: Asking the right questions in crowd data sourcing. In: ICDE, pp. 1261–1264 (2012)

    Google Scholar 

  8. Callison-Burch, C.: Fast, cheap, and creative: evaluating translation quality using amazon’s mechanical turk. In: EMNLP, pp. 286–295 (2009)

    Google Scholar 

  9. Cao, C.C., She, J., Tong, Y., Chen, L.: Whom to ask? jury selection for decision making tasks on micro-blog services. PVLDB 5(11), 1495–1506 (2012)

    Google Scholar 

  10. Dawid, A.P., Skene, A.M.: Maximum likelihood estimation of observer error-rates using the em algorithm. Applied statistics pp. 20–28 (1979)

    Article  Google Scholar 

  11. Demartini, G., Difallah, D.E., Cudré-Mauroux, P.: Zencrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. In: WWW, pp. 469–478 (2012)

    Google Scholar 

  12. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the em algorithm. J.R.Statist.Soc.B 30(1), 1–38 (1977)

    Google Scholar 

  13. Fan, J., Li, G., Ooi, B.C., Tan, K., Feng, J.: icrowd: An adaptive crowdsourcing framework. In: SIGMOD, pp. 1015–1030 (2015)

    Google Scholar 

  14. Fang, Y., Sun, H., Li, G., Zhang, R., Huai, J.: Effective result inference for context-sensitive tasks in crowdsourcing. In: DASFAA, pp. 33–48 (2016)

    Chapter  Google Scholar 

  15. Feng, J., Li, G., Wang, H., Feng, J.: Incremental quality inference in crowdsourcing. In: DASFAA, pp. 453–467 (2014)

    Chapter  Google Scholar 

  16. Franklin, M.J., Kossmann, D., Kraska, T., Ramesh, S., Xin, R.: Crowddb: answering queries with crowdsourcing. In: SIGMOD, pp. 61–72 (2011)

    Google Scholar 

  17. Ho, C.J., Jabbari, S., Vaughan, J.W.: Adaptive task assignment for crowdsourced classification. In: ICML, pp. 534–542 (2013)

    Google Scholar 

  18. Ho, C.J., Vaughan, J.W.: Online task assignment in crowdsourcing markets. In: AAAI (2012)

    Google Scholar 

  19. Hu, H., Zheng, Y., Bao, Z., Li, G., Feng, J.: Crowdsourced poi labelling: Location-aware result inference and task assignment. In: ICDE, pp. 61–72 (2016)

    Google Scholar 

  20. Ipeirotis, P., Provost, F., Wang, J.: Quality management on amazon mechanical turk. In: SIGKDD Workshop, pp. 64–67 (2010)

    Google Scholar 

  21. Joglekar, M., Garcia-Molina, H., Parameswaran, A.G.: Evaluating the crowd with confidence. In: SIGKDD, pp. 686–694 (2013)

    Google Scholar 

  22. Karger, D.R., Oh, S., Shah, D.: Iterative learning for reliable crowdsourcing systems. In: NIPS, pp. 1953–1961 (2011)

    Google Scholar 

  23. Kim, H.C., Ghahramani, Z.: Bayesian classifier combination. In: AISTATS, pp. 619–627 (2012)

    Google Scholar 

  24. Koller, D., Friedman, N.: Probabilistic Graphical Models - Principles and Techniques. MIT Press (2009)

    Google Scholar 

  25. Li, G., Zheng, Y., Fan, J., Wang, J., Cheng, R.: Crowdsourced data management: Overview and challenges. In: SIGMOD, pp. 1711–1716 (2017)

    Google Scholar 

  26. Li, Q., Li, Y., Gao, J., Su, L., Zhao, B., Demirbas, M., Fan, W., Han, J.: A confidence-aware approach for truth discovery on long-tail data. PVLDB 8(4), 425–436 (2014)

    Google Scholar 

  27. Li, Q., Li, Y., Gao, J., Zhao, B., Fan, W., Han, J.: Resolving conflicts in heterogeneous data by truth discovery and source reliability estimation. In: SIGMOD, pp. 1187–1198 (2014)

    Google Scholar 

  28. Li, Q., Ma, F., Gao, J., Su, L., Quinn, C.J.: Crowdsourcing high quality labels with a tight budget. In: WSDM, pp. 237–246 (2016)

    Google Scholar 

  29. Liu, Q., Peng, J., Ihler, A.T.: Variational inference for crowdsourcing. In: NIPS, pp. 701–709 (2012)

    Google Scholar 

  30. Liu, X., Lu, M., Ooi, B.C., Shen, Y., Wu, S., Zhang, M.: CDAS: A crowdsourcing data analytics system. PVLDB 5(10), 1040–1051 (2012)

    Google Scholar 

  31. Ma, F., Li, Y., Li, Q., Qiu, M., Gao, J., Zhi, S., Su, L., Zhao, B., Ji, H., Han, J.: Faitcrowd: Fine grained truth discovery for crowdsourced data aggregation. In: KDD, pp. 745–754 (2015)

    Google Scholar 

  32. Marcus, A., Karger, D.R., Madden, S., Miller, R., Oh, S.: Counting with the crowd. PVLDB 6(2), 109–120 (2012)

    Google Scholar 

  33. Marcus, A., Wu, E., Madden, S., Miller, R.C.: Crowdsourced databases: Query processing with people. In: CIDR, pp. 211–214 (2011)

    Google Scholar 

  34. Parameswaran, A.G., Garcia-Molina, H., Park, H., Polyzotis, N., Ramesh, A., Widom, J.: Crowdscreen: algorithms for filtering data with humans. In: SIGMOD, pp. 361–372 (2012)

    Google Scholar 

  35. Parameswaran, A.G., Park, H., Garcia-Molina, H., Polyzotis, N., Widom, J.: Deco: declarative crowdsourcing. In: CIKM, pp. 1203–1212. ACM (2012)

    Google Scholar 

  36. Raykar, V.C., Yu, S.: Eliminating spammers and ranking annotators for crowdsourced labeling tasks. Journal of Machine Learning Research 13, 491–518 (2012)

    MathSciNet  MATH  Google Scholar 

  37. Raykar, V.C., Yu, S., Zhao, L.H., Valadez, G.H., Florin, C., Bogoni, L., Moy, L.: Learning from crowds. JMLR 11(Apr), 1297–1322 (2010)

    MathSciNet  Google Scholar 

  38. Roy, S.B., Lykourentzou, I., Thirumuruganathan, S., Amer-Yahia, S., Das, G.: Task assignment optimization in knowledge-intensive crowdsourcing. VLDBJ 24(4), 467–491 (2015)

    Article  Google Scholar 

  39. Shen, W., Wang, J., Han, J.: Entity linking with a knowledge base: Issues, techniques, and solutions. TKDE 27(2), 443–460 (2015)

    Google Scholar 

  40. Venanzi, M., Guiver, J., Kazai, G., Kohli, P., Shokouhi, M.: Community-based bayesian aggregation models for crowdsourcing. In: WWW, pp. 155–164 (2014)

    Google Scholar 

  41. Von Ahn, L., Maurer, B., McMillen, C., Abraham, D., Blum, M.: recaptcha: Human-based character recognition via web security measures. Science 321(5895), 1465–1468 (2008)

    Article  MathSciNet  Google Scholar 

  42. Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning 1(1–2), 1–305 (2008)

    Article  Google Scholar 

  43. Welinder, P., Branson, S., Perona, P., Belongie, S.J.: The multidimensional wisdom of crowds. In: NIPS, pp. 2424–2432 (2010)

    Google Scholar 

  44. Whitehill, J., Ruvolo, P., Wu, T., Bergsma, J., Movellan, J.R.: Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In: NIPS, pp. 2035–2043 (2009)

    Google Scholar 

  45. Yuan, D., Li, G., Li, Q., Zheng, Y.: Sybil defense in crowdsourcing platforms. In: CIKM, pp. 1529–1538 (2017)

    Google Scholar 

  46. Zhao, W.X., Jiang, J., Weng, J., He, J., Lim, E.P., Yan, H., Li, X.: Comparing twitter and traditional media using topic models. In: ECIR, pp. 338–349 (2011)

    Google Scholar 

  47. Zhao, Z., Wei, F., Zhou, M., Chen, W., Ng, W.: Crowd-selection query processing in crowdsourcing databases: A task-driven approach. In: EDBT, pp. 397–408 (2015)

    Google Scholar 

  48. Zhao, Z., Yan, D., Ng, W., Gao, S.: A transfer learning based framework of crowd-selection on twitter. In: SIGKDD, pp. 1514–1517 (2013)

    Google Scholar 

  49. Zheng, Y., Cheng, R., Maniu, S., Mo, L.: On optimality of jury selection in crowdsourcing. In: EDBT, pp. 193–204 (2015)

    Google Scholar 

  50. Zheng, Y., Li, G., Cheng, R.: DOCS: domain-aware crowdsourcing system. PVLDB 10(4), 361–372 (2016)

    Google Scholar 

  51. Zheng, Y., Li, G., Li, Y., Shan, C., Cheng, R.: Truth inference in crowdsourcing: Is the problem solved? PVLDB 10(5), 541–552 (2017)

    Google Scholar 

  52. Zheng, Y., Wang, J., Li, G., Cheng, R., Feng, J.: QASCA: A quality-aware task assignment system for crowdsourcing applications. In: SIGMOD, pp. 1031–1046 (2015)

    Google Scholar 

  53. Zhou, D., Basu, S., Mao, Y., Platt, J.C.: Learning from the wisdom of crowds by minimax entropy. In: NIPS, pp. 2195–2203 (2012)

    Google Scholar 

  54. Zhou, D., Liu, Q., Platt, J., Meek, C.: Aggregating ordinal labels from crowds by minimax conditional entropy. In: ICML, pp. 262–270 (2014)

    Google Scholar 

  55. Zhu, S., Wu, Y., Mumford, D.: Minimax entropy principle and its application to texture modeling. Neural computation 9(8), 1627–1660 (1997)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this chapter

Check for updates. Verify currency and authenticity via CrossMark

Cite this chapter

Li, G., Wang, J., Zheng, Y., Fan, J., Franklin, M.J. (2018). Quality Control. In: Crowdsourced Data Management. Springer, Singapore. https://doi.org/10.1007/978-981-10-7847-7_3

Download citation

  • DOI: https://doi.org/10.1007/978-981-10-7847-7_3

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-7846-0

  • Online ISBN: 978-981-10-7847-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics