Quality Control

  • Guoliang Li
  • Jiannan Wang
  • Yudian Zheng
  • Ju Fan
  • Michael J. Franklin


The results collected from crowd workers may not be reliable because (1) there are some malicious workers that randomly return the answers and (2) some tasks are hard and workers may not be good at these tasks. Thus it is important to exploit the different characteristics of workers and tasks and control the quality in crowdsourcing. Existing studies propose various quality-control techniques to address these issues.


  1. 1.
    Amazon mechanical turk.
  2. 2.
  3. 3.
  4. 4.
  5. 5.
    Aydin, B.I., Yilmaz, Y.S., Li, Y., Li, Q., Gao, J., Demirbas, M.: Crowdsourcing for multiple-choice question answering. In: AAAI, pp. 2946–2953 (2014)Google Scholar
  6. 6.
    Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. JMLR 3(Jan), 993–1022 (2003)zbMATHGoogle Scholar
  7. 7.
    Boim, R., Greenshpan, O., Milo, T., Novgorodov, S., Polyzotis, N., Tan, W.C.: Asking the right questions in crowd data sourcing. In: ICDE, pp. 1261–1264 (2012)Google Scholar
  8. 8.
    Callison-Burch, C.: Fast, cheap, and creative: evaluating translation quality using amazon’s mechanical turk. In: EMNLP, pp. 286–295 (2009)Google Scholar
  9. 9.
    Cao, C.C., She, J., Tong, Y., Chen, L.: Whom to ask? jury selection for decision making tasks on micro-blog services. PVLDB 5(11), 1495–1506 (2012)Google Scholar
  10. 10.
    Dawid, A.P., Skene, A.M.: Maximum likelihood estimation of observer error-rates using the em algorithm. Applied statistics pp. 20–28 (1979)CrossRefGoogle Scholar
  11. 11.
    Demartini, G., Difallah, D.E., Cudré-Mauroux, P.: Zencrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. In: WWW, pp. 469–478 (2012)Google Scholar
  12. 12.
    Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the em algorithm. J.R.Statist.Soc.B 30(1), 1–38 (1977)Google Scholar
  13. 13.
    Fan, J., Li, G., Ooi, B.C., Tan, K., Feng, J.: icrowd: An adaptive crowdsourcing framework. In: SIGMOD, pp. 1015–1030 (2015)Google Scholar
  14. 14.
    Fang, Y., Sun, H., Li, G., Zhang, R., Huai, J.: Effective result inference for context-sensitive tasks in crowdsourcing. In: DASFAA, pp. 33–48 (2016)CrossRefGoogle Scholar
  15. 15.
    Feng, J., Li, G., Wang, H., Feng, J.: Incremental quality inference in crowdsourcing. In: DASFAA, pp. 453–467 (2014)CrossRefGoogle Scholar
  16. 16.
    Franklin, M.J., Kossmann, D., Kraska, T., Ramesh, S., Xin, R.: Crowddb: answering queries with crowdsourcing. In: SIGMOD, pp. 61–72 (2011)Google Scholar
  17. 17.
    Ho, C.J., Jabbari, S., Vaughan, J.W.: Adaptive task assignment for crowdsourced classification. In: ICML, pp. 534–542 (2013)Google Scholar
  18. 18.
    Ho, C.J., Vaughan, J.W.: Online task assignment in crowdsourcing markets. In: AAAI (2012)Google Scholar
  19. 19.
    Hu, H., Zheng, Y., Bao, Z., Li, G., Feng, J.: Crowdsourced poi labelling: Location-aware result inference and task assignment. In: ICDE, pp. 61–72 (2016)Google Scholar
  20. 20.
    Ipeirotis, P., Provost, F., Wang, J.: Quality management on amazon mechanical turk. In: SIGKDD Workshop, pp. 64–67 (2010)Google Scholar
  21. 21.
    Joglekar, M., Garcia-Molina, H., Parameswaran, A.G.: Evaluating the crowd with confidence. In: SIGKDD, pp. 686–694 (2013)Google Scholar
  22. 22.
    Karger, D.R., Oh, S., Shah, D.: Iterative learning for reliable crowdsourcing systems. In: NIPS, pp. 1953–1961 (2011)Google Scholar
  23. 23.
    Kim, H.C., Ghahramani, Z.: Bayesian classifier combination. In: AISTATS, pp. 619–627 (2012)Google Scholar
  24. 24.
    Koller, D., Friedman, N.: Probabilistic Graphical Models - Principles and Techniques. MIT Press (2009)Google Scholar
  25. 25.
    Li, G., Zheng, Y., Fan, J., Wang, J., Cheng, R.: Crowdsourced data management: Overview and challenges. In: SIGMOD, pp. 1711–1716 (2017)Google Scholar
  26. 26.
    Li, Q., Li, Y., Gao, J., Su, L., Zhao, B., Demirbas, M., Fan, W., Han, J.: A confidence-aware approach for truth discovery on long-tail data. PVLDB 8(4), 425–436 (2014)Google Scholar
  27. 27.
    Li, Q., Li, Y., Gao, J., Zhao, B., Fan, W., Han, J.: Resolving conflicts in heterogeneous data by truth discovery and source reliability estimation. In: SIGMOD, pp. 1187–1198 (2014)Google Scholar
  28. 28.
    Li, Q., Ma, F., Gao, J., Su, L., Quinn, C.J.: Crowdsourcing high quality labels with a tight budget. In: WSDM, pp. 237–246 (2016)Google Scholar
  29. 29.
    Liu, Q., Peng, J., Ihler, A.T.: Variational inference for crowdsourcing. In: NIPS, pp. 701–709 (2012)Google Scholar
  30. 30.
    Liu, X., Lu, M., Ooi, B.C., Shen, Y., Wu, S., Zhang, M.: CDAS: A crowdsourcing data analytics system. PVLDB 5(10), 1040–1051 (2012)Google Scholar
  31. 31.
    Ma, F., Li, Y., Li, Q., Qiu, M., Gao, J., Zhi, S., Su, L., Zhao, B., Ji, H., Han, J.: Faitcrowd: Fine grained truth discovery for crowdsourced data aggregation. In: KDD, pp. 745–754 (2015)Google Scholar
  32. 32.
    Marcus, A., Karger, D.R., Madden, S., Miller, R., Oh, S.: Counting with the crowd. PVLDB 6(2), 109–120 (2012)Google Scholar
  33. 33.
    Marcus, A., Wu, E., Madden, S., Miller, R.C.: Crowdsourced databases: Query processing with people. In: CIDR, pp. 211–214 (2011)Google Scholar
  34. 34.
    Parameswaran, A.G., Garcia-Molina, H., Park, H., Polyzotis, N., Ramesh, A., Widom, J.: Crowdscreen: algorithms for filtering data with humans. In: SIGMOD, pp. 361–372 (2012)Google Scholar
  35. 35.
    Parameswaran, A.G., Park, H., Garcia-Molina, H., Polyzotis, N., Widom, J.: Deco: declarative crowdsourcing. In: CIKM, pp. 1203–1212. ACM (2012)Google Scholar
  36. 36.
    Raykar, V.C., Yu, S.: Eliminating spammers and ranking annotators for crowdsourced labeling tasks. Journal of Machine Learning Research 13, 491–518 (2012)MathSciNetzbMATHGoogle Scholar
  37. 37.
    Raykar, V.C., Yu, S., Zhao, L.H., Valadez, G.H., Florin, C., Bogoni, L., Moy, L.: Learning from crowds. JMLR 11(Apr), 1297–1322 (2010)MathSciNetGoogle Scholar
  38. 38.
    Roy, S.B., Lykourentzou, I., Thirumuruganathan, S., Amer-Yahia, S., Das, G.: Task assignment optimization in knowledge-intensive crowdsourcing. VLDBJ 24(4), 467–491 (2015)CrossRefGoogle Scholar
  39. 39.
    Shen, W., Wang, J., Han, J.: Entity linking with a knowledge base: Issues, techniques, and solutions. TKDE 27(2), 443–460 (2015)Google Scholar
  40. 40.
    Venanzi, M., Guiver, J., Kazai, G., Kohli, P., Shokouhi, M.: Community-based bayesian aggregation models for crowdsourcing. In: WWW, pp. 155–164 (2014)Google Scholar
  41. 41.
    Von Ahn, L., Maurer, B., McMillen, C., Abraham, D., Blum, M.: recaptcha: Human-based character recognition via web security measures. Science 321(5895), 1465–1468 (2008)MathSciNetCrossRefGoogle Scholar
  42. 42.
    Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning 1(1–2), 1–305 (2008)CrossRefGoogle Scholar
  43. 43.
    Welinder, P., Branson, S., Perona, P., Belongie, S.J.: The multidimensional wisdom of crowds. In: NIPS, pp. 2424–2432 (2010)Google Scholar
  44. 44.
    Whitehill, J., Ruvolo, P., Wu, T., Bergsma, J., Movellan, J.R.: Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In: NIPS, pp. 2035–2043 (2009)Google Scholar
  45. 45.
    Yuan, D., Li, G., Li, Q., Zheng, Y.: Sybil defense in crowdsourcing platforms. In: CIKM, pp. 1529–1538 (2017)Google Scholar
  46. 46.
    Zhao, W.X., Jiang, J., Weng, J., He, J., Lim, E.P., Yan, H., Li, X.: Comparing twitter and traditional media using topic models. In: ECIR, pp. 338–349 (2011)Google Scholar
  47. 47.
    Zhao, Z., Wei, F., Zhou, M., Chen, W., Ng, W.: Crowd-selection query processing in crowdsourcing databases: A task-driven approach. In: EDBT, pp. 397–408 (2015)Google Scholar
  48. 48.
    Zhao, Z., Yan, D., Ng, W., Gao, S.: A transfer learning based framework of crowd-selection on twitter. In: SIGKDD, pp. 1514–1517 (2013)Google Scholar
  49. 49.
    Zheng, Y., Cheng, R., Maniu, S., Mo, L.: On optimality of jury selection in crowdsourcing. In: EDBT, pp. 193–204 (2015)Google Scholar
  50. 50.
    Zheng, Y., Li, G., Cheng, R.: DOCS: domain-aware crowdsourcing system. PVLDB 10(4), 361–372 (2016)Google Scholar
  51. 51.
    Zheng, Y., Li, G., Li, Y., Shan, C., Cheng, R.: Truth inference in crowdsourcing: Is the problem solved? PVLDB 10(5), 541–552 (2017)Google Scholar
  52. 52.
    Zheng, Y., Wang, J., Li, G., Cheng, R., Feng, J.: QASCA: A quality-aware task assignment system for crowdsourcing applications. In: SIGMOD, pp. 1031–1046 (2015)Google Scholar
  53. 53.
    Zhou, D., Basu, S., Mao, Y., Platt, J.C.: Learning from the wisdom of crowds by minimax entropy. In: NIPS, pp. 2195–2203 (2012)Google Scholar
  54. 54.
    Zhou, D., Liu, Q., Platt, J., Meek, C.: Aggregating ordinal labels from crowds by minimax conditional entropy. In: ICML, pp. 262–270 (2014)Google Scholar
  55. 55.
    Zhu, S., Wu, Y., Mumford, D.: Minimax entropy principle and its application to texture modeling. Neural computation 9(8), 1627–1660 (1997)CrossRefGoogle Scholar

Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • Guoliang Li
    • 1
  • Jiannan Wang
    • 2
  • Yudian Zheng
    • 3
  • Ju Fan
    • 4
  • Michael J. Franklin
    • 5
  1. 1.Department of Computer Science and TechnologyTsinghua UniversityBeijingChina
  2. 2.School of Computing ScienceSimon Fraser UniversityBurnabyCanada
  3. 3.Twitter Inc.San FranciscoUSA
  4. 4.DEKE Lab & School of InformationRenmin University of ChinaBeijingChina
  5. 5.Department of Computer ScienceUniversity of ChicagoChicagoUSA

Personalised recommendations