Quality Control

Li, Guoliang; Wang, Jiannan; Zheng, Yudian; Fan, Ju; Franklin, Michael J.

doi:10.1007/978-981-10-7847-7_3

Chapter
First Online: 13 October 2018

Abstract

The results collected from crowd workers may be unreliable because (1) some malicious workers return answers at random and (2) some tasks are hard, and workers may not be good at them. It is therefore important to exploit the different characteristics of workers and tasks to control quality in crowdsourcing. Existing studies propose various quality-control techniques to address these issues.

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, Beijing, China
Guoliang Li
School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
Jiannan Wang
Twitter Inc., San Francisco, CA, USA
Yudian Zheng
DEKE Lab & School of Information, Renmin University of China, Beijing, Beijing, China
Ju Fan
Department of Computer Science, University of Chicago, Chicago, IL, USA
Michael J. Franklin

About this chapter

Cite this chapter

Li, G., Wang, J., Zheng, Y., Fan, J., Franklin, M.J. (2018). Quality Control. In: Crowdsourced Data Management. Springer, Singapore. https://doi.org/10.1007/978-981-10-7847-7_3

DOI: https://doi.org/10.1007/978-981-10-7847-7_3
Published: 13 October 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7846-0
Online ISBN: 978-981-10-7847-7
eBook Packages: Computer Science, Computer Science (R0)