Improving the Quality of Crowdsourced Image Labeling via Label Similarity

Fang, Yi-Li; Sun, Hai-Long; Chen, Peng-Peng; Deng, Ting

doi:10.1007/s11390-017-1770-7


Regular Paper
Published: 20 September 2017

Volume 32, pages 877–889 (2017)

Journal of Computer Science and Technology

Yi-Li Fang^1,2, Hai-Long Sun^1,2, Peng-Peng Chen^1,2 & Ting Deng^1,2


Abstract

Crowdsourcing is an effective way to obtain large databases of manually labeled images, which are especially important for image understanding with supervised machine learning algorithms. However, for some kinds of image labeling tasks, e.g., dog breed recognition, it is hard to achieve high-quality results. Further optimizing the crowdsourcing workflow therefore mainly involves two steps: task allocation and result inference. For task allocation, we design a two-round crowdsourcing framework that contains a decision mechanism based on information entropy to determine whether to perform the second round of task allocation. For result inference, after quantifying the similarity between all labels, we propose two graphical models to describe the labeling process and design corresponding inference algorithms to further improve the quality of image labeling results. Extensive experiments were conducted on real-world tasks on CrowdFlower and on synthetic datasets. The experimental results demonstrate the superiority of these methods over state-of-the-art methods.
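The entropy-based decision for the second round can be illustrated with a minimal sketch. It is not the authors' mechanism, whose details the abstract does not give. It assumes the decision is made per image from the Shannon entropy of the first-round labels, and the names label_entropy and needs_second_round, the threshold value, and the sample data are all hypothetical.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    if not labels:
        raise ValueError("at least one label is required")
    total = len(labels)
    counts = Counter(labels)
    # Sum p * log2(1/p) over the distinct labels that workers chose.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def needs_second_round(labels, threshold=1.0):
    """Return True when first-round answers disagree too much.

    The threshold is a hypothetical tuning parameter, not a value
    taken from the paper.
    """
    return label_entropy(labels) > threshold


# Hypothetical first-round answers for two images.
first_round = {
    "img_1": ["husky", "husky", "husky", "husky"],
    "img_2": ["husky", "malamute", "samoyed", "malamute"],
}

# Images whose answers are too uncertain get a second round of labeling.
second_round = [img for img, labels in first_round.items()
                if needs_second_round(labels)]
```

Under these assumptions, unanimous answers give zero entropy and skip the second round, while answers spread over similar breeds give high entropy and are sent out again.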




Author information

Authors and Affiliations

State Key Laboratory of Software Development Environment, Beihang University, Beijing, 100191, China
Yi-Li Fang, Hai-Long Sun, Peng-Peng Chen & Ting Deng
School of Computer Science and Engineering, Beihang University, Beijing, 100191, China
Yi-Li Fang, Hai-Long Sun, Peng-Peng Chen & Ting Deng


Corresponding author

Correspondence to Hai-Long Sun.

Electronic supplementary material

ESM 1 (PDF 635 kb)


About this article

Cite this article

Fang, YL., Sun, HL., Chen, PP. et al. Improving the Quality of Crowdsourced Image Labeling via Label Similarity. J. Comput. Sci. Technol. 32, 877–889 (2017). https://doi.org/10.1007/s11390-017-1770-7


Received: 02 March 2017
Revised: 29 August 2017
Published: 20 September 2017
Issue Date: September 2017
DOI: https://doi.org/10.1007/s11390-017-1770-7
