Improving the Quality of Crowdsourced Image Labeling via Label Similarity

  • Regular Paper
  • Journal of Computer Science and Technology

Abstract

Crowdsourcing is an effective way to obtain large databases of manually labeled images, which is especially important for image understanding with supervised machine learning algorithms. However, for several kinds of image labeling tasks, e.g., dog breed recognition, it is hard to achieve high-quality results. We therefore further optimize the crowdsourcing workflow, focusing on task allocation and result inference. For task allocation, we design a two-round crowdsourcing framework that contains a decision mechanism based on information entropy to determine whether to perform the second round of task allocation. For result inference, after quantifying the similarity of all labels, we propose two graphical models to describe the labeling process and design corresponding inference algorithms to further improve the quality of image labeling results. Extensive experiments were conducted on real-world tasks in CrowdFlower and on synthetic datasets. The experimental results demonstrate the superiority of these methods over state-of-the-art methods.
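The abstract does not spell out the entropy-based decision rule, so the following is only a minimal sketch of one way such a rule could look, not the authors' mechanism. It assumes the first-round answers for an image are available as a plain list of labels; the function names `label_entropy` and `needs_second_round`, and the threshold value, are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = len(labels)
    # Sum p * log2(1/p) over the observed labels; an empty list gives 0.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def needs_second_round(labels, threshold=0.5):
    """Hypothetical rule: request more answers when workers disagree.

    `threshold` is an assumed tuning parameter, not a value from the paper.
    """
    return label_entropy(labels) > threshold


# First-round answers for two images (illustrative data only).
agreed = ["husky", "husky", "husky", "husky"]
disputed = ["husky", "husky", "husky", "malamute"]

print(needs_second_round(agreed))    # unanimous answers, entropy is 0
print(needs_second_round(disputed))  # mixed answers, entropy is about 0.81 bits
```

Under this assumed rule, an image whose first-round answers are unanimous is not reallocated, while one with split answers is sent to a second round.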

Author information

Corresponding author

Correspondence to Hai-Long Sun.

About this article

Cite this article

Fang, YL., Sun, HL., Chen, PP. et al. Improving the Quality of Crowdsourced Image Labeling via Label Similarity. J. Comput. Sci. Technol. 32, 877–889 (2017). https://doi.org/10.1007/s11390-017-1770-7

  • DOI: https://doi.org/10.1007/s11390-017-1770-7
