SPGLAD: A Self-paced Learning-Based Crowdsourcing Classification Model

Zhang, Xianchao; Shi, Heng; Li, Yuangang; Liang, Wenxin

doi:10.1007/978-3-319-67274-8_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10526)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Crowdsourcing platforms such as Amazon’s Mechanical Turk provide a fast and effective way to collect massive datasets for tasks in domains such as image classification and information retrieval. Quality control plays an essential role in such systems. However, existing quality control algorithms are prone to getting stuck in bad local optima because of ill-defined datasets. To overcome this drawback, we propose a novel self-paced quality control model that integrates a priority-based sample-picking strategy. The proposed model ensures that evident samples contribute more during the iterations. We also demonstrate empirically that the proposed self-paced learning strategy improves common quality control methods.
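The self-paced idea in the abstract can be illustrated with a small sketch. This is not the authors' SPGLAD model: it swaps in plain weighted majority voting as the base aggregator and a linear pace schedule, both of which are assumptions made for illustration. The function names (`weighted_vote`, `self_paced_vote`) and the parameters `n_rounds` and `start_frac` are hypothetical. The sketch only shows the general pattern of estimating worker reliability from the most confident ("evident") items first and admitting harder items in later rounds.

```python
import numpy as np

def weighted_vote(labels, weights, n_classes):
    """Return per-item predictions and confidences from weighted votes.

    labels: (n_items, n_workers) int array, -1 marks a missing label.
    """
    scores = np.zeros((labels.shape[0], n_classes))
    for k in range(n_classes):
        # Missing labels (-1) never equal k, so they add no weight.
        scores[:, k] = ((labels == k) * weights).sum(axis=1)
    totals = scores.sum(axis=1)
    pred = scores.argmax(axis=1)
    # Confidence: share of the total vote weight won by the top class.
    conf = scores.max(axis=1) / np.maximum(totals, 1e-12)
    return pred, conf

def self_paced_vote(labels, n_classes, n_rounds=5, start_frac=0.5):
    """Illustrative self-paced aggregation (assumed scheme, not SPGLAD)."""
    n_items, n_workers = labels.shape
    observed = labels >= 0
    weights = np.ones(n_workers)
    for r in range(n_rounds):
        pred, conf = weighted_vote(labels, weights, n_classes)
        # Pace: the admitted fraction grows linearly from start_frac to 1.
        frac = start_frac + (1.0 - start_frac) * r / max(n_rounds - 1, 1)
        n_keep = max(1, int(np.ceil(frac * n_items)))
        # Priority-based picking: most confident items first.
        easy = np.argsort(-conf, kind="stable")[:n_keep]
        # Re-estimate worker accuracy on the admitted items only,
        # with add-one smoothing to avoid weights of exactly 0 or 1.
        agree = (labels[easy] == pred[easy][:, None]) & observed[easy]
        counts = observed[easy].sum(axis=0)
        weights = (agree.sum(axis=0) + 1) / (counts + 2)
    pred, _ = weighted_vote(labels, weights, n_classes)
    return pred, weights
```

Restricting the weight update to confident items is what distinguishes this loop from ordinary iterative weighted voting: ambiguous items cannot distort the early reliability estimates, which is the kind of bad local optimum the abstract describes.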

This work was supported by the 863 project of China (No. 2015AA015403) and NSFC (No. 61632019).

Notes

1. http://www.mturk.com.
2. http://crowdflower.com.
3. Data are downloaded from http://i.cs.hku.hk/~ydzheng2/crowd_survey/datasets.html.

Author information

Authors and Affiliations

School of Software Technology, Dalian University of Technology, Dalian, 116024, China
Xianchao Zhang, Heng Shi & Wenxin Liang
Shanghai University of Finance and Economics, Shanghai, 200433, China
Yuangang Li
Goldpac Limited, Zhuhai, 519070, China
Yuangang Li

Corresponding author

Correspondence to Wenxin Liang.

Editor information

Editors and Affiliations

Seoul National University, Seoul, Korea (Republic of)
U Kang
School of Information Systems, Singapore Management University, Singapore, Singapore
Ee-Peng Lim
Chinese University of Hong Kong, Hong Kong, China
Jeffrey Xu Yu
Kangwon National University, Chuncheon, Korea (Republic of)
Yang-Sae Moon

Cite this paper

Zhang, X., Shi, H., Li, Y., Liang, W. (2017). SPGLAD: A Self-paced Learning-Based Crowdsourcing Classification Model. In: Kang, U., Lim, EP., Yu, J., Moon, YS. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol 10526. Springer, Cham. https://doi.org/10.1007/978-3-319-67274-8_17

DOI: https://doi.org/10.1007/978-3-319-67274-8_17
Published: 07 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67273-1
Online ISBN: 978-3-319-67274-8
eBook Packages: Computer Science, Computer Science (R0)
