Abstract
Annotating a dataset is one of the major bottlenecks in supervised learning, as it can be expensive and time-consuming. With the development of crowdsourcing services, however, it has become easy and fast to collect labels from multiple annotators. Our contribution in this paper is a Bayesian probabilistic approach that integrates annotators' uncertainty into the task of learning from multiple noisy annotators (annotators who make errors). Furthermore, unlike previous work, our approach is formulated directly to handle categorical labels. This is important because real-world datasets often have more than two classes. Extensive experiments validate the effectiveness of our approach against previous efficient algorithms.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wolley, C., Quafafou, M. (2013). Multiclass Learning from Multiple Uncertain Annotations. In: Tucker, A., Höppner, F., Siebes, A., Swift, S. (eds) Advances in Intelligent Data Analysis XII. IDA 2013. Lecture Notes in Computer Science, vol 8207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41398-8_38
DOI: https://doi.org/10.1007/978-3-642-41398-8_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41397-1
Online ISBN: 978-3-642-41398-8
eBook Packages: Computer Science, Computer Science (R0)