Abstract
Hierarchical classification requires annotations with hierarchical class structures. Although crowdsourcing services are an inexpensive way to collect annotations for hierarchical classification, the results are often incomplete, both because workers with limited ability cannot label all classes and because crowdsourcing platforms allow workers to suspend the labeling flow partway through. Unfortunately, existing quality control approaches for refining low-quality annotations discard these incomplete annotations, which limits how much the quality of the results can be improved. We propose a quality control method for hierarchical classification that leverages incomplete annotations and the similarity between classes in the hierarchy to estimate the true leaf classes. Our method probabilistically models the labeling process and estimates the true leaf classes by considering the class-likelihood of samples and the workers' class-dependent expertise. It embeds the class hierarchy into a latent space and represents both the samples and each worker's prototypical samples for classes (prototypes) as vectors in this space. The similarities between these vectors are used to estimate the true leaf classes. Experimental results on both real-world and synthetic datasets demonstrate the effectiveness of our method and its superiority over the baseline methods.
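To make the idea of using incomplete annotations concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes a hypothetical toy hierarchy (`PARENT`), replaces the learned latent space with a fixed path-indicator embedding (each class is represented by the set of nodes on its root path, so the squared Euclidean distance between two classes equals the number of tree edges between them), and replaces the probabilistic model with a simple similarity-weighted vote. Worker expertise and prototypes are omitted. All names are illustrative.

```python
import math

# Hypothetical toy hierarchy: child -> parent (the root has parent None).
PARENT = {
    "animal": None,
    "dog": "animal",
    "cat": "animal",
    "poodle": "dog",
    "beagle": "dog",
    "siamese": "cat",
}


def root_path(node):
    """Return the set of nodes on the path from the root to `node`."""
    path = set()
    while node is not None:
        path.add(node)
        node = PARENT[node]
    return path


def sq_distance(a, b):
    """Squared Euclidean distance between path-indicator embeddings.

    The symmetric difference of the two root paths has as many
    elements as there are tree edges between `a` and `b`.
    """
    return len(root_path(a) ^ root_path(b))


def leaves():
    """Return all classes that are not the parent of any other class."""
    parents = set(PARENT.values())
    return sorted(n for n in PARENT if n not in parents)


def estimate_leaf(annotations):
    """Pick the leaf most similar to the collected annotations.

    `annotations` may contain internal classes (incomplete labels);
    each one still contributes evidence to the leaves beneath it.
    """
    scores = {
        leaf: sum(math.exp(-sq_distance(leaf, label)) for label in annotations)
        for leaf in leaves()
    }
    return max(scores, key=scores.get)
```

For example, `estimate_leaf(["dog", "poodle", "cat"])` selects `"poodle"`: the incomplete label `"dog"` is not discarded but adds support to the leaves under `dog`. A fuller treatment along the lines of the abstract would additionally weight each annotation by the worker's class-dependent expertise.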
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Enomoto, M., Takeoka, K., Dong, Y., Oyamada, M., Okadome, T. (2021). Quality Control for Hierarchical Classification with Incomplete Annotations. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol 12714. Springer, Cham. https://doi.org/10.1007/978-3-030-75768-7_18
DOI: https://doi.org/10.1007/978-3-030-75768-7_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75767-0
Online ISBN: 978-3-030-75768-7
eBook Packages: Computer Science (R0)