Quality Control for Hierarchical Classification with Incomplete Annotations

Enomoto, Masafumi; Takeoka, Kunihiro; Dong, Yuyang; Oyamada, Masafumi; Okadome, Takeshi

doi:10.1007/978-3-030-75768-7_18

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12714)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Hierarchical classification requires annotations that follow a hierarchical class structure. Crowdsourcing services are an inexpensive way to collect such annotations, but the results are often incomplete, both because workers with limited ability cannot label every class and because crowdsourcing platforms allow workers to suspend the labeling flow partway through. Existing quality control approaches for refining low-quality annotations discard these incomplete annotations, which limits how much the results can be improved. We propose a quality control method for hierarchical classification that leverages incomplete annotations and the similarity between classes in the hierarchy to estimate the true leaf classes. The method probabilistically models the labeling process and estimates the true leaf classes by considering the class-likelihood of samples and the workers’ class-dependent expertise. It embeds the class hierarchy into a latent space and represents samples, as well as each worker’s prototypical samples for classes (prototypes), as vectors in this space. The similarities between these vectors are used to estimate the true leaf classes. Experimental results on both real-world and synthetic datasets demonstrate the effectiveness of the method and its superiority over the baseline methods.
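To make the role of incomplete annotations concrete, the following is a minimal sketch and not the authors' model: it replaces the learned latent space with plain tree distance and ignores worker expertise and prototypes entirely. The toy hierarchy, the function names (`tree_distance`, `estimate_leaf`), and the `temperature` parameter are all hypothetical, chosen only for illustration. Each annotation is the deepest class a worker reached, so an internal node such as "dog" stands for an incomplete annotation.

```python
import math

# Hypothetical toy hierarchy (child -> parent); the root has parent None.
PARENT = {
    "animal": None,
    "dog": "animal", "cat": "animal",
    "poodle": "dog", "beagle": "dog",
    "siamese": "cat", "persian": "cat",
}

def path_to_root(node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def tree_distance(a, b):
    """Number of edges on the path between nodes a and b in the tree."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a))}
    for j, n in enumerate(path_to_root(b)):
        if n in steps_from_a:  # first common ancestor
            return steps_from_a[n] + j
    raise ValueError("nodes are not in the same tree")

def leaves():
    """Leaf classes are the nodes that are nobody's parent."""
    parents = set(PARENT.values())
    return sorted(n for n in PARENT if n not in parents)

def estimate_leaf(annotations, temperature=1.0):
    """Score every leaf by its closeness to all annotations.

    Closeness is exp(-distance / temperature), an assumed stand-in for
    similarity in a latent space. Returns (best_leaf, normalized_scores).
    """
    scores = {
        leaf: sum(math.exp(-tree_distance(leaf, a) / temperature)
                  for a in annotations)
        for leaf in leaves()
    }
    total = sum(scores.values())
    normalized = {leaf: s / total for leaf, s in scores.items()}
    return max(normalized, key=normalized.get), normalized

# Two complete annotations disagree; the incomplete one ("dog") breaks the tie.
best, normalized = estimate_leaf(["poodle", "siamese", "dog"])
print(best, normalized)
```

In this toy run the two complete annotations alone would tie, and the incomplete "dog" annotation shifts the estimate toward the leaf under "dog". Discarding that annotation, as the abstract says existing approaches do, would lose this evidence.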

Notes

1. https://www.mturk.com/

Author information

Authors and Affiliations

Graduate School of Science and Technology, Kwansei Gakuin University, Sanda, Hyogo, Japan
Masafumi Enomoto & Takeshi Okadome
NEC Corporation, Tokyo, Japan
Kunihiro Takeoka, Yuyang Dong & Masafumi Oyamada

Corresponding author

Correspondence to Masafumi Enomoto.

Editor information

Editors and Affiliations

IIIT Hyderabad, Hyderabad, India
Kamal Karlapalem
Chinese University of Hong Kong, Shatin, Hong Kong
Hong Cheng
Virginia Tech, Arlington, VA, USA
Naren Ramakrishnan
Jawaharlal Nehru University, New Delhi, India
R. K. Agrawal
IIIT Hyderabad, Hyderabad, India
P. Krishna Reddy
University of Minnesota, Minneapolis, MN, USA
Jaideep Srivastava
IIIT Delhi, New Delhi, India
Tanmoy Chakraborty

Cite this paper

Enomoto, M., Takeoka, K., Dong, Y., Oyamada, M., Okadome, T. (2021). Quality Control for Hierarchical Classification with Incomplete Annotations. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol 12714. Springer, Cham. https://doi.org/10.1007/978-3-030-75768-7_18

DOI: https://doi.org/10.1007/978-3-030-75768-7_18
Published: 08 May 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75767-0
Online ISBN: 978-3-030-75768-7
eBook Packages: Computer Science, Computer Science (R0)
