Data Cleaning and Classification in the Presence of Label Noise with Class-Specific Autoencoder

Zhang, Weining; Wang, Dong; Tan, Xiaoyang

doi:10.1007/978-3-319-92537-0_30

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10878)

Included in the following conference series:

International Symposium on Neural Networks

Abstract

We present a simple but effective method for data cleaning and classification in the presence of label noise. The key idea is to treat data points with noisy labels as outliers of the class indicated by the noisy label. Identifying such dubious observations is challenging in general, so we instead reduce their potential influence through feature learning with class-specific autoencoders. Specifically, we learn a feature space for each class using all samples labeled as that class, including those with noisy labels. For the case of high label noise, we further propose a weighted class-specific autoencoder that accounts for the effect of each data point. To fully exploit the learned feature spaces, we classify test samples by minimum reconstruction error. Experiments on several datasets show that the proposed method achieves state-of-the-art performance on the related tasks with noisy labels.
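
The pipeline outlined in the abstract (one model per class, prediction by minimum reconstruction error, optional per-sample weights) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the rank-1 tied-weight linear autoencoder fit by power iteration stands in for the class-specific autoencoder, the synthetic 3-D data is invented, and the `1 / (1 + error)` weighting is a hypothetical scheme, not the weighting proposed in the paper.

```python
import math


def fit_linear_autoencoder(points, weights=None, iters=100):
    """Fit a rank-1 tied-weight linear autoencoder: a (weighted) mean plus one direction.

    Illustrative stand-in for a class-specific autoencoder, not the paper's network.
    """
    n, d = len(points), len(points[0])
    w = [1.0] * n if weights is None else list(weights)
    total = sum(w)
    mean = [sum(wi * p[j] for wi, p in zip(w, points)) / total for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    direction = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):  # power iteration on the weighted covariance
        new = [0.0] * d
        for wi, c in zip(w, centered):
            proj = wi * sum(cj * dj for cj, dj in zip(c, direction))
            for j in range(d):
                new[j] += proj * c[j]
        norm = math.sqrt(sum(v * v for v in new)) or 1.0
        direction = [v / norm for v in new]
    return mean, direction


def reconstruction_error(model, x):
    """Squared error between x and its encode-then-decode reconstruction."""
    mean, direction = model
    c = [xj - mj for xj, mj in zip(x, mean)]
    code = sum(cj * dj for cj, dj in zip(c, direction))  # encode to one number
    return sum((cj - code * dj) ** 2 for cj, dj in zip(c, direction))


def fit_per_class(X, y, weights=None):
    """Train one model per class on every sample carrying that (possibly noisy) label."""
    weights = [1.0] * len(X) if weights is None else weights
    models = {}
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        models[label] = fit_linear_autoencoder(
            [X[i] for i in idx], [weights[i] for i in idx]
        )
    return models


def predict(models, x):
    """Assign x to the class whose model reconstructs it with the smallest error."""
    return min(models, key=lambda label: reconstruction_error(models[label], x))


# Invented data: class 0 lies along the x-axis, class 1 along a line parallel
# to the y-axis at z = 5. Two class-1-like points are deliberately labeled 0.
ts = [-5.0 + 10.0 * i / 19 for i in range(20)]
class0 = [[t, 0.1 * math.sin(i), 0.0] for i, t in enumerate(ts)]
class1 = [[0.1 * math.cos(i), t, 5.0] for i, t in enumerate(ts)]
mislabeled = [[0.0, 4.0, 5.0], [0.1, -4.5, 5.0]]
X = class0 + mislabeled + class1
y = [0] * 22 + [1] * 20

models = fit_per_class(X, y)

# Data cleaning: flag training samples that another class's model explains better.
flagged = [i for i, (x, t) in enumerate(zip(X, y)) if predict(models, x) != t]
print("suspicious training indices:", flagged)

# Hypothetical weighting: down-weight samples their own class model fits poorly.
weights = [1.0 / (1.0 + reconstruction_error(models[t], x)) for x, t in zip(X, y)]
weighted_models = fit_per_class(X, y, weights)
print("class-0 mean before weighting:", models[0][0])
print("class-0 mean after weighting: ", weighted_models[0][0])
```

In this toy setting the mislabeled points pull the class-0 mean away from the clean data, and refitting with the weights reduces that pull. A nonlinear autoencoder would replace `fit_linear_autoencoder` and `reconstruction_error` while the per-class training and minimum-error prediction steps stay the same.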

This work is partially supported by National Science Foundation of China (61672280, 61373060, 61732006), Jiangsu 333 Project (BRA2017377) and Qing Lan Project.

Author information

Authors and Affiliations

Department of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016, China
Weining Zhang, Dong Wang & Xiaoyang Tan
Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, 210016, China
Xiaoyang Tan

Corresponding author

Correspondence to Xiaoyang Tan.

Editor information

Editors and Affiliations

Texas A&M University at Qatar, Doha, Qatar
Tingwen Huang
Sichuan University, Chengdu, China
Jiancheng Lv
Southeast University, Nanjing, China
Changyin Sun
United Institute of Informatics Problems, Minsk, Belarus
Alexander V. Tuzikov

About this paper

Cite this paper

Zhang, W., Wang, D., Tan, X. (2018). Data Cleaning and Classification in the Presence of Label Noise with Class-Specific Autoencoder. In: Huang, T., Lv, J., Sun, C., Tuzikov, A. (eds) Advances in Neural Networks – ISNN 2018. ISNN 2018. Lecture Notes in Computer Science, vol 10878. Springer, Cham. https://doi.org/10.1007/978-3-319-92537-0_30

DOI: https://doi.org/10.1007/978-3-319-92537-0_30
Published: 26 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92536-3
Online ISBN: 978-3-319-92537-0
eBook Packages: Computer Science, Computer Science (R0)
