Abstract
We present a simple but effective method for data cleaning and classification in the presence of label noise. The key idea is to treat data points with noisy labels as outliers of the class indicated by the noisy label. Finding such dubious observations is challenging in general, so we instead propose to reduce their potential influence through feature learning with class-specific autoencoders. Specifically, we learn a feature space for each class using all the samples labeled as that class, including those with noisy labels. Furthermore, for the case of high label noise, we propose a weighted class-specific autoencoder that accounts for the effect of each data point. To fully exploit the learned feature spaces, we use a method based on minimum reconstruction error at test time. Experiments on several datasets show that the proposed method achieves state-of-the-art performance on the related tasks with noisy labels.
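As a rough illustration of the idea described above, the following sketch fits one model per class on all samples carrying that label, then assigns a test point to the class whose model reconstructs it with the smallest error. It is not the authors' implementation. It assumes a linear autoencoder, realized here as a per-class PCA subspace, in place of whatever architecture the paper uses; it omits the weighted variant; and the function names, the synthetic data, and the rule for flagging suspect labels are all hypothetical choices made for this example.

```python
import numpy as np


def fit_class_models(X, y, n_components=1):
    """Fit one linear 'autoencoder' (a PCA subspace) per class label.

    Assumption: PCA stands in for a trained autoencoder.
    """
    models = {}
    for label in np.unique(y):
        Xc = X[y == label]  # every sample labeled as this class, noisy ones included
        mean = Xc.mean(axis=0)
        # The top right-singular vectors span the subspace that minimizes
        # the squared reconstruction error of the centered class data.
        _, _, vt = np.linalg.svd(Xc - mean, full_matrices=False)
        models[label] = (mean, vt[:n_components])
    return models


def reconstruction_error(model, X):
    """Squared error after encoding and decoding X with one class model."""
    mean, basis = model
    centered = X - mean
    recon = centered @ basis.T @ basis
    return np.sum((centered - recon) ** 2, axis=1)


def predict(models, X):
    """Assign each row of X to the class with minimum reconstruction error."""
    labels = np.array(list(models.keys()))
    errors = np.stack(
        [reconstruction_error(models[l], X) for l in labels], axis=1
    )
    return labels[np.argmin(errors, axis=1)]


def make_data(rng, n):
    """Synthetic two-class data: each class lies near its own line in 3-D."""
    y = rng.integers(0, 2, size=n)
    t = rng.normal(0.0, 3.0, size=n)
    X = rng.normal(0.0, 0.1, size=(n, 3))
    X[y == 0, 0] += t[y == 0]   # class 0 varies along the x axis
    X[y == 1, 1] += t[y == 1]   # class 1 varies along the y axis
    X[y == 1, 2] += 5.0         # class 1 is offset along z
    return X, y


rng = np.random.default_rng(0)
X_train, y_clean = make_data(rng, 400)
X_test, y_test = make_data(rng, 200)

# Flip roughly 10% of the training labels to simulate label noise.
flipped = rng.random(y_clean.shape[0]) < 0.1
y_noisy = np.where(flipped, 1 - y_clean, y_clean)

models = fit_class_models(X_train, y_noisy, n_components=1)
test_accuracy = np.mean(predict(models, X_test) == y_test)

# Hypothetical cleaning rule: flag a training sample when another class
# reconstructs it better than the class named by its (possibly noisy) label.
suspect = predict(models, X_train) != y_noisy
flagged_recall = np.mean(suspect[flipped])
```

In this toy setting the mislabeled samples sit far from the subspace of the class they are labeled as, which is the "outlier of the noisy class" view taken in the abstract. Real data and a nonlinear autoencoder would behave less cleanly.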
This work is partially supported by the National Science Foundation of China (61672280, 61373060, 61732006), the Jiangsu 333 Project (BRA2017377), and the Qing Lan Project.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Zhang, W., Wang, D., Tan, X. (2018). Data Cleaning and Classification in the Presence of Label Noise with Class-Specific Autoencoder. In: Huang, T., Lv, J., Sun, C., Tuzikov, A. (eds) Advances in Neural Networks – ISNN 2018. ISNN 2018. Lecture Notes in Computer Science, vol 10878. Springer, Cham. https://doi.org/10.1007/978-3-319-92537-0_30
DOI: https://doi.org/10.1007/978-3-319-92537-0_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92536-3
Online ISBN: 978-3-319-92537-0
eBook Packages: Computer Science, Computer Science (R0)