Data Cleaning and Classification in the Presence of Label Noise with Class-Specific Autoencoder

  • Conference paper
Advances in Neural Networks – ISNN 2018 (ISNN 2018)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 10878)

Abstract

We present a simple but effective method for data cleaning and classification in the presence of label noise. The key idea is to treat data points with noisy labels as outliers of the class indicated by their (incorrect) label. However, identifying such dubious observations is challenging in general. We therefore propose to reduce their potential influence through feature learning with class-specific autoencoders. Specifically, for each class we learn a feature space from all samples labeled as that class, including those with noisy labels. For the case of high label noise, we further propose a weighted class-specific autoencoder that takes the effect of each data point into account. To fully exploit the learned feature spaces, we classify test samples by minimum reconstruction error, assigning each sample to the class whose autoencoder reconstructs it best. Experiments on several datasets show that the proposed method achieves state-of-the-art performance on related tasks with noisy labels.
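The pipeline described in the abstract can be sketched in a few lines. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation: each class-specific autoencoder is assumed to be a one-hidden-layer linear autoencoder trained by plain SGD, the per-sample weighting rule `1 / (1 + err / median_err)` and the "error above `factor` times the median" cleaning threshold are our own choices, and every name (`LinearAutoencoder`, `fit_weighted`, `flag_suspicious`, `classify`) is illustrative.

```python
import random
import statistics

Vector = list[float]


def _matvec(m: list[Vector], v: Vector) -> Vector:
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LinearAutoencoder:
    """Hypothetical linear autoencoder x -> V W (x - mu) + mu.

    Stand-in for a class-specific autoencoder; the paper's actual
    architecture and optimizer are not given in the abstract.
    """

    def __init__(self, dim: int, hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(hidden)]
        self.v = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(dim)]
        self.mu = [0.0] * dim

    def reconstruct(self, x: Vector) -> Vector:
        xc = [a - m for a, m in zip(x, self.mu)]
        r = _matvec(self.v, _matvec(self.w, xc))
        return [a + m for a, m in zip(r, self.mu)]

    def error(self, x: Vector) -> float:
        """Squared reconstruction error ||x - x_hat||^2."""
        return sum((a - b) ** 2 for a, b in zip(x, self.reconstruct(x)))

    def fit(self, xs, weights, epochs: int = 300, lr: float = 0.02) -> "LinearAutoencoder":
        """SGD on sum_i w_i ||x_i - x_hat_i||^2, with mu fixed to the weighted mean."""
        dim, hid = len(xs[0]), len(self.w)
        total = sum(weights)
        self.mu = [sum(wt * x[j] for x, wt in zip(xs, weights)) / total for j in range(dim)]
        for _ in range(epochs):
            for x, wt in zip(xs, weights):
                xc = [a - m for a, m in zip(x, self.mu)]
                h = _matvec(self.w, xc)
                e = [r - c for r, c in zip(_matvec(self.v, h), xc)]  # residual V W xc - xc
                g_h = [sum(e[i] * self.v[i][k] for i in range(dim)) for k in range(hid)]  # V^T e
                for j in range(dim):  # dLoss/dV = 2 w e h^T
                    for k in range(hid):
                        self.v[j][k] -= lr * 2 * wt * e[j] * h[k]
                for k in range(hid):  # dLoss/dW = 2 w (V^T e) xc^T
                    for j in range(dim):
                        self.w[k][j] -= lr * 2 * wt * g_h[k] * xc[j]
        return self


def fit_weighted(xs, hidden: int = 1, seed: int = 0):
    """Fit with uniform weights, then refit with weights that shrink as the
    first-pass reconstruction error grows (a hypothetical weighting rule)."""
    ae = LinearAutoencoder(len(xs[0]), hidden, seed).fit(xs, [1.0] * len(xs))
    errs = [ae.error(x) for x in xs]
    scale = max(statistics.median(errs), 1e-12)
    weights = [1.0 / (1.0 + e / scale) for e in errs]
    return LinearAutoencoder(len(xs[0]), hidden, seed).fit(xs, weights), weights


def flag_suspicious(ae, xs, factor: float = 5.0):
    """Indices whose error exceeds `factor` x the median error (candidate label noise)."""
    errs = [ae.error(x) for x in xs]
    cut = factor * max(statistics.median(errs), 1e-12)
    return [i for i, e in enumerate(errs) if e > cut]


def classify(models, x):
    """Minimum reconstruction error rule: the class whose autoencoder fits x best."""
    return min(models, key=lambda label: models[label].error(x))
```

As a usage example, take a class A whose samples lie on the line x2 = x1, plus one sample from class B (on x2 = -x1) that carries the label A. Fitting `fit_weighted` on class A should give that sample the smallest weight, `flag_suspicious` should report only its index, and `classify` should send test points near each line to the matching class. A linear autoencoder keeps the sketch dependency-free; a nonlinear network would be the natural substitute for real data.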

This work is partially supported by National Science Foundation of China (61672280, 61373060, 61732006), Jiangsu 333 Project (BRA2017377) and Qing Lan Project.

Author information

Corresponding author

Correspondence to Xiaoyang Tan.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Zhang, W., Wang, D., Tan, X. (2018). Data Cleaning and Classification in the Presence of Label Noise with Class-Specific Autoencoder. In: Huang, T., Lv, J., Sun, C., Tuzikov, A. (eds) Advances in Neural Networks – ISNN 2018. ISNN 2018. Lecture Notes in Computer Science, vol 10878. Springer, Cham. https://doi.org/10.1007/978-3-319-92537-0_30

  • DOI: https://doi.org/10.1007/978-3-319-92537-0_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-92536-3

  • Online ISBN: 978-3-319-92537-0

  • eBook Packages: Computer Science, Computer Science (R0)