Taming the Cross Entropy Loss

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11269)

Abstract

We present the Tamed Cross Entropy (TCE) loss function, a robust derivative of the standard Cross Entropy (CE) loss used in deep learning for classification tasks. Unlike other robust losses, however, the TCE loss is designed to exhibit the same training properties as the CE loss in noiseless scenarios. The TCE loss therefore requires no modification of the training regime compared to the CE loss and can consequently be applied wherever the CE loss is currently used. We evaluate the TCE loss using the ResNet architecture on four image datasets that we artificially contaminated with various levels of label noise. The TCE loss outperforms the CE loss in every tested scenario.
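
A minimal sketch of the general idea, under the assumption that the robust variant replaces the unbounded -log p_y term of the CE loss with a bounded surrogate, here a truncated Taylor series of -log p_y. The function name bounded_cross_entropy, the truncation order, and the PyTorch setting are illustrative assumptions and not taken from the paper's own definition of TCE:

```python
# Illustration only: standard CE vs. a bounded "tamed" variant built from the
# truncated Taylor series of -log(p). Not the paper's exact TCE formulation.
import torch
import torch.nn.functional as F


def cross_entropy(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Standard CE loss: -log p_y, which is unbounded as p_y -> 0."""
    return F.cross_entropy(logits, targets)


def bounded_cross_entropy(logits: torch.Tensor, targets: torch.Tensor,
                          order: int = 4) -> torch.Tensor:
    """Bounded surrogate: sum_{i=1}^{order} (1 - p_y)^i / i.

    This truncated series matches -log p_y near p_y = 1 but saturates when
    p_y is small (e.g. for mislabeled samples), so a single noisy label
    cannot dominate the loss.
    """
    probs = F.softmax(logits, dim=-1)
    p_y = probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # prob. of the target class
    loss = torch.zeros_like(p_y)
    for i in range(1, order + 1):
        loss = loss + (1.0 - p_y) ** i / i
    return loss.mean()


# Toy usage: one confident prediction scored against a correct and a wrong label.
logits = torch.tensor([[4.0, 0.0, 0.0]])
clean, noisy = torch.tensor([0]), torch.tensor([1])
print(cross_entropy(logits, clean).item(), bounded_cross_entropy(logits, clean).item())
print(cross_entropy(logits, noisy).item(), bounded_cross_entropy(logits, noisy).item())
```

On the correct label both losses are small and behave alike; on the wrong label the standard CE term grows without bound while the bounded surrogate saturates, which is the property that limits the influence of label noise.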

Author information

Correspondence to Manuel Martinez.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Martinez, M., Stiefelhagen, R. (2019). Taming the Cross Entropy Loss. In: Brox, T., Bruhn, A., Fritz, M. (eds) Pattern Recognition. GCPR 2018. Lecture Notes in Computer Science, vol 11269. Springer, Cham. https://doi.org/10.1007/978-3-030-12939-2_43

  • DOI: https://doi.org/10.1007/978-3-030-12939-2_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-12938-5

  • Online ISBN: 978-3-030-12939-2

  • eBook Packages: Computer Science, Computer Science (R0)
