Taming the Cross Entropy Loss

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11269)

Abstract

We present the Tamed Cross Entropy (TCE) loss function, a robust derivative of the standard Cross Entropy (CE) loss used in deep learning for classification tasks. Unlike other robust losses, however, the TCE loss is designed to exhibit the same training properties as the CE loss in noiseless scenarios. The TCE loss therefore requires no modification of the training regime compared to the CE loss and can consequently be applied wherever the CE loss is currently used. We evaluate the TCE loss using the ResNet architecture on four image datasets that we artificially contaminated with various levels of label noise. The TCE loss outperforms the CE loss in every tested scenario.
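
A minimal sketch of the general idea, under the assumption that the robust variant replaces the unbounded -log p_y term of the CE loss with a bounded surrogate, here a truncated Taylor series of -log p_y. The function name bounded_cross_entropy, the truncation order, and the PyTorch setting are illustrative assumptions and not taken from the paper's own definition of TCE:

```python
# Illustration only: standard CE vs. a bounded "tamed" variant built from the
# truncated Taylor series of -log(p). Not the paper's exact TCE formulation.
import torch
import torch.nn.functional as F


def cross_entropy(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Standard CE loss: -log p_y, which is unbounded as p_y -> 0."""
    return F.cross_entropy(logits, targets)


def bounded_cross_entropy(logits: torch.Tensor, targets: torch.Tensor,
                          order: int = 4) -> torch.Tensor:
    """Bounded surrogate: sum_{i=1}^{order} (1 - p_y)^i / i.

    This truncated series matches -log p_y near p_y = 1 but saturates when
    p_y is small (e.g. for mislabeled samples), so a single noisy label
    cannot dominate the loss.
    """
    probs = F.softmax(logits, dim=-1)
    p_y = probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # prob. of the target class
    loss = torch.zeros_like(p_y)
    for i in range(1, order + 1):
        loss = loss + (1.0 - p_y) ** i / i
    return loss.mean()


# Toy usage: one confident prediction scored against a correct and a wrong label.
logits = torch.tensor([[4.0, 0.0, 0.0]])
clean, noisy = torch.tensor([0]), torch.tensor([1])
print(cross_entropy(logits, clean).item(), bounded_cross_entropy(logits, clean).item())
print(cross_entropy(logits, noisy).item(), bounded_cross_entropy(logits, noisy).item())
```

On the correct label both losses are small and behave alike; on the wrong label the standard CE term grows without bound while the bounded surrogate saturates, which is the property that limits the influence of label noise.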

Author information

Correspondence to Manuel Martinez.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Martinez, M., Stiefelhagen, R. (2019). Taming the Cross Entropy Loss. In: Brox, T., Bruhn, A., Fritz, M. (eds) Pattern Recognition. GCPR 2018. Lecture Notes in Computer Science, vol 11269. Springer, Cham. https://doi.org/10.1007/978-3-030-12939-2_43

  • DOI: https://doi.org/10.1007/978-3-030-12939-2_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-12938-5

  • Online ISBN: 978-3-030-12939-2

  • eBook Packages: Computer Science, Computer Science (R0)
