Abstract
Deep neural networks, often owing to overparameterization, have been shown to be capable of exactly memorizing even randomly labelled data. Empirical studies have also shown that none of the standard regularization techniques mitigates such overfitting. We investigate whether the choice of loss function can affect this memorization. We empirically show, on the benchmark datasets MNIST and CIFAR-10, that a symmetric loss function, as opposed to either cross-entropy or squared-error loss, results in a significant improvement in the network's ability to resist such overfitting. We then provide a formal definition of robustness to memorization and a theoretical explanation of why symmetric losses provide this robustness. Our results clearly bring out the role that loss functions alone can play in this phenomenon of memorization.
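To make the experimental setup concrete, the following is a minimal sketch (not the authors' code) of a random-label memorization experiment: the same small network is trained on MNIST with labels replaced by uniformly random ones, once with cross-entropy and once with a symmetric loss. Here the mean absolute error over softmax outputs is used as the symmetric loss, in the spirit of Ghosh et al. (2017); the architecture, optimizer, and hyperparameters are illustrative assumptions, not those used in the paper.

```python
# Hypothetical sketch of the random-label memorization experiment (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def mae_loss(logits, targets, num_classes=10):
    # Symmetric loss: mean absolute error between softmax output and one-hot label.
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(targets, num_classes).float()
    return torch.abs(probs - one_hot).sum(dim=1).mean()

def run(loss_fn, epochs=20, lr=1e-3, device="cpu"):
    data = datasets.MNIST(".", train=True, download=True,
                          transform=transforms.ToTensor())
    # Replace every training label with a uniformly random one.
    data.targets = torch.randint(0, 10, data.targets.shape)
    loader = DataLoader(data, batch_size=128, shuffle=True)
    # Illustrative two-layer network; not the architecture from the paper.
    model = nn.Sequential(nn.Flatten(), nn.Linear(784, 512), nn.ReLU(),
                          nn.Linear(512, 10)).to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        correct = total = 0
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            logits = model(x)
            loss = loss_fn(logits, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
            correct += (logits.argmax(1) == y).sum().item()
            total += y.numel()
        print(f"training accuracy on random labels: {correct / total:.3f}")

# Cross-entropy tends to drive training accuracy on random labels toward 1
# (memorization); the symmetric MAE loss is expected to resist this.
# run(F.cross_entropy)
# run(mae_loss)
```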
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Patel, D., Sastry, P.S. (2021). Memorization in Deep Neural Networks: Does the Loss Function Matter? In: Karlapalem, K., et al. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol. 12713. Springer, Cham. https://doi.org/10.1007/978-3-030-75765-6_11
DOI: https://doi.org/10.1007/978-3-030-75765-6_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75764-9
Online ISBN: 978-3-030-75765-6
eBook Packages: Computer Science, Computer Science (R0)