Abstract
Improving generalization is one of the main challenges in training deep neural networks on classification tasks. A number of techniques have been proposed to boost performance on unseen data, ranging from standard data augmentation to \(\ell_2\) regularization, dropout, batch normalization, entropy-driven SGD and many more.
In this work we propose an elegant, simple and principled approach: post-synaptic potential regularization (PSP). We test this regularization on a number of state-of-the-art scenarios. Empirical results show that PSP achieves a classification error comparable to that of more sophisticated learning strategies on MNIST, while improving generalization over \(\ell_2\) regularization for deep architectures trained on CIFAR-10.
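Concretely, PSP regularizes the post-synaptic potential of each neuron, i.e. its pre-activation output \(p = \mathbf{w}\cdot\mathbf{x} + b\). The sketch below is a minimal PyTorch illustration, assuming the penalty is the mean squared pre-activation value of every layer; the network, the coefficient `psp_lambda` and the training snippet are illustrative assumptions, not the authors' exact configuration (the official code is linked in the Notes).

```python
# Minimal sketch of post-synaptic potential (PSP) regularization.
# Assumption: the penalty is 1/2 * (squared pre-activations), averaged
# over the batch and summed over layers. Architecture and psp_lambda
# are hypothetical choices for illustration only.
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 300)
        self.fc2 = nn.Linear(300, 10)

    def forward(self, x):
        p1 = self.fc1(x)             # post-synaptic potential of layer 1
        h1 = torch.relu(p1)
        p2 = self.fc2(h1)            # post-synaptic potential of layer 2
        return p2, [p1, p2]

def psp_penalty(potentials):
    # 1/2 * sum over layers of the batch-mean squared potentials
    return 0.5 * sum(p.pow(2).mean() for p in potentials)

model = MLP()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
psp_lambda = 1e-4                    # hypothetical regularization strength

x = torch.randn(32, 784)             # dummy MNIST-sized batch
y = torch.randint(0, 10, (32,))

optimizer.zero_grad()
logits, potentials = model(x)
loss = criterion(logits, y) + psp_lambda * psp_penalty(potentials)
loss.backward()
optimizer.step()
```

Unlike \(\ell_2\) regularization, which shrinks the weights directly, a penalty of this form constrains the quantity the activation function actually receives, so it depends jointly on the weights and the data distribution.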
Notes
1. All the source code is publicly available at https://github.com/enzotarta/PSP.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Tartaglione, E., Perlo, D., Grangetto, M. (2019). Post-synaptic Potential Regularization Has Potential. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning. ICANN 2019. Lecture Notes in Computer Science, vol. 11728. Springer, Cham. https://doi.org/10.1007/978-3-030-30484-3_16
DOI: https://doi.org/10.1007/978-3-030-30484-3_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30483-6
Online ISBN: 978-3-030-30484-3
eBook Packages: Computer Science, Computer Science (R0)