Abstract
We propose to formulate the training of neural networks with side optimization goals, such as obtaining structured weight matrices, as a lexicographic optimization problem. The lexicographic order can be maintained during training by optimizing the side optimization goal exclusively in the null space of the batch activations. We call the resulting training method Safe Regularization, because the side optimization goal can be safely integrated into training with limited influence on the main optimization goal. Moreover, this makes training more robust to the choice of regularization hyperparameters. We validate our training method on multiple real-world regression data sets with the side optimization goal of obtaining sparse weight matrices.
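The core mechanism described above can be illustrated with a minimal NumPy sketch (not the authors' implementation; the function names and the fixed step size are illustrative). For one layer with batch activations X and weights W, a step on a sparsity (L1) side objective is projected onto the null space of X, so the layer's outputs X @ W are unchanged to first order:

```python
import numpy as np

def null_space_projector(A, tol=1e-10):
    """Orthogonal projector onto {w : A @ w = 0}, via SVD."""
    _, s, Vt = np.linalg.svd(A, full_matrices=True)
    rank = int(np.sum(s > tol))
    V_null = Vt[rank:].T          # orthonormal basis of the null space of A
    return V_null @ V_null.T      # projector onto that null space

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16))      # batch activations: 8 samples, 16 features
W = rng.normal(size=(16, 4))      # weight matrix of one layer

P = null_space_projector(X)

# Gradient step of an L1 side objective (promoting sparse weights),
# restricted to directions that leave the batch outputs unchanged.
g_side = np.sign(W)
W_new = W - 0.1 * (P @ g_side)

# The outputs on this batch are (numerically) unchanged.
print(np.allclose(X @ W_new, X @ W))  # → True
```

Because X @ P = 0, the side-objective update cannot alter the loss on the current batch, which is what lets the side goal be pursued "safely" below the main goal in the lexicographic order.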
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Kissel, M., Gottwald, M., Diepold, K. (2020). Neural Network Training with Safe Regularization in the Null Space of Batch Activations. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. Lecture Notes in Computer Science, vol 12397. Springer, Cham. https://doi.org/10.1007/978-3-030-61616-8_18
Print ISBN: 978-3-030-61615-1
Online ISBN: 978-3-030-61616-8