Abstract
Expressiveness and generalization of deep models were recently addressed via the connection between neural networks (NNs) and kernel learning, where the first-order dynamics of a NN during gradient-descent (GD) optimization were related to the gradient similarity kernel, also known as the Neural Tangent Kernel (NTK) [9]. In the majority of works this kernel is considered to be time-invariant [9, 13]. In contrast, we empirically explore its properties along the optimization and show that in practice the top eigenfunctions of the NTK align toward the target function learned by the NN, which improves the overall optimization performance. Moreover, these top eigenfunctions serve as basis functions for the NN output: the function represented by the NN is spanned almost completely by them for the entire optimization process. Further, we study how learning-rate decay affects the neural spectrum. We argue that the presented phenomena may lead to a more complete theoretical understanding of NN learning.
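The quantities the abstract refers to can be made concrete with a small sketch (not the authors' code). On a batch of inputs, the empirical NTK is the Gram matrix of per-parameter gradients, \(K_{ij} = \langle \nabla_w f(x_i), \nabla_w f(x_j)\rangle\); its eigenvectors are the kernel's eigenfunctions sampled at the batch points, and "alignment" can be measured as the fraction of the target's energy captured by the top-k eigenvectors. The random-feature model, the toy target, and all sizes below are illustrative assumptions chosen so the gradients have a closed form:

```python
import numpy as np

# For a model f(x; w) = w . phi(x), the parameter gradient at x is phi(x),
# so the empirical NTK Gram matrix on a batch is simply K = Phi Phi^T.
rng = np.random.default_rng(0)
N, d, m = 64, 5, 256               # batch size, input dim, number of features
X = rng.normal(size=(N, d))
W = rng.normal(size=(d, m)) / np.sqrt(d)
Phi = np.tanh(X @ W)               # feature map phi(x) = per-example gradient
K = Phi @ Phi.T                    # empirical NTK Gram matrix (N x N)

# Eigen-decomposition of the kernel, sorted by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(K)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Alignment: fraction of the target's energy lying in the top-k eigenvectors.
# The paper tracks how this fraction evolves during GD training.
y = np.sin(X[:, 0])                # toy target sampled at the batch points
k = 10
coeffs = eigvecs[:, :k].T @ y
alignment = np.sum(coeffs**2) / np.sum(y**2)
print(f"energy of target in top-{k} NTK eigenvectors: {alignment:.3f}")
```

For a real NN the per-example gradients would come from automatic differentiation rather than a fixed feature map; the spectral computation on the resulting Gram matrix is unchanged.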
Notes
- 1.
In some papers [17] the FIM is also referred to as the Hessian of the NN, due to the tight relation between \(F_t\) and the Hessian of the loss (see Appendix B for details).
- 2.
Related code can be accessed via a repository https://bit.ly/2kGVHhG.
- 3.
The trend \(\lambda _{max}^t \rightarrow \frac{2 N}{\delta _t}\) was consistent in FC NNs for a wide range of initial learning rates, numbers of layers and neurons, and various datasets (see Appendix [12]), making it an interesting avenue for future theoretical investigation.
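The value \(\frac{2N}{\delta_t}\) in the note above is the classical GD stability threshold under linearized (NTK) dynamics. A minimal numeric check, assuming an MSE loss \(L = \frac{1}{2N}\sum_i (f(x_i) - y_i)^2\) with learning rate \(\delta\), where the residual evolves as \(r_{t+1} = (I - \frac{\delta}{N} K)\, r_t\) and is stable iff \(\lambda_{max} < \frac{2N}{\delta}\) (the random PSD matrix stands in for a fixed NTK Gram matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 32
A = rng.normal(size=(N, N))
K = A @ A.T                          # a fixed PSD stand-in for the NTK Gram matrix
lam_max = np.linalg.eigvalsh(K)[-1]  # largest eigenvalue

def residual_norm(delta, steps=200):
    """Iterate the linearized GD residual update and return its final norm."""
    r = rng.normal(size=N)
    M = np.eye(N) - (delta / N) * K
    for _ in range(steps):
        r = M @ r
    return np.linalg.norm(r)

delta_crit = 2 * N / lam_max         # learning rate at which lam_max hits 2N/delta
stable = residual_norm(0.9 * delta_crit)    # below threshold: residual shrinks
unstable = residual_norm(1.1 * delta_crit)  # above threshold: residual blows up
print(stable, unstable)
```

This only demonstrates why \(\frac{2N}{\delta_t}\) is the critical value for a fixed kernel; the paper's observation is the stronger, empirical one that \(\lambda_{max}^t\) of the evolving NTK migrates toward this threshold during training.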
References
1. Arora, S., Du, S.S., Hu, W., Li, Z., Wang, R.: Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks. arXiv preprint arXiv:1901.08584 (2019)
2. Basri, R., Jacobs, D., Kasten, Y., Kritchman, S.: The convergence rate of neural networks for learned functions of different frequencies. arXiv preprint arXiv:1906.00425 (2019)
3. Dou, X., Liang, T.: Training neural networks as learning data-adaptive kernels: provable representation and approximation benefits. arXiv preprint arXiv:1901.07114 (2019)
4. Dyer, E., Gur-Ari, G.: Asymptotics of wide networks from Feynman diagrams. arXiv preprint arXiv:1909.11304 (2019)
5. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256 (2010)
6. Gur-Ari, G., Roberts, D.A., Dyer, E.: Gradient descent happens in a tiny subspace. arXiv preprint arXiv:1812.04754 (2018)
7. Gutmann, M., Hyvärinen, A.: Noise-contrastive estimation: a new estimation principle for unnormalized statistical models. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 297–304 (2010)
8. Huang, J., Yau, H.T.: Dynamics of deep neural networks and neural tangent hierarchy. arXiv preprint arXiv:1909.08156 (2019)
9. Jacot, A., Gabriel, F., Hongler, C.: Neural tangent kernel: convergence and generalization in neural networks. In: Advances in Neural Information Processing Systems (NIPS), pp. 8571–8580 (2018)
10. Karakida, R., Akaho, S., Amari, S.I.: Universal statistics of Fisher information in deep neural networks: mean field approach. arXiv preprint arXiv:1806.01316 (2018)
11. Kopitkov, D., Indelman, V.: General probabilistic surface optimization and log density estimation. arXiv preprint arXiv:1903.10567 (2019)
12. Kopitkov, D., Indelman, V.: Neural spectrum alignment: empirical study - appendix. https://bit.ly/3aipgtl (2019)
13. Lee, J., Xiao, L., Schoenholz, S.S., Bahri, Y., Sohl-Dickstein, J., Pennington, J.: Wide neural networks of any depth evolve as linear models under gradient descent. arXiv preprint arXiv:1902.06720 (2019)
14. Ollivier, Y.: Riemannian metrics for neural networks I: feedforward networks. Inf. Infer.: J. IMA 4(2), 108–153 (2015)
15. Oymak, S., Fabian, Z., Li, M., Soltanolkotabi, M.: Generalization guarantees for neural networks via harnessing the low-rank structure of the Jacobian. arXiv preprint arXiv:1906.05392 (2019)
16. Rahaman, N., et al.: On the spectral bias of neural networks. arXiv preprint arXiv:1806.08734 (2018)
17. Sagun, L., Evci, U., Guney, V.U., Dauphin, Y., Bottou, L.: Empirical analysis of the Hessian of over-parametrized neural networks. arXiv preprint arXiv:1706.04454 (2017)
18. Woodworth, B., Gunasekar, S., Lee, J., Soudry, D., Srebro, N.: Kernel and deep regimes in overparametrized models. arXiv preprint arXiv:1906.05827 (2019)
19. Zhang, J., Springenberg, J.T., Boedecker, J., Burgard, W.: Deep reinforcement learning with successor features for navigation across similar environments. arXiv preprint arXiv:1612.05533 (2016)
Acknowledgments
The authors thank Daniel Soudry and Dar Gilboa for discussions on dynamics of a Neural Tangent Kernel (NTK). This work was supported in part by the Israel Ministry of Science & Technology (MOST) and Intel Corporation. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU, which, among other GPUs, was used for this research.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Kopitkov, D., Indelman, V. (2020). Neural Spectrum Alignment: Empirical Study. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. Lecture Notes in Computer Science, vol 12397. Springer, Cham. https://doi.org/10.1007/978-3-030-61616-8_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61615-1
Online ISBN: 978-3-030-61616-8