
Neural Spectrum Alignment: Empirical Study

Conference paper in: Artificial Neural Networks and Machine Learning – ICANN 2020 (ICANN 2020)

Abstract

The expressiveness and generalization of deep models have recently been addressed via the connection between neural networks (NNs) and kernel learning, where the first-order dynamics of a NN under gradient-descent (GD) optimization are related to the gradient similarity kernel, also known as the Neural Tangent Kernel (NTK) [9]. In the majority of works this kernel is considered to be time-invariant [9, 13]. In contrast, we empirically explore the properties of this kernel along the optimization and show that in practice the top eigenfunctions of the NTK align toward the target function learned by the NN, which improves the overall optimization performance. Moreover, these top eigenfunctions serve as basis functions for the NN output: the function represented by the NN is spanned almost entirely by them throughout the optimization process. Further, we study how learning-rate decay affects the neural spectrum. We argue that the presented phenomena may lead to a more complete theoretical understanding of NN learning.
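
To make the studied objects concrete, the following is a minimal sketch (not the authors' code, which is linked in footnote 2) of computing the empirical NTK, i.e. the gradient similarity Gram matrix over the training set, and measuring how much of the target function lies in the span of its top eigenvectors before and after gradient descent. It assumes a one-hidden-layer tanh network, a toy 1-D regression target, full-batch GD on the squared loss, and NumPy only; the width, learning rate, and number of steps are illustrative choices.

```python
# Minimal sketch: empirical NTK of a 1-hidden-layer tanh network on a toy task,
# and the fraction of the target captured by the top NTK eigenvectors.
# Assumptions (not from the paper): network size, target, learning rate, step count.
import numpy as np

rng = np.random.default_rng(0)
N, h, lr, steps = 64, 256, 1e-2, 2000           # samples, hidden width, learning rate, GD steps
X = np.linspace(-1.0, 1.0, N)[:, None]          # inputs, shape (N, 1)
y = np.sin(3 * np.pi * X[:, 0])                 # toy target function

W1 = rng.normal(0.0, 1.0, size=(h, 1))          # first-layer weights
b1 = np.zeros(h)                                # first-layer biases
w2 = rng.normal(0.0, 1.0 / np.sqrt(h), size=h)  # output weights

def forward(X):
    z = X @ W1.T + b1                           # pre-activations, (N, h)
    a = np.tanh(z)
    return a @ w2, z, a                         # scalar outputs f(x), caches

def ntk_gram(X):
    """Empirical NTK Gram matrix K = J J^T, where J holds per-sample output gradients w.r.t. all parameters."""
    _, z, a = forward(X)
    dz = (1.0 - np.tanh(z) ** 2) * w2           # (N, h): d f / d z
    dW1 = (dz[:, :, None] * X[:, None, :]).reshape(X.shape[0], -1)
    J = np.concatenate([dW1, dz, a], axis=1)    # (N, P)
    return J @ J.T

def top_alignment(K, y, k=10):
    """Fraction of ||y||^2 lying in the span of the top-k eigenvectors of K."""
    _, vecs = np.linalg.eigh(K)                 # eigenvalues in ascending order
    top = vecs[:, -k:]
    return float(np.sum((top.T @ y) ** 2) / np.sum(y ** 2))

for t in range(steps + 1):
    f, z, a = forward(X)
    if t in (0, steps):
        print(f"step {t:5d}  top-10 NTK alignment with target: {top_alignment(ntk_gram(X), y):.3f}")
    err = f - y                                 # residual for loss 0.5/N * ||f - y||^2
    dz = (1.0 - np.tanh(z) ** 2) * w2           # (N, h)
    W1 -= lr * ((err[:, None] * dz).T @ X) / N  # gradient step on W1
    b1 -= lr * (err[:, None] * dz).sum(0) / N   # gradient step on b1
    w2 -= lr * (a.T @ err) / N                  # gradient step on w2
```

Comparing the printed alignment at initialization and after training is a toy version of the measurement the paper carries out for larger networks and datasets; the sketch makes no claim about what the numbers will be in this particular configuration.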


Notes

  1.

    In some papers [17] the FIM is also referred to as the Hessian of the NN, due to the tight relation between \(F_t\) and the Hessian of the loss (see Appendix B for details).

  2.

    Related code can be accessed via a repository https://bit.ly/2kGVHhG.

  3.

    The trend \(\lambda _{max}^t \rightarrow \frac{2 N}{\delta _t}\) was consistent across FC NNs for a wide range of initial learning rates, numbers of layers and neurons, and various datasets (see Appendix [12]), making it an interesting avenue for future theoretical investigation; a heuristic reading of this threshold is sketched below.
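
    A heuristic reading of this threshold, offered as an interpretation under stated assumptions rather than a claim made in the paper: assume \(\delta _t\) denotes the learning rate at iteration t, \(N\) the number of training samples, and the loss is the mean squared error \(L_t = \frac{1}{2N}\Vert f_t - y\Vert ^2\). Under linearized GD dynamics the residual evolves as \(f_{t+1} - y \approx (I - \frac{\delta _t}{N} K_t)(f_t - y)\), where \(K_t\) is the NTK Gram matrix over the training set, so an eigendirection of \(K_t\) with eigenvalue \(\lambda \) contracts only if \(|1 - \frac{\delta _t}{N}\lambda | < 1\), i.e. \(\lambda < \frac{2 N}{\delta _t}\). The observed \(\lambda _{max}^t \rightarrow \frac{2 N}{\delta _t}\) is then consistent with training sitting at the boundary of this stability condition.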

References

  1. Arora, S., Du, S.S., Hu, W., Li, Z., Wang, R.: Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks. arXiv preprint arXiv:1901.08584 (2019)

  2. Basri, R., Jacobs, D., Kasten, Y., Kritchman, S.: The convergence rate of neural networks for learned functions of different frequencies. arXiv preprint arXiv:1906.00425 (2019)

  3. Dou, X., Liang, T.: Training neural networks as learning data-adaptive kernels: Provable representation and approximation benefits. arXiv preprint arXiv:1901.07114 (2019)

  4. Dyer, E., Gur-Ari, G.: Asymptotics of wide networks from Feynman diagrams. arXiv preprint arXiv:1909.11304 (2019)

  5. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256 (2010)

  6. Gur-Ari, G., Roberts, D.A., Dyer, E.: Gradient descent happens in a tiny subspace. arXiv preprint arXiv:1812.04754 (2018)

  7. Gutmann, M., Hyvärinen, A.: Noise-contrastive estimation: a new estimation principle for unnormalized statistical models. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 297–304 (2010)

  8. Huang, J., Yau, H.T.: Dynamics of deep neural networks and neural tangent hierarchy. arXiv preprint arXiv:1909.08156 (2019)

  9. Jacot, A., Gabriel, F., Hongler, C.: Neural tangent kernel: Convergence and generalization in neural networks. In: Advances in Neural Information Processing Systems (NIPS), pp. 8571–8580 (2018)

  10. Karakida, R., Akaho, S., Amari, S.I.: Universal statistics of fisher information in deep neural networks: mean field approach. arXiv preprint arXiv:1806.01316 (2018)

  11. Kopitkov, D., Indelman, V.: General probabilistic surface optimization and log density estimation. arXiv preprint arXiv:1903.10567 (2019)

  12. Kopitkov, D., Indelman, V.: Neural spectrum alignment: empirical study - appendix. https://bit.ly/3aipgtl (2019)

  13. Lee, J., Xiao, L., Schoenholz, S.S., Bahri, Y., Sohl-Dickstein, J., Pennington, J.: Wide neural networks of any depth evolve as linear models under gradient descent. arXiv preprint arXiv:1902.06720 (2019)

  14. Ollivier, Y.: Riemannian metrics for neural networks I: feedforward networks. Inf. Infer.: J. IMA 4(2), 108–153 (2015)

  15. Oymak, S., Fabian, Z., Li, M., Soltanolkotabi, M.: Generalization guarantees for neural networks via harnessing the low-rank structure of the Jacobian. arXiv preprint arXiv:1906.05392 (2019)

  16. Rahaman, N., et al.: On the spectral bias of neural networks. arXiv preprint arXiv:1806.08734 (2018)

  17. Sagun, L., Evci, U., Guney, V.U., Dauphin, Y., Bottou, L.: Empirical analysis of the hessian of over-parametrized neural networks. arXiv preprint arXiv:1706.04454 (2017)

  18. Woodworth, B., Gunasekar, S., Lee, J., Soudry, D., Srebro, N.: Kernel and deep regimes in overparametrized models. arXiv preprint arXiv:1906.05827 (2019)

  19. Zhang, J., Springenberg, J.T., Boedecker, J., Burgard, W.: Deep reinforcement learning with successor features for navigation across similar environments. arXiv preprint arXiv:1612.05533 (2016)


Acknowledgments

The authors thank Daniel Soudry and Dar Gilboa for discussions on dynamics of a Neural Tangent Kernel (NTK). This work was supported in part by the Israel Ministry of Science & Technology (MOST) and Intel Corporation. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU, which, among other GPUs, was used for this research.

Author information

Correspondence to Dmitry Kopitkov or Vadim Indelman.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Kopitkov, D., Indelman, V. (2020). Neural Spectrum Alignment: Empirical Study. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. ICANN 2020. Lecture Notes in Computer Science, vol. 12397. Springer, Cham. https://doi.org/10.1007/978-3-030-61616-8_14

  • DOI: https://doi.org/10.1007/978-3-030-61616-8_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61615-1

  • Online ISBN: 978-3-030-61616-8

  • eBook Packages: Computer Science; Computer Science (R0)
