Generic Representation of Neural Networks

Deep Neural Networks in a Mathematical Framework

Part of the book series: SpringerBriefs in Computer Science ((BRIEFSCOMPUTER))

Abstract

In the previous chapter, we took the first step towards creating a standard mathematical framework for neural networks by developing mathematical tools for vector-valued functions and their derivatives. We use these tools in this chapter to describe the operations employed in a generic layered neural network. Since neural networks have been empirically shown to reap performance benefits from stacking many layers in succession, it is important to develop a solid and concise theory for representing repeated function composition as it pertains to neural networks, and we see how this can be done in this chapter. Furthermore, since neural networks often learn their parameters via some form of gradient descent, we also compute derivatives of these functions with respect to the parameters at each layer. The derivative maps that we compute remain in the same vector space as the parameters, which allows us to perform gradient descent naturally over these vector spaces for each parameter. This contrasts with standard approaches to neural network modelling, in which the parameters are broken down into their components; we can avoid this unnecessary operation using the framework that we describe. We begin this chapter by formulating a generic neural network as the composition of parameter-dependent functions. We then introduce standard loss functions based on this composition for both the regression and classification cases, and take their derivatives with respect to the parameters at each layer. There are some commonalities between these two cases that we explore: in particular, both employ the same form of error backpropagation, albeit with a slightly different initialization. We are able to express backpropagation in terms of adjoints of derivative maps over generic vector spaces, which has not been explored before. We then outline a concise algorithm for computing derivatives of the loss functions with respect to their parameters directly over the vector spaces in which the parameters are defined. This helps to clarify the theoretical results presented. Towards the end of this chapter, we also model a higher-order loss function that imposes a penalty on the derivative; this demonstrates one way to extend the framework we have developed to a more complicated loss function and also illustrates its flexibility.
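As a rough sketch of the construction described above (the notation here, namely the layer maps \(f_i\) with parameters \(\theta_i\), the loss \(E\), the errors \(e_i\), and the adjoint \((\cdot)^*\), is illustrative and not necessarily the chapter's own), a layered network and its parameter derivatives can be written as

\[
x_1 = x, \qquad x_{i+1} = f_i^{\theta_i}(x_i) \quad (i = 1, \dots, L), \qquad \hat y = x_{L+1}, \qquad J(\theta) = E(\hat y, y),
\]
\[
e_{L+1} = \nabla_{\hat y} E(\hat y, y), \qquad
e_i = \bigl(D_{x_i} f_i^{\theta_i}\bigr)^{*} e_{i+1}, \qquad
\nabla_{\theta_i} J = \bigl(D_{\theta_i} f_i^{\theta_i}\bigr)^{*} e_{i+1}.
\]

In this form each gradient \(\nabla_{\theta_i} J\) lies in the same vector space as \(\theta_i\), so gradient descent can be performed directly over that space without flattening the parameters into components.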


Notes

  1. Classification will not be explicitly considered in this section, but it is not a difficult extension.

  2. \(\hat y_C - y\) in the case of classification (see the sketch below).
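For context only, under the assumption that the classification output is a softmax over logits \(z\) with a cross-entropy loss and a one-hot target \(y\) (which may differ from the chapter's exact definitions), writing \(\hat y_C = \operatorname{softmax}(z)\) gives

\[
E(z, y) = -\sum_k y_k \log \bigl(\operatorname{softmax}(z)\bigr)_k
\qquad \Longrightarrow \qquad
\nabla_z E = \operatorname{softmax}(z) - y = \hat y_C - y,
\]

which is the initialization of the backpropagated error referred to in Note 2; the squared-error regression case analogously gives \(\hat y - y\).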



Copyright information

© 2018 The Author(s)

About this chapter


Cite this chapter

Caterini, A.L., Chang, D.E. (2018). Generic Representation of Neural Networks. In: Deep Neural Networks in a Mathematical Framework. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-75304-1_3


  • DOI: https://doi.org/10.1007/978-3-319-75304-1_3


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75303-4

  • Online ISBN: 978-3-319-75304-1

  • eBook Packages: Computer Science, Computer Science (R0)
