Generic Representation of Neural Networks

Deep Neural Networks in a Mathematical Framework

Part of the book series: SpringerBriefs in Computer Science ((BRIEFSCOMPUTER))

Abstract

In the previous chapter, we took the first step towards creating a standard mathematical framework for neural networks by developing mathematical tools for vector-valued functions and their derivatives. We use these tools in this chapter to describe the operations employed in a generic layered neural network. Since neural networks have been empirically shown to reap performance benefits from stacking many layers in succession, it is important to develop a solid and concise theory for representing repeated function composition as it pertains to neural networks, and we see how this can be done in this chapter. Furthermore, since neural networks often learn their parameters via some form of gradient descent, we also compute derivatives of these functions with respect to the parameters at each layer. The derivative maps that we compute remain in the same vector space as the parameters, which allows us to perform gradient descent naturally over these vector spaces for each parameter. This contrasts with standard approaches to neural network modelling, in which the parameters are broken down into their components; we can avoid this unnecessary operation using the framework that we describe. We begin this chapter by formulating a generic neural network as the composition of parameter-dependent functions. We then introduce standard loss functions based on this composition for both the regression and classification cases, and take their derivatives with respect to the parameters at each layer. There are some commonalities between these two cases that we explore: in particular, both employ the same form of error backpropagation, albeit with a slightly different initialization. We are able to express backpropagation in terms of adjoints of derivative maps over generic vector spaces, which has not been explored before. We then outline a concise algorithm for computing derivatives of the loss functions with respect to their parameters directly over the vector spaces in which the parameters are defined. This helps to clarify the theoretical results presented. Towards the end of this chapter, we also model a higher-order loss function that imposes a penalty on the derivative; this demonstrates one way to extend the framework we have developed to a more complicated loss function and also illustrates its flexibility.
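As a rough sketch of the construction described above (the notation here, namely the layer maps \(f_i\) with parameters \(\theta_i\), the loss \(E\), the errors \(e_i\), and the adjoint \((\cdot)^*\), is illustrative and not necessarily the chapter's own), a layered network and its parameter derivatives can be written as

\[
x_1 = x, \qquad x_{i+1} = f_i^{\theta_i}(x_i) \quad (i = 1, \dots, L), \qquad \hat y = x_{L+1}, \qquad J(\theta) = E(\hat y, y),
\]
\[
e_{L+1} = \nabla_{\hat y} E(\hat y, y), \qquad
e_i = \bigl(D_{x_i} f_i^{\theta_i}\bigr)^{*} e_{i+1}, \qquad
\nabla_{\theta_i} J = \bigl(D_{\theta_i} f_i^{\theta_i}\bigr)^{*} e_{i+1}.
\]

In this form each gradient \(\nabla_{\theta_i} J\) lies in the same vector space as \(\theta_i\), so gradient descent can be performed directly over that space without flattening the parameters into components.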


Notes

  1. Classification will not be explicitly considered in this section, but it is not a difficult extension.

  2. \(\hat y_C - y\) in the case of classification (see the sketch below).
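For context only, under the assumption that the classification output is a softmax over logits \(z\) with a cross-entropy loss and a one-hot target \(y\) (which may differ from the chapter's exact definitions), writing \(\hat y_C = \operatorname{softmax}(z)\) gives

\[
E(z, y) = -\sum_k y_k \log \bigl(\operatorname{softmax}(z)\bigr)_k
\qquad \Longrightarrow \qquad
\nabla_z E = \operatorname{softmax}(z) - y = \hat y_C - y,
\]

which is the initialization of the backpropagated error referred to in Note 2; the squared-error regression case analogously gives \(\hat y - y\).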



Copyright information

© 2018 The Author(s)

About this chapter


Cite this chapter

Caterini, A.L., Chang, D.E. (2018). Generic Representation of Neural Networks. In: Deep Neural Networks in a Mathematical Framework. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-75304-1_3


  • DOI: https://doi.org/10.1007/978-3-319-75304-1_3


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75303-4

  • Online ISBN: 978-3-319-75304-1

  • eBook Packages: Computer Science, Computer Science (R0)
