Abstract
In the previous chapter, we took the first step towards a standard mathematical framework for neural networks by developing tools for vector-valued functions and their derivatives. In this chapter, we use these tools to describe the operations employed in a generic layered neural network. Since neural networks have been shown empirically to benefit from stacking more layers in succession, it is important to develop a solid and concise theory for representing repeated function composition as it pertains to neural networks, and we show how this can be done here. Furthermore, since neural networks typically learn their parameters via some form of gradient descent, we also compute derivatives of these functions with respect to the parameters at each layer. The derivative maps that we compute remain in the same vector space as the parameters, which allows us to perform gradient descent naturally over these vector spaces for each parameter. This contrasts with standard approaches to neural network modelling, in which the parameters are broken down into their scalar components; the framework we describe avoids this unnecessary step. We begin the chapter by formulating a generic neural network as a composition of parameter-dependent functions. We then introduce standard loss functions based on this composition for both the regression and classification cases, and take their derivatives with respect to the parameters at each layer. The two cases share some structure: in particular, both employ the same form of error backpropagation, albeit with slightly differing initializations. We are able to express backpropagation in terms of adjoints of derivative maps over generic vector spaces, which has not been explored before.
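The composition-of-layers view and adjoint-based backpropagation described above can be sketched in code. The following is a minimal illustration under our own assumptions (tanh layers, a squared-error regression loss initialized with the error \(\hat y - y\)), not the book's exact notation: each gradient is produced directly as a matrix or vector, i.e. in the same vector space as the parameter it corresponds to.

```python
import numpy as np

# Sketch of a network as a composition of parameter-dependent layers
# f_i(x) = sigma(W_i x + b_i), with backpropagation expressed through
# the adjoints (here, matrix transposes) of each layer's derivative map.

def sigma(z):
    """Elementwise activation."""
    return np.tanh(z)

def dsigma(z):
    """Elementwise derivative of the activation."""
    return 1.0 - np.tanh(z) ** 2

def forward(x, params):
    """Compose the layers, caching inputs and pre-activations."""
    caches = []
    for W, b in params:
        z = W @ x + b
        caches.append((x, z))
        x = sigma(z)
    return x, caches

def backward(y_hat, y, params, caches):
    """Propagate the error through the adjoint of each layer's derivative.

    Returns one (dL/dW, dL/db) pair per layer; dL/dW is a matrix of the
    same shape as W, so a gradient step stays in W's vector space.
    """
    e = y_hat - y                      # initialization for the regression loss
    grads = []
    for (W, b), (x, z) in zip(reversed(params), reversed(caches)):
        delta = dsigma(z) * e          # adjoint of the activation's derivative
        grads.append((np.outer(delta, x), delta))
        e = W.T @ delta                # adjoint of the linear map x -> W x
    return list(reversed(grads))
```

With these gradients, a descent step is simply `W -= eta * dW` for each layer, with no flattening of parameters into scalar components.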
We then outline a concise algorithm for computing derivatives of the loss functions with respect to their parameters directly over the vector space in which the parameters are defined, which helps to clarify the theoretical results. Towards the end of the chapter, we also model a higher-order loss function that imposes a penalty on the derivative. This demonstrates one way to extend the framework to a more complicated loss function and illustrates its flexibility.
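To give a concrete flavour of such a higher-order loss, the sketch below penalizes the squared Frobenius norm of a single layer's Jacobian. This is an illustrative construction under our own assumptions (a tanh layer and an assumed penalty weight `lam`), not the book's exact formulation: for \(f(x) = \tanh(Wx + b)\), the Jacobian is \(J = \mathrm{diag}(1 - \tanh^2(z))\,W\) with \(z = Wx + b\).

```python
import numpy as np

# A higher-order loss on one layer f(x) = tanh(W x + b): the usual
# squared error plus a penalty on the derivative of f at x, measured
# by the squared Frobenius norm of the Jacobian J = diag(dsigma(z)) W.

def layer_jacobian(W, b, x):
    """Jacobian of x -> tanh(W x + b) at the point x."""
    z = W @ x + b
    return np.diag(1.0 - np.tanh(z) ** 2) @ W

def higher_order_loss(W, b, x, y, lam=0.1):
    """Squared error plus lam times the squared Frobenius norm of the Jacobian."""
    y_hat = np.tanh(W @ x + b)
    J = layer_jacobian(W, b, x)
    return 0.5 * np.sum((y_hat - y) ** 2) + lam * np.sum(J ** 2)
```

Because `J = diag(d) @ W` with `d = 1 - tanh(z)**2`, the penalty also has the closed form `sum_i d_i**2 * sum_j W_ij**2`, which is cheap to evaluate without materializing `J`.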
Notes
1. Classification will not be explicitly considered in this section, but it is not a difficult extension.
2. \(\hat y_C - y\) in the case of classification.
Copyright information
© 2018 The Author(s)
Cite this chapter
Caterini, A.L., Chang, D.E. (2018). Generic Representation of Neural Networks. In: Deep Neural Networks in a Mathematical Framework. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-75304-1_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75303-4
Online ISBN: 978-3-319-75304-1