Abstract
A widely observed phenomenon in deep learning is the degradation problem: increasing the depth of a network leads to a decrease in performance on both test and training data. Novel architectures such as ResNets and Highway networks have addressed this issue by introducing various flavors of skip-connections or gating mechanisms. However, the degradation problem persists in the context of plain feed-forward networks. In this work we propose a simple method to address this issue. The proposed method poses the learning of weights in deep networks as a constrained optimization problem in which the presence of skip-connections is penalized by Lagrange multipliers. This allows skip-connections to be introduced during the early stages of training and subsequently phased out in a principled manner. We demonstrate the benefits of such an approach with experiments on MNIST, Fashion-MNIST, CIFAR-10 and CIFAR-100, where the proposed method is shown to greatly decrease the degradation effect and is often competitive with ResNets.
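The abstract only summarizes the constrained formulation, so the following PyTorch sketch illustrates the general idea rather than the authors' exact updates: each block carries a skip-connection whose strength \(\alpha_l\) is penalized by a Lagrange multiplier \(\lambda_l\), with primal descent on the weights and dual ascent on the multipliers driving \(\alpha_l\) (and with it the skip path) toward zero as training proceeds. The names PhasedSkipBlock, train_step and dual_lr are hypothetical, introduced only for this illustration.

```python
# Hedged sketch of "phasing out" skip-connections via Lagrange multipliers.
# Not the paper's exact formulation; names are hypothetical.
import torch
import torch.nn as nn

class PhasedSkipBlock(nn.Module):
    """A plain layer plus a skip path scaled by a learnable alpha.

    The constraint alpha = 0 is enforced softly through the Lagrangian
    term lam * |alpha|: early in training alpha is free to stay near 1
    (skip on); as the dual variable lam grows, alpha is driven to 0 and
    the block degenerates to a plain feed-forward layer.
    """
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim, dim)
        self.act = nn.ReLU()
        self.alpha = nn.Parameter(torch.ones(1))  # skip strength, starts "on"
        self.lam = 0.0                            # Lagrange multiplier (dual variable)

    def forward(self, x):
        return self.act(self.fc(x)) + self.alpha * x

    def penalty(self):
        # Lagrangian penalty on the presence of the skip-connection.
        return self.lam * self.alpha.abs().sum()

def train_step(model, blocks, x, y, opt, loss_fn, dual_lr=1e-3):
    """One primal-descent / dual-ascent step (sketch)."""
    opt.zero_grad()
    loss = loss_fn(model(x), y) + sum(b.penalty() for b in blocks)
    loss.backward()
    opt.step()
    with torch.no_grad():
        for b in blocks:
            # Dual ascent: the multiplier keeps growing while alpha != 0,
            # so the skip-connection is eventually phased out.
            b.lam += dual_lr * b.alpha.abs().item()
    return loss.item()

# Minimal usage on random data.
dim, n_classes = 32, 10
blocks = [PhasedSkipBlock(dim) for _ in range(8)]
model = nn.Sequential(*blocks, nn.Linear(dim, n_classes))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x, y = torch.randn(64, dim), torch.randint(0, n_classes, (64,))
train_step(model, blocks, x, y, opt, nn.CrossEntropyLoss())
```

Once every \(\alpha_l\) has been driven to zero, the skip paths contribute nothing and the trained model is an ordinary plain network, which is the behavior the abstract describes.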
Notes
1. Unless stated otherwise we will assume \(\mathcal{F}\) retains the dimension of \(\mathbf{x}_l\) and set \(\mathbf{W}_l'\) to the identity.
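For context, the footnote's notation suggests a residual-style update in the paper body of the form \(\mathbf{x}_{l+1} = \mathcal{F}(\mathbf{x}_l, \mathbf{W}_l) + \mathbf{W}_l' \mathbf{x}_l\), with \(\mathcal{F}\) the transformation applied at layer \(l\) and \(\mathbf{W}_l'\) a linear map on the skip path; this reconstruction is an assumption, as the equation itself does not appear in this excerpt.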
Cite this paper
Monti, R.P., Tootoonian, S., Cao, R. (2018). Avoiding Degradation in Deep Feed-Forward Networks by Phasing Out Skip-Connections. In: Kůrková, V., Manolopoulos, Y., Hammer, B., Iliadis, L., Maglogiannis, I. (eds) Artificial Neural Networks and Machine Learning – ICANN 2018. Lecture Notes in Computer Science, vol. 11141. Springer, Cham. https://doi.org/10.1007/978-3-030-01424-7_44