Abstract
A widely observed phenomenon in deep learning is the degradation problem: increasing the depth of a network leads to a decrease in performance on both test and training data. Novel architectures such as ResNets and Highway networks have addressed this issue by introducing various flavors of skip-connections or gating mechanisms. However, the degradation problem persists in the context of plain feed-forward networks. In this work we propose a simple method to address this issue. The proposed method poses the learning of weights in deep networks as a constrained optimization problem in which the presence of skip-connections is penalized by Lagrange multipliers. This allows skip-connections to be introduced during the early stages of training and subsequently phased out in a principled manner. We demonstrate the benefits of such an approach with experiments on MNIST, Fashion-MNIST, CIFAR-10 and CIFAR-100, where the proposed method is shown to greatly decrease the degradation effect and is often competitive with ResNets.
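The abstract only summarizes the constrained formulation, so the following PyTorch sketch illustrates the general idea rather than the authors' exact updates: each block carries a skip-connection whose strength \(\alpha_l\) is penalized by a Lagrange multiplier \(\lambda_l\), with primal descent on the weights and dual ascent on the multipliers driving \(\alpha_l\) (and with it the skip path) toward zero as training proceeds. The names PhasedSkipBlock, train_step and dual_lr are hypothetical, introduced only for this illustration.

```python
# Hedged sketch of "phasing out" skip-connections via Lagrange multipliers.
# Not the paper's exact formulation; names are hypothetical.
import torch
import torch.nn as nn

class PhasedSkipBlock(nn.Module):
    """A plain layer plus a skip path scaled by a learnable alpha.

    The constraint alpha = 0 is enforced softly through the Lagrangian
    term lam * |alpha|: early in training alpha is free to stay near 1
    (skip on); as the dual variable lam grows, alpha is driven to 0 and
    the block degenerates to a plain feed-forward layer.
    """
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim, dim)
        self.act = nn.ReLU()
        self.alpha = nn.Parameter(torch.ones(1))  # skip strength, starts "on"
        self.lam = 0.0                            # Lagrange multiplier (dual variable)

    def forward(self, x):
        return self.act(self.fc(x)) + self.alpha * x

    def penalty(self):
        # Lagrangian penalty on the presence of the skip-connection.
        return self.lam * self.alpha.abs().sum()

def train_step(model, blocks, x, y, opt, loss_fn, dual_lr=1e-3):
    """One primal-descent / dual-ascent step (sketch)."""
    opt.zero_grad()
    loss = loss_fn(model(x), y) + sum(b.penalty() for b in blocks)
    loss.backward()
    opt.step()
    with torch.no_grad():
        for b in blocks:
            # Dual ascent: the multiplier keeps growing while alpha != 0,
            # so the skip-connection is eventually phased out.
            b.lam += dual_lr * b.alpha.abs().item()
    return loss.item()

# Minimal usage on random data.
dim, n_classes = 32, 10
blocks = [PhasedSkipBlock(dim) for _ in range(8)]
model = nn.Sequential(*blocks, nn.Linear(dim, n_classes))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x, y = torch.randn(64, dim), torch.randint(0, n_classes, (64,))
train_step(model, blocks, x, y, opt, nn.CrossEntropyLoss())
```

Once every \(\alpha_l\) has been driven to zero, the skip paths contribute nothing and the trained model is an ordinary plain network, which is the behavior the abstract describes.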
Notes
1. Unless stated otherwise we will assume \(\mathcal{F}\) retains the dimension of \(\mathbf{x}_l\) and set \(\mathbf{W}_l'\) to the identity.
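For context, the footnote's notation suggests a residual-style update in the paper body of the form \(\mathbf{x}_{l+1} = \mathcal{F}(\mathbf{x}_l, \mathbf{W}_l) + \mathbf{W}_l' \mathbf{x}_l\), with \(\mathcal{F}\) the transformation applied at layer \(l\) and \(\mathbf{W}_l'\) a linear map on the skip path; this reconstruction is an assumption, as the equation itself does not appear in this excerpt.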
Cite this paper
Monti, R.P., Tootoonian, S., Cao, R. (2018). Avoiding Degradation in Deep Feed-Forward Networks by Phasing Out Skip-Connections. In: Kůrková, V., Manolopoulos, Y., Hammer, B., Iliadis, L., Maglogiannis, I. (eds) Artificial Neural Networks and Machine Learning – ICANN 2018. Lecture Notes in Computer Science, vol. 11141. Springer, Cham. https://doi.org/10.1007/978-3-030-01424-7_44