Avoiding Degradation in Deep Feed-Forward Networks by Phasing Out Skip-Connections

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2018 (ICANN 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11141)

Abstract

A widely observed phenomenon in deep learning is the degradation problem: increasing the depth of a network leads to a decrease in performance on both test and training data. Novel architectures such as ResNets and Highway networks have addressed this issue by introducing various flavors of skip-connections or gating mechanisms. However, the degradation problem persists in the context of plain feed-forward networks. In this work we propose a simple method to address this issue. The proposed method poses the learning of weights in deep networks as a constrained optimization problem where the presence of skip-connections is penalized by Lagrange multipliers. This allows for skip-connections to be introduced during the early stages of training and subsequently phased out in a principled manner. We demonstrate the benefits of such an approach with experiments on MNIST, fashion-MNIST, CIFAR-10 and CIFAR-100 where the proposed method is shown to greatly decrease the degradation effect and is often competitive with ResNets.
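
The abstract does not spell out how the skip-connections are parameterized or penalized; the sketch below gives one plausible reading, in which each block carries a scalar skip weight \(\alpha_l\) that a Lagrange-multiplier-style penalty drives toward zero. The class and function names, and the use of a scalar \(\alpha_l\), are illustrative assumptions rather than the authors' implementation.

    # A minimal sketch (not the authors' code) of the idea in the abstract:
    # each block has a skip-connection whose strength alpha_l is penalised by a
    # Lagrange-multiplier-style term, so the skip can be phased out in training.
    import torch
    import torch.nn as nn

    class PhasedSkipBlock(nn.Module):
        """Computes relu(W_l x) + alpha_l * x; alpha_l starts at 1 (full skip)."""
        def __init__(self, dim):
            super().__init__()
            self.fc = nn.Linear(dim, dim)
            self.alpha = nn.Parameter(torch.ones(()))  # skip-connection strength

        def forward(self, x):
            return torch.relu(self.fc(x)) + self.alpha * x

    def penalized_loss(task_loss, blocks, multipliers):
        """Add lambda_l * alpha_l^2 for every block; raising the multipliers
        over training drives each alpha_l, and hence each skip-connection,
        toward zero."""
        penalty = sum(lam * blk.alpha.pow(2) for lam, blk in zip(multipliers, blocks))
        return task_loss + penalty

In the constrained formulation described in the abstract the multipliers would themselves be updated (dual ascent on the constraint that the skip weights vanish); in this sketch they would simply be scheduled upward over training, which captures the phasing-out behaviour but not the full Lagrangian update.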


Notes

  1. Unless stated otherwise we will assume \(\mathcal{F}\) retains the dimension of \(\mathbf{x}_l\) and set \(\mathbf{W}_l'\) to the identity.
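
The footnote's notation (\(\mathcal{F}\), \(\mathbf{x}_l\), \(\mathbf{W}_l'\)) is consistent with a residual-style layer with a projection on the skip path; the display below is a reconstruction under that standard parameterization, not an equation quoted on this page.

    % Assumed layer form implied by the footnote: a transformation \mathcal{F}
    % plus a skip path weighted by W_l', which is taken to be the identity
    % whenever \mathcal{F} preserves the dimension of x_l.
    \[
      \mathbf{x}_{l+1} = \mathcal{F}(\mathbf{x}_l;\, \mathbf{W}_l) + \mathbf{W}_l'\,\mathbf{x}_l .
    \]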

Author information

Corresponding author

Correspondence to Ricardo Pio Monti.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Monti, R.P., Tootoonian, S., Cao, R. (2018). Avoiding Degradation in Deep Feed-Forward Networks by Phasing Out Skip-Connections. In: Kůrková, V., Manolopoulos, Y., Hammer, B., Iliadis, L., Maglogiannis, I. (eds) Artificial Neural Networks and Machine Learning – ICANN 2018. ICANN 2018. Lecture Notes in Computer Science, vol 11141. Springer, Cham. https://doi.org/10.1007/978-3-030-01424-7_44

  • DOI: https://doi.org/10.1007/978-3-030-01424-7_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01423-0

  • Online ISBN: 978-3-030-01424-7

  • eBook Packages: Computer Science, Computer Science (R0)
