Abstract
One of the most important steps in training a neural network is choosing its depth. In theory, a complex decision function can be built by cascading a number of shallow networks, which can achieve comparable accuracy while offering a significant computational cost benefit. In practice, however, beyond a certain point simply increasing the depth of a network can actually degrade its performance. A key cause discussed in the literature is the vanishing gradient problem.
The vanishing gradient problem manifests as a progressive decrease in the magnitudes of the weight gradients from one layer to the next, effectively preventing the weights in the lower layers of a deep network from changing their values when the backward propagation of errors algorithm is applied.
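To make the effect concrete, the following minimal NumPy sketch (not from the paper; the depth, width, weight scale, and sigmoid nonlinearity are illustrative assumptions) backpropagates a unit error through a deep sigmoid stack and prints the gradient norm at each layer; the norms shrink geometrically toward the input layers.

```python
import numpy as np

rng = np.random.default_rng(0)
depth, width = 20, 64

# Naive small-scale Gaussian initialization.
weights = [rng.normal(0.0, 0.1, (width, width)) for _ in range(depth)]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Forward pass, caching each layer's output for backpropagation.
x = rng.normal(size=width)
activations = [x]
for W in weights:
    x = sigmoid(W @ x)
    activations.append(x)

# Backward pass from an arbitrary unit error at the output.
grad = np.ones(width)
for i in reversed(range(depth)):
    a = activations[i + 1]                        # sigmoid output of layer i
    grad = weights[i].T @ (grad * a * (1.0 - a))  # chain rule through layer i
    print(f"layer {i:2d}: |dL/da| = {np.linalg.norm(grad):.3e}")
```

With these settings the per-layer gain is well below one, so by layer 0 the gradient magnitude has collapsed by many orders of magnitude.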
The Residual Network (ResNet) approach addresses this problem for standard convolutional networks. However, ResNet solves the problem only partially: the resulting network is not sequential but is instead an ensemble of shallow networks, with all the drawbacks typical of such ensembles.
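For contrast, a standard ResNet-style residual block looks roughly like the following PyTorch sketch (channel counts are illustrative, and normalization layers are omitted for brevity); the identity skip connection is what lets gradients bypass the convolutions and turns the network into an implicit ensemble of shallow paths.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip connection x + F(x) gives gradients a shortcut around
        # the two convolutions, mitigating vanishing gradients at the cost
        # of the extra residual branch.
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))
```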
In this article, we investigate a convolutional network with fully connected layers (the so-called network-in-network architecture, NiN) and suggest another way to build an ensemble of shallow networks: we gradually reduce the number of parallel connections in favor of sequential network connections.
This eliminates the influence of the vanishing gradient problem and reduces the redundancy of the network, since all weight coefficients are used and no residual blocks are needed, unlike in ResNet.
For this method to work, the network architecture does not need to be changed; it is only necessary to initialize its weights properly.
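The exact initialization scheme is not spelled out in this abstract; the sketch below only illustrates the general family of ideas it belongs to, namely initializing each square fully connected kernel near the identity map (the function name and noise scale are hypothetical), so that at the start of training the deep stack behaves like a much shallower network and gradients pass through largely unattenuated.

```python
import numpy as np

def near_identity_init(width, noise_std=0.01, rng=None):
    """Return a width x width kernel equal to I plus small Gaussian noise,
    so the layer starts out close to an identity map (hypothetical helper,
    illustrating the idea rather than the paper's exact scheme)."""
    rng = rng or np.random.default_rng()
    return np.eye(width) + rng.normal(0.0, noise_std, (width, width))

# Example: a 1x1 "fully connected" kernel for a NiN-style layer.
W = near_identity_init(128)
```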
Acknowledgments
This work was financially supported by the Government of the Russian Federation (Grant 08-08).