In recent years, many studies have attempted to open the black box of deep neural networks and have proposed a variety of theories to understand them. Among these, information bottleneck (IB) theory claims that training proceeds in two distinct phases: a fitting phase and a compression phase. This claim has attracted much attention because of its success in explaining the inner behavior of feedforward neural networks. In this paper, we employ IB theory to understand the dynamic behavior of convolutional neural networks (CNNs) and investigate how fundamental features such as convolutional layer width, kernel size, network depth, pooling layers, and multiple fully connected layers affect the performance of CNNs. In particular, through a series of experiments on the MNIST and Fashion-MNIST benchmarks, we demonstrate that the compression phase is not observed in all these cases. This shows that CNNs behave in a more complicated way than feedforward neural networks.
Our research is supported by the Tianjin Natural Science Foundation of China (20JCYBJC00500) and the Science and Technology Development Fund of Tianjin Education Commission for Higher Education (2018KJ217).
To further verify our conclusions, we conduct additional experiments on the CIFAR-10 dataset. This dataset consists of 60,000 \(32\times 32\) colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.
In this experiment, all 50,000 training images and all 10,000 test images are used as the training and test datasets, respectively; this is the only setting that differs from the Experiments and discussion section. Furthermore, because of how the mutual information is computed, we average the three colour channels of each image and use the resulting single-channel image as input data.
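The channel-averaging step described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `to_single_channel`, the channels-last layout `(N, 32, 32, 3)`, and the use of an unweighted mean are assumptions, and the batch is random stand-in data rather than the real CIFAR-10 images.

```python
import numpy as np


def to_single_channel(images: np.ndarray) -> np.ndarray:
    """Average the three colour channels of a batch of images.

    Assumes `images` has shape (N, H, W, 3), i.e. channels last.
    Returns a float32 array of shape (N, H, W, 1).
    """
    if images.ndim != 4 or images.shape[-1] != 3:
        raise ValueError("expected an array of shape (N, H, W, 3)")
    # Unweighted mean over the channel axis; keepdims retains a
    # channel axis of size 1 so the result is still a 4-D batch.
    return images.astype(np.float32).mean(axis=-1, keepdims=True)


# Stand-in batch with CIFAR-10's image shape (random values, not real data).
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
gray = to_single_channel(batch)
print(gray.shape)
```

Keeping a channel axis of size 1 means the averaged images can be passed to a convolutional layer that expects a 4-D input without further reshaping.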
Figures 9 and 10 show the MI for different widths and depths, respectively, on the training data. Figure 11 shows the MI with a pooling layer on the test data. These results provide further evidence for our conclusions about IB theory.
About this article
Cite this article
Li, J., Liu, D. Information Bottleneck Theory on Convolutional Neural Networks. Neural Process Lett (2021). https://doi.org/10.1007/s11063-021-10445-6
- Information bottleneck
- Convolutional neural networks
- Deep learning
- Representation learning