Advertisement

Adversarial Network Compression

  • Vasileios BelagiannisEmail author
  • Azade Farshad
  • Fabio Galasso
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11132)

Abstract

Neural network compression has recently received much attention due to the computational requirements of modern deep models. In this work, our objective is to transfer knowledge from a deep and accurate model to a smaller one. Our contributions are threefold: (i) we propose an adversarial network compression approach to train the small student network to mimic the large teacher, without the need for labels during training; (ii) we introduce a regularization scheme to prevent a trivially-strong discriminator without reducing the network capacity and (iii) our approach generalizes on different teacher-student models.

In an extensive evaluation on five standard datasets, we show that our student has small accuracy drop, achieves better performance than other knowledge transfer approaches and it surpasses the performance of the same network trained with labels. In addition, we demonstrate state-of-the-art results compared to other compression strategies.

Notes

Acknowledgements

This research was partially supported by BMWi - Federal Ministry for Economic Affairs and Energy (MEC-View Project).

References

  1. 1.
    Abadi, M., et al.: TensorFlow: a system for large-scale machine learning. In: OSDI, vol. 16, pp. 265–283 (2016)Google Scholar
  2. 2.
    Ba, J., Caruana, R.: Do deep nets really need to be deep? In: Advances in Neural Information Processing Systems, pp. 2654–2662 (2014)Google Scholar
  3. 3.
    Belagiannis, V., Zisserman, A.: Recurrent human pose estimation. In: 2017 12th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2017), pp. 468–475. IEEE (2017)Google Scholar
  4. 4.
    Buciluâ, C., Caruana, R., Niculescu-Mizil, A.: Model compression. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 535–541. ACM (2006)Google Scholar
  5. 5.
    Carreira-Perpinán, M.A., Idelbayev, Y.: “learning-compression” algorithms for neural net pruning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8532–8541 (2018)Google Scholar
  6. 6.
    Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. arXiv preprint arXiv:1606.00915 (2016)
  7. 7.
    Chen, W., Wilson, J., Tyree, S., Weinberger, K., Chen, Y.: Compressing neural networks with the hashing trick. In: International Conference on Machine Learning, pp. 2285–2294 (2015)Google Scholar
  8. 8.
    Chollet, F.: Xception: deep learning with depthwise separable convolutions. arXiv preprint arXiv:1610.02357 (2016)
  9. 9.
    Courbariaux, M., Bengio, Y., David, J.P.: BinaryConnect: training deep neural networks with binary weights during propagations. In: Advances in Neural Information Processing Systems, pp. 3123–3131 (2015)Google Scholar
  10. 10.
    Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, pp. 248–255. IEEE (2009)Google Scholar
  11. 11.
    Denton, E.L., Zaremba, W., Bruna, J., LeCun, Y., Fergus, R.: Exploiting linear structure within convolutional networks for efficient evaluation. In: Advances in Neural Information Processing Systems, pp. 1269–1277 (2014)Google Scholar
  12. 12.
    Ganin, Y., Lempitsky, V.: Unsupervised domain adaptation by backpropagation. In: International Conference on Machine Learning, pp. 1180–1189 (2015)Google Scholar
  13. 13.
    Gong, Y., Liu, L., Yang, M., Bourdev, L.: Compressing deep convolutional networks using vector quantization. arXiv preprint arXiv:1412.6115 (2014)
  14. 14.
    Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)Google Scholar
  15. 15.
    Han, S., Mao, H., Dally, W.J.: Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149 (2015)
  16. 16.
    Han, S., Pool, J., Tran, J., Dally, W.: Learning both weights and connections for efficient neural network. In: Advances in Neural Information Processing Systems, pp. 1135–1143 (2015)Google Scholar
  17. 17.
    Hassibi, B., Stork, D.G.: Second order derivatives for network pruning: optimal brain surgeon. In: Advances in Neural Information Processing Systems, pp. 164–171 (1993)Google Scholar
  18. 18.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)Google Scholar
  19. 19.
    He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016, Part IV. LNCS, vol. 9908, pp. 630–645. Springer, Cham (2016).  https://doi.org/10.1007/978-3-319-46493-0_38CrossRefGoogle Scholar
  20. 20.
    Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
  21. 21.
    Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
  22. 22.
    Huang, G., Liu, S., van der Maaten, L., Weinberger, K.Q.: CondenseNet: an efficient DenseNet using learned group convolutions. Group 3(12), 11 (2017)Google Scholar
  23. 23.
    Huang, G., Sun, Y., Liu, Z., Sedra, D., Weinberger, K.Q.: Deep networks with stochastic depth. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016, Part IV. LNCS, vol. 9908, pp. 646–661. Springer, Cham (2016).  https://doi.org/10.1007/978-3-319-46493-0_39CrossRefGoogle Scholar
  24. 24.
    Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K.: SqueezeNet: alexnet-level accuracy with 50X fewer parameters and \(<\)0.5 MB model size. arXiv preprint arXiv:1602.07360 (2016)
  25. 25.
    Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004 (2016)
  26. 26.
    Jaderberg, M., Vedaldi, A., Zisserman, A.: Speeding up convolutional neural networks with low rank expansions. arXiv preprint arXiv:1405.3866 (2014)
  27. 27.
    Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images (2009)Google Scholar
  28. 28.
    Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)Google Scholar
  29. 29.
    Lebedev, V., Ganin, Y., Rakhuba, M., Oseledets, I., Lempitsky, V.: Speeding-up convolutional neural networks using fine-tuned CP-decomposition. arXiv preprint arXiv:1412.6553 (2014)
  30. 30.
    LeCun, Y., et al.: Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems, pp. 396–404 (1990)Google Scholar
  31. 31.
    Li, F., Zhang, B., Liu, B.: Ternary weight networks. arXiv preprint arXiv:1605.04711 (2016)
  32. 32.
    Li, H., Kadav, A., Durdanovic, I., Samet, H., Graf, H.P.: Pruning filters for efficient ConvNets. arXiv preprint arXiv:1608.08710 (2016)
  33. 33.
    Li, J., Zhao, R., Huang, J.T., Gong, Y.: Learning small-size DNN with output-distribution-based criteria. In: Fifteenth Annual Conference of the International Speech Communication Association (2014)Google Scholar
  34. 34.
    Lin, M., Chen, Q., Yan, S.: Network in network. arXiv preprint arXiv:1312.4400 (2013)
  35. 35.
    Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision (2017)Google Scholar
  36. 36.
    Mirza, M., Osindero, S.: Conditional generative adversarial nets (2014)Google Scholar
  37. 37.
    Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning. In: NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011 (2011)Google Scholar
  38. 38.
    Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016, Part VIII. LNCS, vol. 9912, pp. 483–499. Springer, Cham (2016).  https://doi.org/10.1007/978-3-319-46484-8_29CrossRefGoogle Scholar
  39. 39.
    Ng, A.Y.: Feature selection, L 1 vs. L 2 regularization, and rotational invariance. In: Proceedings of the Twenty-First International Conference on Machine Learning, p. 78. ACM (2004)Google Scholar
  40. 40.
    Polino, A., Pascanu, R., Alistarh, D.: Model compression via distillation and quantization. arXiv preprint arXiv:1802.05668 (2018)
  41. 41.
    Rastegari, M., Ordonez, V., Redmon, J., Farhadi, A.: XNOR-Net: ImageNet classification using binary convolutional neural networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016, Part IV. LNCS, vol. 9908, pp. 525–542. Springer, Cham (2016).  https://doi.org/10.1007/978-3-319-46493-0_32CrossRefGoogle Scholar
  42. 42.
    Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396 (2016)
  43. 43.
    Rigamonti, R., Sironi, A., Lepetit, V., Fua, P.: Learning separable filters. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2754–2761 (2013)Google Scholar
  44. 44.
    Romero, A., Ballas, N., Kahou, S.E., Chassang, A., Gatta, C., Bengio, Y.: FitNets: hints for thin deep nets. arXiv preprint arXiv:1412.6550 (2014)
  45. 45.
    Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training GANs. In: Advances in Neural Information Processing Systems, pp. 2234–2242 (2016)Google Scholar
  46. 46.
    Schmidhuber, J.: Learning complex, extended sequences using the principle of history compression. Neural Comput. 4(2), 234–242 (1992)CrossRefGoogle Scholar
  47. 47.
    Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
  48. 48.
    Soudry, D., Hubara, I., Meir, R.: Expectation backpropagation: parameter-free training of multilayer neural networks with continuous or discrete weights. In: Advances in Neural Information Processing Systems, pp. 963–971 (2014)Google Scholar
  49. 49.
    Ström, N.: Phoneme probability estimation with dynamic sparsely connected artificial neural networks. Free Speech J. 5, 1–41 (1997)Google Scholar
  50. 50.
    Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)Google Scholar
  51. 51.
    Tung, F., Mori, G.: CLIP-Q: deep network compression learning by in-parallel pruning-quantization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7873–7882 (2018)Google Scholar
  52. 52.
    Tzeng, E., Hoffman, J., Saenko, K., Darrell, T.: Adversarial discriminative domain adaptation. arXiv preprint arXiv:1702.05464 (2017)
  53. 53.
    Urban, G., et al.: Do deep convolutional nets really need to be deep and convolutional? arXiv preprint arXiv:1603.05691 (2016)
  54. 54.
    Wu, J., Leng, C., Wang, Y., Hu, Q., Cheng, J.: Quantized convolutional neural networks for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4820–4828 (2016)Google Scholar
  55. 55.
    Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017)
  56. 56.
    Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. arXiv preprint arXiv:1611.05431 (2016)
  57. 57.
    Xu, Z., Hsu, Y.C., Huang, J.: Training student networks for acceleration with conditional adversarial networks. In: BMVC. British Machine Vision Association (2018)Google Scholar
  58. 58.
    Yang, Z., et al.: Deep fried ConvNets. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1476–1483 (2015)Google Scholar
  59. 59.
    Yim, J., Joo, D., Bae, J., Kim, J.: A gift from knowledge distillation: fast optimization, network minimization and transfer learning. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)Google Scholar
  60. 60.
    Yu, R., et al.: NISP: pruning networks using neuron importance score propagation. Preprint at https://arxiv.org/abs/1711.05908 (2017)
  61. 61.
    Zagoruyko, S., Komodakis, N.: Wide residual networks. arXiv preprint arXiv:1605.07146 (2016)
  62. 62.
    Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. arXiv preprint arXiv:1801.03924 (2018)
  63. 63.
    Zhang, X., Zhou, X., Lin, M., Sun, J.: ShuffleNet: an extremely efficient convolutional neural network for mobile devices. arXiv preprint arXiv:1707.01083 (2017)
  64. 64.
    Zhou, A., Yao, A., Guo, Y., Xu, L., Chen, Y.: Incremental network quantization: Towards lossless CNNs with low-precision weights (2017)Google Scholar
  65. 65.
    Zhu, C., Han, S., Mao, H., Dally, W.J.: Trained ternary quantization. arXiv preprint arXiv:1612.01064 (2016)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Vasileios Belagiannis
    • 1
    Email author
  • Azade Farshad
    • 1
    • 2
  • Fabio Galasso
    • 1
  1. 1.Innovation OSRAM GmbHGarching b. MünchenGermany
  2. 2.Technische Universität MünchenGarching b. MünchenGermany

Personalised recommendations