Targeted Kernel Networks: Faster Convolutions with Attentive Regularization

  • Kashyap Chitta
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11132)

Abstract

We propose Attentive Regularization (AR), a method to constrain the activation maps of kernels in Convolutional Neural Networks (CNNs) to specific regions of interest (ROIs). Each kernel learns a location of specialization along with its weights through standard backpropagation. A differentiable attention mechanism requiring no additional supervision is used to optimize the ROIs. Traditional CNNs of different types and structures can be modified with this idea into equivalent Targeted Kernel Networks (TKNs), while keeping the network size nearly identical. By restricting kernel ROIs, we reduce the number of sliding convolutional operations performed throughout the network in its forward pass, speeding up both training and inference. We evaluate our proposed architecture on both synthetic and natural tasks across multiple domains. TKNs obtain significant improvements over baselines, requiring around an order of magnitude less computation while achieving superior performance.
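
The core mechanism described above is a per-kernel spatial attention mask with learnable location parameters, trained end to end alongside the kernel weights. Below is a minimal PyTorch sketch of this idea; the module name, the separable Gaussian parameterization, and the normalized-coordinate convention are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn


class AttentiveRegularization2d(nn.Module):
    """Sketch of Attentive Regularization: each convolutional kernel's
    activation map is modulated by a differentiable attention mask whose
    centre and spread are learned jointly with the kernel weights."""

    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              padding=kernel_size // 2)
        # One learnable ROI per output kernel: a centre (mu) and spread
        # (sigma) in normalized [0, 1] feature-map coordinates.
        self.mu = nn.Parameter(torch.rand(out_channels, 2))          # (y, x) centres
        self.log_sigma = nn.Parameter(torch.zeros(out_channels, 2))  # log spreads

    def forward(self, x):
        feats = self.conv(x)                                  # (B, C, H, W)
        _, _, h, w = feats.shape
        ys = torch.linspace(0.0, 1.0, h, device=x.device)
        xs = torch.linspace(0.0, 1.0, w, device=x.device)
        sigma = self.log_sigma.exp()
        # Separable 1-D Gaussians over height and width, one per channel;
        # gradients flow to mu and sigma through the mask, so each kernel's
        # ROI is optimized by standard backpropagation with no extra labels.
        gy = torch.exp(-0.5 * ((ys[None, :] - self.mu[:, 0:1]) / sigma[:, 0:1]) ** 2)  # (C, H)
        gx = torch.exp(-0.5 * ((xs[None, :] - self.mu[:, 1:2]) / sigma[:, 1:2]) ** 2)  # (C, W)
        mask = gy[:, :, None] * gx[:, None, :]                # (C, H, W) rank-1 mask
        return feats * mask[None]                             # restrict kernels to ROIs
```

Note that this dense sketch only attenuates activations; the speedups the abstract reports would come from skipping the sliding-window computation wherever a kernel's mask is near zero, which requires a sparse or cropped convolution implementation rather than the full convolution followed by a multiply shown here.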

Keywords

Soft attention · Region of interest · Network acceleration


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. The Robotics Institute, Carnegie Mellon University, Pittsburgh, USA