
Spatially-Adaptive Filter Units for Compact and Efficient Deep Neural Networks

Published in: International Journal of Computer Vision

Abstract

Convolutional neural networks excel in a number of computer vision tasks. One of their most crucial architectural elements is the effective receptive field size, which has to be set manually to accommodate a specific task. Standard solutions involve large kernels, down/up-sampling and dilated convolutions. These require testing a variety of dilation and down/up-sampling factors and result in non-compact networks with a large number of parameters. We address this issue by proposing a new convolution filter composed of displaced aggregation units (DAUs). DAUs learn spatial displacements and adapt the receptive field sizes of individual convolution filters to a given problem, thus reducing the need for hand-crafted modifications. DAUs provide a seamless substitute for convolutional filters in existing state-of-the-art architectures, which we demonstrate on AlexNet, ResNet50, ResNet101, DeepLab and SRN-DeblurNet. The benefits of this design are demonstrated on a variety of computer vision tasks and datasets, such as image classification (ILSVRC 2012), semantic segmentation (PASCAL VOC 2011, Cityscapes) and blind image de-blurring (GOPRO). Results show that DAUs allocate parameters efficiently, resulting in up to 4\(\times \) more compact networks in terms of the number of parameters, at similar or better performance.
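To make the construction concrete, below is a minimal NumPy sketch of the idea as stated in the abstract: a single DAU-based filter materialised as a weighted sum of 2-D Gaussians at learned displacements, with a fixed aggregation perimeter \(\sigma \) (see note 2 below). The function and parameter names (dau_kernel, weights, mu_x, mu_y) are illustrative only and not from the paper; the authors' actual implementation is the low-level CUDA code linked in note 1.

```python
import numpy as np

def dau_kernel(weights, mu_x, mu_y, sigma=0.5, size=9):
    """Materialise one DAU filter as a weighted sum of 2-D Gaussians placed
    at learned displacements (mu_x, mu_y) relative to the kernel centre.
    Sketch under stated assumptions; the paper's CUDA implementation (note 1)
    computes the aggregation far more efficiently than this dense kernel."""
    c = (size - 1) / 2.0                      # kernel centre
    ys, xs = np.mgrid[0:size, 0:size]         # pixel coordinate grids
    kernel = np.zeros((size, size))
    for w, mx, my in zip(weights, mu_x, mu_y):
        g = np.exp(-((xs - c - mx) ** 2 + (ys - c - my) ** 2) / (2 * sigma ** 2))
        kernel += w * g / g.sum()             # unit-normalised Gaussian blob
    return kernel

# Two units displaced 3 px left/right of the centre: an effective receptive
# field that a standard 3x3 kernel could only reach via dilation.
k = dau_kernel(weights=[1.0, -1.0], mu_x=[-3.0, 3.0], mu_y=[0.0, 0.0])
print(k.shape)  # (9, 9)
```

With \(\sigma \) fixed (note 2), each unit contributes only three learnable scalars (a weight and two displacements), which is where the parameter savings over dense \(k \times k\) kernels come from.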


Notes

  1. A low-level CUDA implementation of the DAU convolution filters is available for Caffe as well as TensorFlow at: https://github.com/skokec/DAU-ConvNet-caffe and https://github.com/skokec/DAU-ConvNet.

  2. Note that a reasonable aggregation perimeter value \(\sigma \) can in fact be estimated for a given problem by pre-training with the derivatives in Eq. (8), but a fixed value has proven sufficient. See Sect. 4.1 for an analysis of different choices of this parameter.

  3. Our current implementation in CUDA supports displacements only up to 4 or 8 pixels. This limitation can be overcome by modifying the implementation; a simple training-time safeguard is sketched after these notes.

  4. DAU layers with a stride operation are not yet implemented.

  5. The current implementation of DAUs requires an even number of channels.

  6. https://github.com/jiangsutx/SRN-Deblur.
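
Regarding the displacement limit in note 3, one simple way to respect it during training is to project the learned displacements back into the supported range after each update. The helper below is a hypothetical sketch; the names MAX_DISPLACEMENT and clamp_displacements are mine and not part of the DAU-ConvNet API.

```python
import numpy as np

# Note 3: the current CUDA kernels support displacements only up to 4 (or 8) px.
MAX_DISPLACEMENT = 4.0  # hypothetical constant; use 8.0 for the larger variant

def clamp_displacements(mu_x, mu_y, limit=MAX_DISPLACEMENT):
    """Project DAU displacements back into the supported range after a
    gradient step. Illustrative only; not part of DAU-ConvNet."""
    return np.clip(mu_x, -limit, limit), np.clip(mu_y, -limit, limit)

mu_x, mu_y = clamp_displacements(np.array([5.2, -1.0]), np.array([0.3, -6.7]))
# values outside the +/-4 px range are clipped: mu_x -> [4., -1.], mu_y -> [0.3, -4.]
```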


Acknowledgements

The authors would like to thank Hector Basevi for his valuable comments and suggestions on improving the paper. This work was supported in part by the following research projects and programs: projects GOSTOP C3330-16-529000, DIVID J2-9433 and ViAMaRo L2-6765 and program P2-0214, financed by the Slovenian Research Agency (ARRS), and the MURI project financed by MoD/Dstl and EPSRC through Grant EP/N019415/1. We thank Vitjan Zavrtanik for his contribution in porting the DAUs to the TensorFlow framework.

Author information


Corresponding author

Correspondence to Domen Tabernik.

Additional information

Communicated by Li Liu, Matti Pietikäinen, Jie Qin, Jie Chen, Wanli Ouyang, Luc Van Gool.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Tabernik, D., Kristan, M. & Leonardis, A. Spatially-Adaptive Filter Units for Compact and Efficient Deep Neural Networks. Int J Comput Vis 128, 2049–2067 (2020). https://doi.org/10.1007/s11263-019-01282-1

