Spatially-Adaptive Filter Units for Compact and Efficient Deep Neural Networks

Tabernik, Domen; Kristan, Matej; Leonardis, Aleš

doi:10.1007/s11263-019-01282-1

Published: 02 January 2020

Volume 128, pages 2049–2067 (2020)

International Journal of Computer Vision

Abstract

Convolutional neural networks excel in a number of computer vision tasks. One of their most crucial architectural elements is the effective receptive field size, which has to be manually set to accommodate a specific task. Standard solutions involve large kernels, down/up-sampling and dilated convolutions. These require testing a variety of dilation and down/up-sampling factors and result in non-compact networks with a large number of parameters. We address this issue by proposing a new convolution filter composed of displaced aggregation units (DAUs). DAUs learn spatial displacements and adapt the receptive field sizes of individual convolution filters to a given problem, thus reducing the need for hand-crafted modifications. DAUs provide a seamless substitution of convolutional filters in existing state-of-the-art architectures, which we demonstrate on AlexNet, ResNet50, ResNet101, DeepLab and SRN-DeblurNet. The benefits of this design are demonstrated on a variety of computer vision tasks and datasets, such as image classification (ILSVRC 2012), semantic segmentation (PASCAL VOC 2011, Cityscapes) and blind image de-blurring (GOPRO). Results show that DAUs allocate parameters efficiently, resulting in up to 4\(\times \) more compact networks in terms of the number of parameters at similar or better performance.
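To make the idea of a filter built from displaced aggregation units more concrete, the following is a minimal NumPy sketch, not the authors' CUDA implementation. It assumes that each unit is a weighted, normalized Gaussian of fixed width `sigma` centered at a learnable sub-pixel offset, and that a filter is the sum of a few such units. The function name `dau_kernel`, its parameters, and the default values are hypothetical and chosen for illustration only.

```python
import numpy as np

def dau_kernel(weights, offsets, sigma=0.5, radius=4):
    """Render a dense kernel equivalent to a sum of displaced Gaussian units.

    weights: one scalar weight per unit (assumed learnable).
    offsets: one (dx, dy) displacement per unit, in pixels (assumed learnable).
    sigma:   fixed Gaussian width shared by all units (illustrative value).
    radius:  half-size of the rendered kernel; output is (2*radius+1) square.
    """
    # Pixel coordinate grids centered on the filter origin.
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    kernel = np.zeros(ys.shape, dtype=np.float64)
    for w, (dx, dy) in zip(weights, offsets):
        # Gaussian centered at the unit's displacement.
        g = np.exp(-((xs - dx) ** 2 + (ys - dy) ** 2) / (2.0 * sigma ** 2))
        g /= g.sum()  # normalize so each unit integrates to its weight
        kernel += w * g
    return kernel

# Two units: one near the center, one displaced far to the lower right.
k = dau_kernel(weights=[1.0, -0.5], offsets=[(0.0, 0.0), (3.0, 2.5)])
print(k.shape)
```

Under these assumptions each unit is described by three values (one weight and two offset coordinates), so a filter with a few units can cover a 9 by 9 window with far fewer values than the 81 entries of a dense kernel of that size.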

Notes

A low-level CUDA implementation of the DAU convolution filters is available for Caffe as well as TensorFlow at: https://github.com/skokec/DAU-ConvNet-caffe and https://github.com/skokec/DAU-ConvNet.
Note that a reasonable aggregation perimeter value \(\sigma \) can in fact be estimated for a given problem by pre-training using the derivatives in Eq. (8), but using a fixed value has proven sufficient. See Sect. 4.1 for the analysis of different choices of this parameter.
Our current implementation in CUDA allows only distances up to 4 or 8 pixels. This limitation can be overcome by modifying the implementation.
DAU layers with stride operation are not yet implemented.
The current implementation of DAUs requires an even number of channels.
https://github.com/jiangsutx/SRN-Deblur.

Acknowledgements

The authors would like to thank Hector Basevi for his valuable comments and suggestions on improving the paper. This work was supported in part by the following research projects and programs: Project GOSTOP C3330-16-529000, DIVID J2-9433 and ViAMaRo L2-6765, Program P2-0214 financed by the Slovenian Research Agency ARRS, and the MURI Project financed by MoD/Dstl and EPSRC through Grant EP/N019415/1. We thank Vitjan Zavrtanik for his contribution to porting the DAUs to the TensorFlow framework.

Author information

Authors and Affiliations

Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
Domen Tabernik, Matej Kristan & Aleš Leonardis
School of Computer Science, University of Birmingham, Birmingham, UK
Aleš Leonardis

Corresponding author

Correspondence to Domen Tabernik.

Additional information

Communicated by Li Liu, Matti Pietikäinen, Jie Qin, Jie Chen, Wanli Ouyang, Luc Van Gool.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tabernik, D., Kristan, M. & Leonardis, A. Spatially-Adaptive Filter Units for Compact and Efficient Deep Neural Networks. Int J Comput Vis 128, 2049–2067 (2020). https://doi.org/10.1007/s11263-019-01282-1

Received: 15 February 2019
Accepted: 11 December 2019
Published: 02 January 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s11263-019-01282-1
