FFnet: Residual Block-Based Convolutional Neural Network for Crowd Counting
Because crowd images exhibit nonuniform scale variation and severe occlusion, most current state-of-the-art approaches use multicolumn CNN architectures with different receptive fields. To test whether multiple columns are necessary, we design a single-column network and find that, given a similar number of parameters and a similar receptive field size, a single-column network performs as well as a multicolumn network. Building on this finding, we propose FFnet, a single-column network based on residual blocks. FFnet is fully convolutional and easy to train. Extensive experiments on the ShanghaiTech and UCF_CC_50 datasets show that our method outperforms Switch-CNN with nearly half the number of parameters, and performs close to the state-of-the-art model CP-CNN with almost one-tenth of its parameters.
Keywords: Crowd counting, ResNet, Density map estimation, Multi-column, Receptive field
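The abstract compares networks under a similar number of parameters and a similar receptive field size. As a rough illustration of how those two quantities can be computed for a stack of convolutional layers, here is a minimal Python sketch. The layer configurations (`wide_column`, `deep_column`) are hypothetical and are not the FFnet or multicolumn architectures from the paper; they only show that a deep stack of small kernels can match a shallow stack of large kernels on both measures.

```python
from typing import List, Tuple

# Each layer is (kernel_size, stride, in_channels, out_channels).
# All configurations below are hypothetical, chosen only for illustration.
Layer = Tuple[int, int, int, int]


def receptive_field(layers: List[Layer]) -> int:
    """Receptive field (in input pixels) of a stack of square conv layers."""
    rf, jump = 1, 1  # jump = distance between adjacent outputs, in input pixels
    for kernel, stride, _, _ in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


def num_params(layers: List[Layer]) -> int:
    """Weights plus biases of a stack of square conv layers."""
    return sum(k * k * c_in * c_out + c_out for k, _, c_in, c_out in layers)


# A shallow column with large kernels (in the style of one multicolumn branch).
wide_column: List[Layer] = [(9, 1, 3, 16), (7, 1, 16, 32), (7, 1, 32, 16), (7, 1, 16, 8)]

# A deeper single column built only from 3x3 kernels.
deep_column: List[Layer] = [(3, 1, 3, 24)] + [(3, 1, 24, 24)] * 12

if __name__ == "__main__":
    for name, column in (("wide", wide_column), ("deep", deep_column)):
        print(name, receptive_field(column), num_params(column))
```

Under these assumed configurations both stacks see the same input window while their parameter counts differ by only a few percent, which is the kind of controlled setting the comparison in the abstract relies on.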
This work was supported in part by the National Sciences Foundation for the Distinguished Young Scholars of China under Grant 61525103, and the Shenzhen Fundamental Research Project under Grant JCYJ20150930150304185.